From bugzilla-daemon at portal.open-bio.org Tue Jul 1 04:36:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 1 Jul 2008 04:36:33 -0400 Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem with identical taxa names In-Reply-To: Message-ID: <200807010836.m618aXO8014712@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2531 fkauff at biologie.uni-kl.de changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED ------- Comment #13 from fkauff at biologie.uni-kl.de 2008-07-01 04:36 EST ------- Just uploaded a new Nexus.py to CVS. First, the taxlabels command in a taxa block is now ignored. For a standard nexus file, taxon labels are in the matrix, and a taxon block is irrelevant. The only exception are transposed matrices, which are not supported by Nexus.py anyway. Without the added confusion of a separate taxlabels command, it is now fairly easy to deal with duplicate names. Both self.taxlabels and self.matrix now carry the same set of unique taxon names. All example files seem to work fine for me. unless I hear otherwise, I close this bug. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
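A minimal sketch of the idea in comment #13, assuming that duplicate taxon names are resolved by appending a suffix. This is not the code in Nexus.py; the function name `make_unique` and the ".copyN" suffix scheme are hypothetical and chosen only for illustration.

```python
def make_unique(names):
    """Return a list of labels in which every name occurs only once.

    Hypothetical helper: the ".copyN" suffix is an assumption made for
    this sketch, not the naming scheme used by Bio.Nexus.
    """
    seen = set()
    unique = []
    for name in names:
        candidate = name
        copy = 1
        # Keep adding a numeric suffix until the label no longer clashes.
        while candidate in seen:
            candidate = "%s.copy%d" % (name, copy)
            copy += 1
        seen.add(candidate)
        unique.append(candidate)
    return unique


labels = make_unique(["Homo", "Pan", "Homo", "Homo"])
```

With a scheme like this, the matrix and the list of taxon labels can share one set of unique names, which is the property the comment describes.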
From bugzilla-daemon at portal.open-bio.org Tue Jul 1 05:01:29 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 1 Jul 2008 05:01:29 -0400 Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem with identical taxa names In-Reply-To: Message-ID: <200807010901.m6191TxO015999@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2531 ------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-01 05:01 EST ------- Does this mean that there will be no way to see the original non-unique names from within Bio.Nexus? I agree they are a pain, but it would be nice to preserve them. I haven't read the Nexus specs (restricted article), but does this comment on the issue of repeated identifiers? From bugzilla-daemon at portal.open-bio.org Tue Jul 1 05:13:02 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 1 Jul 2008 05:13:02 -0400 Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem with identical taxa names In-Reply-To: Message-ID: <200807010913.m619D2vK016454@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2531 ------- Comment #15 from fkauff at biologie.uni-kl.de 2008-07-01 05:13 EST ------- Yes, the original non-unique names are currently not preserved. It would be fairly easy to keep them, if desired. The NEXUS specs (Maddison et al.) state that non-unique names "should be avoided if this might cause ambiguity", which imho they always do. But in my experience names sometimes become identical due to truncation etc., so I needed a way to deal with it instead of just throwing an error.
Frank From bugzilla-daemon at portal.open-bio.org Tue Jul 1 09:16:57 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 1 Jul 2008 09:16:57 -0400 Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq objects In-Reply-To: Message-ID: <200807011316.m61DGvGS029051@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2532 ------- Comment #2 from fkauff at biologie.uni-kl.de 2008-07-01 09:16 EST ------- I opt for (b): an easy one-time addition to Bio.Alphabet, easy to use for everyone (instead of creating their own uppercase-lowercase variants of those terribly complicated biopython alphabet classes), and easy to change for all other modules if lowercase-uppercase is what they want (or need). Nexus.py and Phd.py certainly need to allow lowercase characters, as this is very common. Frank
From bugzilla-daemon at portal.open-bio.org Tue Jul 1 11:56:03 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 1 Jul 2008 11:56:03 -0400 Subject: [Biopython-dev] [Bug 2533] New: Support for simple "tab" format in Bio.SeqIO Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2533 Summary: Support for simple "tab" format in Bio.SeqIO Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk Requested on the mailing list by Giovanni Marco Dall'Olio: http://lists.open-bio.org/pipermail/biopython/2008-June/004312.html See BioPerl: http://www.bioperl.org/wiki/Tab_sequence_format Suggested implementation to follow. From bugzilla-daemon at portal.open-bio.org Tue Jul 1 11:57:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 1 Jul 2008 11:57:26 -0400 Subject: [Biopython-dev] [Bug 2533] Support for simple "tab" format in Bio.SeqIO In-Reply-To: Message-ID: <200807011557.m61FvQN5006042@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2533 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-01 11:57 EST ------- Created an attachment (id=962) --> (http://bugzilla.open-bio.org/attachment.cgi?id=962&action=view) New file Bio/SeqIO/TabIO.py Treats the first field as the record's .id (and .name) Treats the second field as the record's sequence.
When writing, uses only record.id and record.seq From bugzilla-daemon at portal.open-bio.org Tue Jul 1 12:00:59 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 1 Jul 2008 12:00:59 -0400 Subject: [Biopython-dev] [Bug 2533] Support for simple "tab" format in Bio.SeqIO In-Reply-To: Message-ID: <200807011600.m61G0xUp006217@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2533 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-01 12:00 EST ------- Created an attachment (id=963) --> (http://bugzilla.open-bio.org/attachment.cgi?id=963&action=view) Patch to add the "tab" format to Bio.SeqIO and update the unit test output The plumbing to make Bio.SeqIO (and Bio.AlignIO) aware of the new format. Adds the reader/writer mapping to Bio/SeqIO/__init__.py (trivial) and gives the updated output from test_SeqIO.py (trivial to regenerate with "python run_tests.py -g test_SeqIO.py").
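The behaviour described for the attachment (first field is the identifier, second field is the sequence, and only those two are written) can be sketched as below. This is not the attached TabIO.py: it is a plain-Python illustration that yields (identifier, sequence) tuples instead of SeqRecord objects, and the names `parse_tab` and `write_tab` are hypothetical.

```python
import io


def parse_tab(handle):
    """Yield (identifier, sequence) tuples from tab-separated lines."""
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # First field is the identifier, second field is the sequence.
        identifier, sequence = line.split("\t", 1)
        yield identifier.strip(), sequence.strip()


def write_tab(records, handle):
    """Write (identifier, sequence) tuples, one per line; return the count."""
    count = 0
    for identifier, sequence in records:
        handle.write("%s\t%s\n" % (identifier, sequence))
        count += 1
    return count


example = io.StringIO("gi|1\tACGT\ngi|2\tTTGA\n")
records = list(parse_tab(example))
output = io.StringIO()
written = write_tab(records, output)
```

A line without a tab raises ValueError at the unpacking step in this sketch; how the real parser treats malformed lines is not stated in the bug report.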
From biopython at maubp.freeserve.co.uk Wed Jul 2 06:33:35 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 2 Jul 2008 11:33:35 +0100 Subject: [Biopython-dev] Catching more error conditions in Bio.Entrez Message-ID: <320fb6e00807020333n7902e452gac56e12f5d64a3ab@mail.gmail.com> Hello Michiel et al., I've already added a few if statements to the end of Bio.Entrez._open() to catch a few errors I'd observed, and I've just found another example:
>>> from Bio import Entrez
>>> Entrez.efetch("nucleotide", id="fiction", rettype="fasta").read()
'\n'
>>> Entrez.efetch("nucleotide", id="fiction").read()
'\n'
This seems to happen for any invalid identifier. Are you happy for me to check for this as an error too? Are there any valid reasons to get back an empty dataset like this? Also, I was wondering if we should raise a ValueError rather than IOError if we are fairly sure the problem is with the arguments rather than the network or the server being unavailable. Peter From sdavis2 at mail.nih.gov Wed Jul 2 07:18:43 2008 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Wed, 2 Jul 2008 07:18:43 -0400 Subject: [Biopython-dev] Catching more error conditions in Bio.Entrez In-Reply-To: <320fb6e00807020333n7902e452gac56e12f5d64a3ab@mail.gmail.com> References: <320fb6e00807020333n7902e452gac56e12f5d64a3ab@mail.gmail.com> Message-ID: <264855a00807020418qc858370r4083f0db9db3197a@mail.gmail.com> On Wed, Jul 2, 2008 at 6:33 AM, Peter wrote: > Hello Michiel et al., > > I've already added a few if statements to the end of > Bio.Entrez._open() to catch a few errors I'd observed, and I've just > found another example: > >>>> from Bio import Entrez >>>> Entrez.efetch("nucleotide", id="fiction", rettype="fasta").read() > '\n' >>>> Entrez.efetch("nucleotide", id="fiction").read() > '\n' > > This seems to happen for any invalid identifier. Are you happy for me > to check for this as an error too? Are there any valid reasons to get > back an empty dataset like this?
If the ability to use history is added, then an empty dataset could be a valid return after an empty search. For id-based-searches, I'm not sure I would raise an error for an empty set being returned anyway. Just my $0.02. Sean > Also, I was wondering if we should raise a ValueError rather than > IOError if we are fairly sure the problem is with the arguments rather > than the network or the server being unavailable. > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From biopython at maubp.freeserve.co.uk Wed Jul 2 07:34:32 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 2 Jul 2008 12:34:32 +0100 Subject: [Biopython-dev] Catching more error conditions in Bio.Entrez In-Reply-To: <264855a00807020418qc858370r4083f0db9db3197a@mail.gmail.com> References: <320fb6e00807020333n7902e452gac56e12f5d64a3ab@mail.gmail.com> <264855a00807020418qc858370r4083f0db9db3197a@mail.gmail.com> Message-ID: <320fb6e00807020434p474cect7a7b0d51148d7760@mail.gmail.com> >> This seems to happen for any invalid identifier. Are you happy for me >> to check for this as an error too? Are there any valid reasons to get >> back an empty dataset like this? > > If the ability to use history is added, then an empty dataset could be > a valid return after an empty search. ... Bio.Entrez has always supported the history, it's just up to the user to take advantage of it.
I've included an example in the tutorial to explain how to do this, cut and pasted below:

from Bio import Entrez

search_handle = Entrez.esearch(db="nucleotide", term="Opuntia and rpl16",
                               usehistory="y",
                               email="history.user@example.com")
search_results = Entrez.read(search_handle)
search_handle.close()

gi_list = search_results["IdList"]
count = int(search_results["Count"])
assert count == len(gi_list)

session_cookie = search_results["WebEnv"]
query_key = search_results["QueryKey"]

# Now use the history session cookie and query key to download
# the results in batches
batch_size = 3
out_handle = open("orchid_rpl16.fasta", "w")
for start in range(0, count, batch_size):
    end = min(count, start + batch_size)
    print("Going to download record %i to %i" % (start + 1, end))
    fetch_handle = Entrez.efetch(db="nucleotide", rettype="fasta",
                                 retstart=start, retmax=batch_size,
                                 webenv=session_cookie, query_key=query_key,
                                 email="history.user@example.com")
    data = fetch_handle.read()
    fetch_handle.close()
    out_handle.write(data)
out_handle.close()

Feedback on the tutorial or the example is of course welcome. > For id-based-searches, I'm not sure I would raise an error for an empty > set being returned anyway. > > Just my $0.02. I was wondering about this kind of thing... maybe some more testing of these kinds of examples would be in order. Peter From biopython at maubp.freeserve.co.uk Wed Jul 2 09:03:36 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 2 Jul 2008 14:03:36 +0100 Subject: [Biopython-dev] Bug 2533 - Support for simple "tab" format in Bio.SeqIO Message-ID: <320fb6e00807020603s63f8339ag5b8140f1943ceb47@mail.gmail.com> Hi all, Do any of you have any comments or feedback on this suggested new "simple tab separated" format for Bio.SeqIO? To match BioPerl I plan on calling it the "tab" format - see below. Any real world example files would be good for the test suite.
One nice thing is it adds another output format, something we're a bit short of in Bio.SeqIO with only fasta and some alignment formats (now handled via Bio.AlignIO, i.e. pfam/stockholm, clustal and phylip). Peter ---------- Forwarded message ---------- From: Peter Date: Tue, Jul 1, 2008 at 5:06 PM Subject: Re: [BioPython] Sequence from Fasta To: dalloliogm at gmail.com Cc: biopython at biopython.org Giovanni wrote: > yes, I think it will be useful to implement. > I know of people who have written a customized fasta2tab script and > use it quite frequently, so it would be good to support such a task. > As you said before this format is commonly used in combination with > grep/gawk scripts. I've gone for the simple option for parsing the first field: it's used as the record identifier (.id) and name only (nothing clever). Here is my suggested code, which you are welcome to download and try out. Bug 2533 - Support for simple "tab" format in Bio.SeqIO http://bugzilla.open-bio.org/show_bug.cgi?id=2533 If you want to try this yourself you'll need to download the new file TabIO.py into the Bio/SeqIO folder and update Bio/SeqIO/__init__.py to tell it about the new format (two new lines, see patch). Peter From biopython at maubp.freeserve.co.uk Wed Jul 2 09:21:29 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 2 Jul 2008 14:21:29 +0100 Subject: [Biopython-dev] Questions about the NEXUS format Message-ID: <320fb6e00807020621v6370c556g966f01a857f5c4e3@mail.gmail.com> Hello again Frank, As Biopython's NEXUS expert, I've got a couple of hopefully trivial questions about the format, which connect to how best to handle it in the Bio.SeqIO and Bio.AlignIO modules. http://biopython.org/wiki/SeqIO http://biopython.org/wiki/AlignIO My short questions are: Q1: Can a file contain more than one NEXUS record (i.e. concatenation, with more than one #NEXUS line)? Q2: Can a NEXUS record/file contain more than one alignment (matrix block)?
If the answer to either of those is a "yes", then any example files you could contribute would be very helpful. I have a more complicated question too, which would help me to resolve Bug 2227: http://bugzilla.open-bio.org/show_bug.cgi?id=2227 Q3: Given a generic Alignment object (e.g. from one of the other parsers), can I construct a corresponding Nexus object where the aligned sequences are used for the matrix? If so, how? Thank you, Peter From mjldehoon at yahoo.com Wed Jul 2 09:30:06 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 2 Jul 2008 06:30:06 -0700 (PDT) Subject: [Biopython-dev] Bio.IntelliGenetics Message-ID: <29487.55988.qm@web62410.mail.re1.yahoo.com> Bio.IntelliGenetics contains a parser for sequence data in the IntelliGenetics format. In this format, each sequence has a name and comments, and in addition there can also be an overall comment to the file. Currently the parser in Bio.IntelliGenetics stores this information in Bio.IntelliGenetics.Record.Record objects (one record per sequence; the overall comment is inadvertently added to the first sequence in the file). I think it makes more sense to use a SeqRecord for that, and to deprecate Bio.IntelliGenetics.Record.Record. In that case, Bio.SeqIO looks like a more suitable place for this parser. The user would see something like this: >>> from Bio import SeqIO >>> handle = open("mydatafile.txt") >>> records = SeqIO.parse(handle, "ig") >>> records.comment "This is the overall comment" >>> for record in records: # ... record is a SeqRecord. Because of the overall comment, SeqIO.parse cannot simply return a generator function. It must return a full-fledged class, but one with an iterator. Any objections, anybody? 
--Michiel From biopython at maubp.freeserve.co.uk Wed Jul 2 09:48:31 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 2 Jul 2008 14:48:31 +0100 Subject: [Biopython-dev] Bio.IntelliGenetics In-Reply-To: <29487.55988.qm@web62410.mail.re1.yahoo.com> References: <29487.55988.qm@web62410.mail.re1.yahoo.com> Message-ID: <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com> On Wed, Jul 2, 2008 at 2:30 PM, Michiel de Hoon wrote: > Bio.IntelliGenetics contains a parser for sequence data in the IntelliGenetics format. Just to be upfront, I'm not familiar with this format, but I've had a look at the examples. > In this format, each sequence has a name and comments, and in addition there can > also be an overall comment to the file. OK. This is also the case in other file formats, for example GenBank files can have free format text file header at the start but we ignore this. How would you separate the file header comment from the first record comment? Some files include what looks like a file header but the lines all seem to start with "; ". Maybe look for "; LOCUS..."? Given the whole comment seems to be free format I don't think this is very nice. On the other hand, some of the sample inputs includes a number of lines starting ";; Modified by ..." which would be easy to separate (one semi colon versus two semi colons). These are clearly file-level header lines, rather than being part of the first record. > Currently the parser in Bio.IntelliGenetics stores this information in > Bio.IntelliGenetics.Record.Record objects (one record per sequence; the > overall comment is inadvertently added to the first sequence in the file). I > think it makes more sense to use a SeqRecord for that, and to deprecate > Bio.IntelliGenetics.Record.Record. If all the data extracted by the Bio.IntelliGenetics parser could be dealt with using the SeqRecord parser added to Bio.SeqIO, then yes deprecating Bio.IntelliGenetics sounds fine. 
> In that case, Bio.SeqIO looks like a more suitable place for this parser. > The user would see something like this: >>>> from Bio import SeqIO >>>> handle = open("mydatafile.txt") >>>> records = SeqIO.parse(handle, "ig") >>>> records.comment > "This is the overall comment" >>>> for record in records: > # ... record is a SeqRecord. I see you are using "ig" as the format name, matching EMBOSS. Good :) http://emboss.sourceforge.net/docs/themes/seqformats/ig > Because of the overall comment, SeqIO.parse cannot simply return a > generator function. It must return a full-fledged class, but one with an iterator. Not necessarily. We can still use a simple generator function and either throw away the header comment, or included it with the first record (or even with every record). If you did create an iterator class, would you make the header available as a property of the iterator? Given the apparently fuzzy boundary between the file header and the first record header, I would just opt to treat it all as a comment for the first record. And use a simple generator function. Peter From fkauff at biologie.uni-kl.de Wed Jul 2 10:01:01 2008 From: fkauff at biologie.uni-kl.de (Frank Kauff) Date: Wed, 02 Jul 2008 16:01:01 +0200 Subject: [Biopython-dev] Questions about the NEXUS format In-Reply-To: <320fb6e00807020621v6370c556g966f01a857f5c4e3@mail.gmail.com> References: <320fb6e00807020621v6370c556g966f01a857f5c4e3@mail.gmail.com> Message-ID: <486B8A1D.8090806@biologie.uni-kl.de> Hi Peter, Peter wrote: > Hello again Frank, > > As Biopython's NEXUS expect, I've got a couple of hopefully trivial > questions about the format, which connect to how best to handle it the > Bio.SeqIO and Bio.AlignIO modules. http://biopython.org/wiki/SeqIO > http://biopython.org/wiki/AlignIO > > My short questions are: > > Q1: Can a file contain more than one NEXUS record (i.e. concatenation, > with more than one #NEXUS line)? > As far as I know: no. 
#NEXUS just indicates the file being a NEXUS file, the concept of "records" is not part of a nexus file > Q2: Can a NEXUS record/file contain more than one alignment (matrix block)? > > I just had a quick look at the old Maddison et al. introductory paper of Nexus, and it says that "although the nexus standard does not impose constraints on the number of blocks, particular programs will". I don't know of any program that would read more than one data block and keep both of them. > If the answer to either of those is a "yes", then any example files > you could contribute would be very helpful. > > I have a more complicated question too, which would help me to resolve Bug 2227: > http://bugzilla.open-bio.org/show_bug.cgi?id=2227 > > Q3: Given a generic Alignment object (e.g. from one of the other > parsers), can I construct a corresponding Nexus object where the > aligned sequences are used for the matrix? If so, how? > Hmmm - not really. Nexus.py does not support "empty" nexus class objects that could be filled with data (just tried) . But it would actually be a nice thing to have. I'll put this on my to do list. 
Cheers, Frank > Thank you, > > Peter > > ' From biopython at maubp.freeserve.co.uk Wed Jul 2 10:01:13 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 2 Jul 2008 15:01:13 +0100 Subject: [Biopython-dev] Bio.IntelliGenetics In-Reply-To: <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com> References: <29487.55988.qm@web62410.mail.re1.yahoo.com> <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com> Message-ID: <320fb6e00807020701k2f5bf546j2d5ef3514a24e31a@mail.gmail.com> Hello again, Interestingly the IntelliGenetics looks the same as the MASE alignment file format: http://www.bioperl.org/wiki/Mase_multiple_alignment_format On the other hand, the EMBOSS example is clearly not a multiple sequence alignment: http://emboss.sourceforge.net/docs/themes/seqformats/ig Adding the parser to Bio.SeqIO would let us read in alignments too via Bio.AlignIO (which will offload the parsing to Bio.SeqIO and then try and convert the SeqRecords into an Alignment). Peter From biopython at maubp.freeserve.co.uk Wed Jul 2 10:06:40 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 2 Jul 2008 15:06:40 +0100 Subject: [Biopython-dev] Bio.IntelliGenetics In-Reply-To: <320fb6e00807020705qea5976j9a5e2cd0896f391d@mail.gmail.com> References: <29487.55988.qm@web62410.mail.re1.yahoo.com> <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com> <320fb6e00807020705qea5976j9a5e2cd0896f391d@mail.gmail.com> Message-ID: <320fb6e00807020706l28309346m57e7bd884a0b7b9b@mail.gmail.com> Forgot to send this to the list, another point about IntelliGenetics vs MASE ---------- Forwarded message ---------- From: Peter Date: Wed, Jul 2, 2008 at 3:05 PM Subject: Re: [Biopython-dev] Bio.IntelliGenetics To: mjldehoon at yahoo.com > How would you separate the file header comment from the first record > comment? Some files include what looks like a file header but the > lines all seem to start with "; ". Maybe look for "; LOCUS..."? 
> Given the whole comment seems to be free format I don't think this is > very nice. > > On the other hand, some of the sample inputs includes a number of > lines starting ";; Modified by ..." which would be easy to separate > (one semi colon versus two semi colons). These are clearly file-level > header lines, rather than being part of the first record. I found an old link I had added on the wiki page for SeqIO development, http://pbil.univ-lyon1.fr/help/formats.html This clearly describes the MASE format as having (optional) header lines starting with two semi colons. But are MASE and IntelliGenetics the same thing? Peter From biopython at maubp.freeserve.co.uk Wed Jul 2 10:12:48 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 2 Jul 2008 15:12:48 +0100 Subject: [Biopython-dev] Questions about the NEXUS format In-Reply-To: <486B8A1D.8090806@biologie.uni-kl.de> References: <320fb6e00807020621v6370c556g966f01a857f5c4e3@mail.gmail.com> <486B8A1D.8090806@biologie.uni-kl.de> Message-ID: <320fb6e00807020712y54874e02k6854b92e1711358d@mail.gmail.com> >> My short questions are: >> >> Q1: Can a file contain more than one NEXUS record (i.e. concatenation, >> with more than one #NEXUS line)? > > As far as I know: no. #NEXUS just indicates the file being a NEXUS file, the > concept of "records" is not part of a nexus file OK, thank you. >> Q2: Can a NEXUS record/file contain more than one alignment (matrix >> block)? > > I just had a quick look at the old Maddison et al. introductory paper of > Nexus, and it says that "although the nexus standard does not impose > constraints on the number of blocks, particular programs will". I don't know > of any program that would read more than one data block and keep both of > them. So that is a "yes in theory", but it doesn't sound worth worrying about. >> Q3: Given a generic Alignment object (e.g.
from one of the other >> parsers), can I construct a corresponding Nexus object where the >> aligned sequences are used for the matrix? If so, how? > > Hmmm - not really. Nexus.py does not support "empty" nexus class objects > that could be filled with data (just tried) . But it would actually be a > nice thing to have. I'll put this on my to do list. Thanks, Peter From mjldehoon at yahoo.com Wed Jul 2 10:15:16 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 2 Jul 2008 07:15:16 -0700 (PDT) Subject: [Biopython-dev] Bio.IntelliGenetics In-Reply-To: <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com> Message-ID: <529945.38158.qm@web62404.mail.re1.yahoo.com> > On the other hand, some of the sample inputs includes a number of > lines starting ";; Modified by ..." which would be easy to separate > (one semi colon versus two semi colons). These are clearly file-level > header lines, rather than being part of the first record. According to the website mentioned in Bio/IntelliGenetics/__init__.py, the file-level comments have two semi colons, as opposed to the sequence-specific comments, which have one semi colon. http://pbil.univ-lyon1.fr/help/formats.html > If you did create an iterator class, would you make the > header available as a property of the iterator? I am not sure what you mean by a property of the iterator. I was thinking to simply add a field to the class. ---Michiel. 
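The distinction discussed above (two semicolons for file-level comments, one semicolon for sequence-specific comments) can be sketched as follows. This only illustrates the rule as stated on the linked formats page; the function name `split_comments` is hypothetical, and whether the rule applies to IntelliGenetics files or only to MASE files is exactly the open question in this thread.

```python
def split_comments(lines):
    """Separate leading ';;' file-level comments from ';' record comments.

    Hypothetical helper: stops at the first line that is not a comment.
    """
    file_comments = []
    record_comments = []
    for line in lines:
        if line.startswith(";;"):
            # Two semicolons: treated here as a file-level header line.
            file_comments.append(line[2:].strip())
        elif line.startswith(";"):
            # One semicolon: a comment belonging to the next sequence.
            record_comments.append(line[1:].strip())
        else:
            break  # the first non-comment line ends the header
    return file_comments, record_comments


header, first_record = split_comments(
    [";; Modified by someone", "; LOCUS example", "NAME", "ACGT1"]
)
```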
From bugzilla-daemon at portal.open-bio.org Wed Jul 2 10:38:52 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 2 Jul 2008 10:38:52 -0400 Subject: [Biopython-dev] [Bug 2524] Handle missing libraries like TextTools in run_tests.py In-Reply-To: Message-ID: <200807021438.m62Ecqma013815@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2524 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Component|Documentation |Unit Tests ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-02 10:38 EST ------- Filing under "Unit Tests". From bugzilla-daemon at portal.open-bio.org Wed Jul 2 10:39:22 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 2 Jul 2008 10:39:22 -0400 Subject: [Biopython-dev] [Bug 2469] requires_wise.py fails on Windows (test suite) In-Reply-To: Message-ID: <200807021439.m62EdMM9013903@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2469 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Component|Main Distribution |Unit Tests ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-02 10:39 EST ------- Filing under "Unit Tests"
From biopython at maubp.freeserve.co.uk Wed Jul 2 10:56:00 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 2 Jul 2008 15:56:00 +0100 Subject: [Biopython-dev] Bio.IntelliGenetics In-Reply-To: <529945.38158.qm@web62404.mail.re1.yahoo.com> References: <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com> <529945.38158.qm@web62404.mail.re1.yahoo.com> Message-ID: <320fb6e00807020756r4de8ed4bi3f8b409d75996a14@mail.gmail.com> >> If you did create an iterator class, would you make the >> header available as a property of the iterator? > > I am not sure what you mean by a property of the iterator. I was > thinking to simply add a field to the class. Adding the file header field to the iterator class? You could do I suppose. Right now all the Bio.SeqIO parsers use generator functions (although not in AlignIO), although I have no objection to returning iterator classes instead. I don't really like the idea of Bio.SeqIO parsers returning anything other than SeqRecord objects - even if it is indirectly via a richer iterator object. I see the Bio.SeqIO as a common unified API, and the downside is sometimes extra data doesn't really fit. If there really is some important meta-data in a file format that applies to all the records, then it cannot easily be represented in the Bio.SeqIO system except as annotation added to every single SeqRecord. e.g. Add the header to the annotations dictionary under "file-header" or something. 
Peter From mjldehoon at yahoo.com Wed Jul 2 11:29:31 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 2 Jul 2008 08:29:31 -0700 (PDT) Subject: [Biopython-dev] Bio.IntelliGenetics In-Reply-To: <320fb6e00807020705qea5976j9a5e2cd0896f391d@mail.gmail.com> Message-ID: <318336.37817.qm@web62405.mail.re1.yahoo.com> --- On Wed, 7/2/08, Peter wrote: > I found an old link I had added on the wiki page for SeqIO > development, > http://pbil.univ-lyon1.fr/help/formats.html > > This clearly describes the MASE format as having > (optional) header > lines starting with two semi colons. But are MASE and > IntelliGenetics the same thing? It may be that the link in Bio/IntelliGenetics/__init__.py actually does not pertain to the IntelliGenetics format. Except for this link (which as you point out actually talks about the MASE format, not the IntelliGenetics format), I have seen no description elsewhere of these file-wide comments preceded by a double semi-colon in the IntelliGenetics format. Even Biopython doesn't treat these consistently: The tests for Bio.IntelliGenetics include comments with the double semi-colon, but the parser doesn't treat them differently from sequence-specific comments. So let's do the following: For the IntelliGenetics parser, do not look for double semi-colon comments. Only check if the first character in a line is a semi-colon, and if so, treat it as a sequence-specific comment. This is what Bio.IntelliGenetics currently does anyway. Replace the parser class in Bio.IntelliGenetics by a generator function, and integrate it with Bio.SeqIO. Then, let's replace the IntelliGenetics tests by files that do not contain the double semi-colon comments. Does that sound OK? --Michiel.
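The proposal above (treat every line starting with a semicolon as a sequence-specific comment, and use a generator function) might look roughly like the sketch below. It is not the Bio.IntelliGenetics or Bio.SeqIO code. It assumes, as is commonly described for this format, that the comment lines are followed by a title line and that the sequence is terminated by a trailing "1" or "2"; the function name `ig_records` and the tuple layout are hypothetical, and a real integration would yield SeqRecord objects.

```python
import io


def ig_records(handle):
    """Yield (title, comment, sequence) tuples from IntelliGenetics-style text.

    Assumption for this sketch: a sequence ends on a line whose last
    character is "1" or "2". A final record without a terminator is dropped.
    """
    comments, title, seq_lines = [], None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if title is None:
            if line.startswith(";"):
                # Any leading semicolon marks a per-sequence comment;
                # ";;" lines get no special treatment.
                comments.append(line[1:].strip())
            elif line.strip():
                title = line.strip()
            continue
        seq_lines.append(line.strip())
        if line.rstrip().endswith(("1", "2")):
            sequence = "".join(seq_lines)[:-1]  # drop the terminator
            yield title, "\n".join(comments), sequence
            comments, title, seq_lines = [], None, []


text = "; comment one\n; comment two\nFIRST\nACGT\nTTGA1\n;next\nSECOND\nGGCC2\n"
parsed = list(ig_records(io.StringIO(text)))
```

Because no state outlives a record, a plain generator is enough here, which matches the point that no iterator class is needed once the file-level comment is dropped.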

From bugzilla-daemon at portal.open-bio.org Wed Jul 2 12:28:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 2 Jul 2008 12:28:19 -0400
Subject: [Biopython-dev] [Bug 2535] New: Support for PIR / NBRF format in Bio.SeqIO
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2535

Summary: Support for PIR / NBRF format in Bio.SeqIO
Product: Biopython
Version: Not Applicable
Platform: PC
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk

BioPerl and EMBOSS both refer to this as the "pir" format, although EMBOSS also supports "nbrf" as an alternative.

http://bioperl.org/wiki/PIR_sequence_format

Patch to follow, a new parser and writer in plain python. The existing Martel based parser in Bio.NBRF could then be deprecated.

From bugzilla-daemon at portal.open-bio.org Wed Jul 2 12:30:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 2 Jul 2008 12:30:28 -0400
Subject: [Biopython-dev] [Bug 2535] Support for PIR / NBRF format in Bio.SeqIO
In-Reply-To:
Message-ID: <200807021630.m62GUS5B025377@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2535

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-02 12:30 EST -------
Created an attachment (id=964)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=964&action=view)
New file Bio/SeqIO/PirIO.py

Note that the details of storing the sequence type may need tweaking for better agreement with the de-facto conventions from the GenBank parser.

As part of this the following dictionary may be useful, from Bio/NBRF/ValSeq.py

valid_sequence_dict = { "P1": "complete protein", "F1": "protein fragment", \
    "DL": "linear DNA", "DC": "circular DNA", "RL": "linear RNA", \
    "RC":"circular RNA", "N3": "transfer RNA", "N1": "other" }

From bugzilla-daemon at portal.open-bio.org Wed Jul 2 13:37:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 2 Jul 2008 13:37:05 -0400
Subject: [Biopython-dev] [Bug 2535] Support for PIR / NBRF format in Bio.SeqIO
In-Reply-To:
Message-ID: <200807021737.m62Hb5lX031417@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2535

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-02 13:37 EST -------
My patch doesn't accept the "N1" sequence type mentioned in Bio/NBRF/ValSeq.py

Also when recording a SeqRecord from a non-PIR input, we could try and guess the sequence type. The alphabet itself is one clue. GenBank and EMBL files should also record if the sequence is linear or circular, as well as a sequence type.

For proteins, I don't see how to decide between P1 and F1 though (complete protein vs protein fragment). Maybe default to F1?
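
The guessing idea in comment #2 could be sketched as below. This is only an illustration under stated assumptions: guess_pir_type and its parameters are hypothetical names, the molecule type and topology are assumed to be already known to the caller (for example from the alphabet or from GenBank/EMBL annotation), and proteins fall back to "F1" unless explicitly flagged as complete, following the "Maybe default to F1?" suggestion. The codes themselves are the ones listed in the dictionary from Bio/NBRF/ValSeq.py.

```python
# Two-letter PIR / NBRF sequence type codes, as quoted from Bio/NBRF/ValSeq.py.
VALID_SEQUENCE_TYPES = {
    "P1": "complete protein",
    "F1": "protein fragment",
    "DL": "linear DNA",
    "DC": "circular DNA",
    "RL": "linear RNA",
    "RC": "circular RNA",
    "N3": "transfer RNA",
    "N1": "other",
}


def guess_pir_type(molecule, circular=False, complete=False):
    """Return a PIR type code for 'protein', 'DNA' or 'RNA' (hypothetical helper).

    Proteins default to "F1" (fragment) because completeness usually
    cannot be inferred from the sequence alone.
    """
    kind = molecule.lower()
    if kind == "protein":
        return "P1" if complete else "F1"
    if kind == "dna":
        return "DC" if circular else "DL"
    if kind == "rna":
        return "RC" if circular else "RL"
    # "N1" (other) is deliberately not guessed, since the patch does not accept it.
    raise ValueError("Cannot guess a PIR sequence type for %r" % molecule)


if __name__ == "__main__":
    code = guess_pir_type("DNA", circular=True)
    print(code, "=", VALID_SEQUENCE_TYPES[code])
```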

From bugzilla-daemon at portal.open-bio.org Wed Jul 2 15:51:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 2 Jul 2008 15:51:49 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support for '-F F' and make it safe
In-Reply-To:
Message-ID: <200807021951.m62Jpnx3012202@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2508

------- Comment #9 from mmokrejs at ribosome.natur.cuni.cz 2008-07-02 15:51 EST -------
Even better docs: http://blog.doughellmann.com/2007/07/pymotw-subprocess.html

From bugzilla-daemon at portal.open-bio.org Thu Jul 3 09:24:32 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Jul 2008 09:24:32 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem with identical taxa names
In-Reply-To:
Message-ID: <200807031324.m63DOWDA018278@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531

------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 09:24 EST -------
Hi Frank,

I see you've updated Bio/Nexus/Nexus.py with CVS revision 1.16 to record the original taxon order with and without the name changes.

n.unaltered_taxlabels = Original names in order with duplicates
n.original_taxon_order = Modified names in order, suitable as keys to n.matrix

I'll update Bio.SeqIO / Bio.AlignIO to take advantage of this shortly, storing the original name and the modified unique name as the SeqRecord's name and id properties.

Peter

From bugzilla-daemon at portal.open-bio.org Thu Jul 3 09:52:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Jul 2008 09:52:08 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem with identical taxa names
In-Reply-To:
Message-ID: <200807031352.m63Dq8el021720@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531

------- Comment #17 from fkauff at biologie.uni-kl.de 2008-07-03 09:52 EST -------
Hi Peter,

I'd strongly suggest using self.taxlabels instead of self.original_taxon_order. The latter is only for compatibility, and original_taxon_order just maps taxlabels. Actually it might make sense to give a deprecation warning if original_taxon_order is used, and it should be removed in some future release.

Frank

From bugzilla-daemon at portal.open-bio.org Thu Jul 3 10:06:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Jul 2008 10:06:46 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807031406.m63E6kct023377@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2227

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #584 is|0                           |1
           obsolete|                            |

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 10:06 EST -------
(From update of attachment 584)
With Bio/Nexus/Nexus.py revision 1.16 from CVS, it is now possible to create an empty Nexus object and add sequences to it. This code is now obsolete.

From bugzilla-daemon at portal.open-bio.org Thu Jul 3 10:13:38 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Jul 2008 10:13:38 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807031413.m63EDcGj024034@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2227

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 10:13 EST -------
Created an attachment (id=965)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=965&action=view)
Bio/Nexus/Nexus.py handle support in write_nexus_data()

With Bio/Nexus/Nexus.py revision 1.16 from CVS, it is now possible to create an empty Nexus object and add sequences to it:

#Read in an alignment object, e.g. with Bio.AlignIO
from Bio import AlignIO
alignment = AlignIO.read(open("example.aln"), "clustal")

#Make a Nexus object
from Bio.Nexus import Nexus
handle = open("test.nex", "w")
n = Nexus.Nexus()
n.alphabet = alignment._alphabet
for record in alignment :
    n.add_sequence(record.id, record.seq.tostring())
n.write_nexus_data(handle)
handle.close()

There are two problems with write_nexus_data(), firstly it doesn't accept a StringIO handle (see also Bug 2454).

Secondly, if given a handle it closes it. This would break the above code, or how I typically use StringIO.

This patch addresses these points.

Frank, are you happy for me to commit this change?
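
The two handle problems described in comment #4 come down to one convention: write to anything that has a write() method, and only close what the function itself opened. The sketch below illustrates that convention in isolation. It is not the attached patch; write_text is a hypothetical stand-in for a writer such as write_nexus_data(), and the "#NEXUS" text is just placeholder output.

```python
from io import StringIO


def write_text(target, text):
    """Write text to a filename or to an already open handle (hypothetical helper).

    A handle passed in by the caller is left open; a file opened here
    from a filename is closed again before returning.
    """
    if hasattr(target, "write"):
        # Caller owns the handle (real file, StringIO, ...): never close it.
        target.write(text)
        return
    # Otherwise treat the argument as a filename that this function owns.
    with open(target, "w") as handle:
        handle.write(text)


if __name__ == "__main__":
    buffer = StringIO()
    write_text(buffer, "#NEXUS\n")
    # The caller can still read from, or append to, its own handle.
    print(repr(buffer.getvalue()), "closed:", buffer.closed)
```

With this arrangement the caller's own handle.close() in the example above keeps working, and a StringIO buffer can still be read after the write.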

From bugzilla-daemon at portal.open-bio.org Thu Jul 3 12:02:30 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Jul 2008 12:02:30 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807031602.m63G2Unc032671@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2227

------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 12:02 EST -------
Created an attachment (id=966)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=966&action=view)
Patch to Bio/AlignIO/NexusIO.py adding write support

This patch requires the Bio.Nexus handle fix (patch in attachment 965, comment 4).

My method for constructing an empty DNA, RNA, or Protein Nexus object is perhaps inelegant. This is required in order to setup the alphabet, ambiguous_values and unambiguous_letters properties which otherwise default to DNA.

Also note that the Nexus add_sequence() method does not seem to support duplicated taxa names. Perhaps this method could update the unaltered_taxlabels property and use the _unique_label method to cope with duplicate names?
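
The suggestion at the end of comment #5 (record the names as given, and store each sequence under a unique key) can be sketched without Bio.Nexus as below. Everything here is hypothetical: unique_label and add_sequences are illustrative names, and the ".2", ".3" suffix scheme is an assumption made for this sketch, not necessarily what Nexus._unique_label produces.

```python
def unique_label(existing, label):
    """Return label, or a numbered variant if label is already in existing."""
    if label not in existing:
        return label
    counter = 2
    while "%s.%d" % (label, counter) in existing:
        counter += 1
    return "%s.%d" % (label, counter)


def add_sequences(pairs):
    """Collect (name, sequence) pairs, coping with repeated names.

    Returns the names as given (duplicates kept), the unique labels in the
    same order, and a dictionary keyed by the unique labels.
    """
    unaltered = []
    taxlabels = []
    matrix = {}
    for name, sequence in pairs:
        unaltered.append(name)
        key = unique_label(matrix, name)
        taxlabels.append(key)
        matrix[key] = sequence
    return unaltered, taxlabels, matrix


if __name__ == "__main__":
    names, labels, matrix = add_sequences([("a", "AC"), ("b", "GT"), ("a", "TT")])
    print(names, labels, matrix)
```

Keeping both lists mirrors the split between unaltered_taxlabels and taxlabels discussed under Bug 2531: the original names survive for display, while the unique labels are safe to use as dictionary keys.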

From bugzilla-daemon at portal.open-bio.org Thu Jul 3 12:08:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Jul 2008 12:08:26 -0400
Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem with identical taxa names
In-Reply-To:
Message-ID: <200807031608.m63G8QS3000534@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2531

------- Comment #18 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 12:08 EST -------
I have changed my use of original_taxon_order to just taxlabels (code now in Bio/AlignIO/NexusIO.py rather than Bio/SeqIO/NexusIO.py).

I agree, adding a deprecation warning to the original_taxon_order get/set functions would make sense.

P.S. Thanks for adding the unaltered_taxlabels property.

From biopython at maubp.freeserve.co.uk Fri Jul 4 04:11:06 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 4 Jul 2008 09:11:06 +0100
Subject: [Biopython-dev] What happened to Biopython 1.46?
Message-ID: <320fb6e00807040111h182411d5lea14575f2906e7ba@mail.gmail.com>

We were recently talking about doing another release, but as you may have noticed nothing has been announced.

Michiel devoted a good chunk of his weekend to preparing Biopython 1.46 and uploaded it to the servers on Sunday 29th. He didn't issue an announcement email at the time due to the problem with the wiki being read only (now fixed).

However, on the Monday evening I realised I'd done something really stupid in Bio.Data.CodonTable just before the CVS freeze. Table 15 (Blepharisma Macronuclear) would be used whenever a translation table was requested by name. This change has been reverted, and I've added further translation checks in test_seq.py to avoid any similar issue in future.

So, while there is a Biopython 1.46, we're not going to advertise it because the translation functionality is subtly wrong. However, it is up on the website, and linked to with an errata statement.

Michiel will kindly try and prepare Biopython 1.47 soon... so please hold off any big changes in CVS until then. And I'm hereby publicly promising to treat him to dinner - hopefully we'll be in the same country at the same time this year!

Peter

From bugzilla-daemon at portal.open-bio.org Fri Jul 4 04:39:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Jul 2008 04:39:35 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807040839.m648dZnX025882@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2227

------- Comment #6 from fkauff at biologie.uni-kl.de 2008-07-04 04:39 EST -------
(In reply to comment #4)
> Created an attachment (id=965)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=965&action=view) [details]
> Bio/Nexus/Nexus.py handle support in write_nexus_data()
>
> With Bio/Nexus/Nexus.py revision 1.16 from CVS, it is now possible to create an
> empty Nexus object and add sequences to it:
> ...
> There are two problems with write_nexus_data(), firstly it doesn't accept a
> StringIO handle (see also Bug 2454).
>
> Secondly, if given a handle it closes it. This would break the above code, or
> how I typically use StringIO.
>
> This patch addresses these points.
>
> Frank, are you happy for me to commit this change?
>

Very nice. Go for it :-)

Cheers,
Frank

From bugzilla-daemon at portal.open-bio.org Fri Jul 4 04:53:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Jul 2008 04:53:10 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807040853.m648rAHL026783@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2227

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #965 is|0                           |1
           obsolete|                            |

------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-04 04:53 EST -------
(From update of attachment 965)
> > This patch addresses these points.
> >
> > Frank, are you happy for me to commit this change?
> >
>
> Very nice. Go for it :-)
>

Thanks Frank.

Checking in Nexus.py;
/home/repository/biopython/biopython/Bio/Nexus/Nexus.py,v <-- Nexus.py
new revision: 1.17; previous revision: 1.16
done

Peter

From bugzilla-daemon at portal.open-bio.org Fri Jul 4 04:56:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Jul 2008 04:56:10 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807040856.m648uAAG027012@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2227

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #966 is|0                           |1
           obsolete|                            |

------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-04 04:56 EST -------
(From update of attachment 966)
There is a slight problem with this patch on the alphabet selection (it uses "dna" when it should use "rna").

I'll postpone dealing with writing Nexus files in Bio.SeqIO / Bio.AlignIO until after the next Biopython release.

From bugzilla-daemon at portal.open-bio.org Fri Jul 4 05:13:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Jul 2008 05:13:25 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807040913.m649DPap027929@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2227

------- Comment #9 from fkauff at biologie.uni-kl.de 2008-07-04 05:13 EST -------
(In reply to comment #5)
> Created an attachment (id=966)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=966&action=view) [details]
> Patch to Bio/AlignIO/NexusIO.py adding write support
>
> This patch requires the Bio.Nexus handle fix (patch in attachment 965 [details], comment
> 4).
>
> My method for constructing an empty DNA, RNA, or Protein Nexus object is
> perhaps inelegant. This is required in order to setup the alphabet,
> ambiguous_values and unambiguous_letters properties which otherwise default to
> DNA.
>
> Also note that the Nexus add_sequence() method does not seem to support
> duplicated taxa names. Perhaps this method could update the
> unaltered_taxlabels property and use the _unique_label method to cope with
> duplicate names?
>

Ok, I updated add_sequence and will commit the changes soon.

F

From bugzilla-daemon at portal.open-bio.org Fri Jul 4 05:20:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Jul 2008 05:20:07 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807040920.m649K7MI028352@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2227

------- Comment #10 from fkauff at biologie.uni-kl.de 2008-07-04 05:20 EST -------
(In reply to comment #9)
> >
> > Also note that the Nexus add_sequence() method does not seem to support
> > duplicated taxa names. Perhaps this method could update the
> > unaltered_taxlabels property and use the _unique_label method to cope with
> > duplicate names?
> >
> Ok, I updated add_sequence and will commit the changes soon.
>

Checking in biopython/Bio/Nexus/Nexus.py;
/home/repository/biopython/biopython/Bio/Nexus/Nexus.py,v <-- Nexus.py
new revision: 1.18; previous revision: 1.17

Frank

From mjldehoon at yahoo.com Fri Jul 4 06:24:06 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 4 Jul 2008 03:24:06 -0700 (PDT)
Subject: [Biopython-dev] Bio.IntelliGenetics
In-Reply-To: <320fb6e00807020911w2bec03a6w5ec8b50f60a50238@mail.gmail.com>
Message-ID: <36286.77119.qm@web62412.mail.re1.yahoo.com>

> I'm assuming we'd put the new IntelliGenetics to
> SeqRecord parser in Bio/SeqIO/IgIO.py (based on
> the format name of "ig" used in EMBOSS).

OK.

> Would we then also deprecate Bio.IntelliGenetics?

Yes. Otherwise, it's replicated functionality.

> Do you want to make these changes, or should I?

Either way is fine with me. If you want to include this in Bio.SeqIO, go ahead. If you prefer me to do it, please let me know.

> > Then, let's replace the IntelliGenetics tests by files that do not
> > contain the double semi-colon comments.
>
> Why not just leave the double colon lines alone? The parser
> should be able to cope.

In the example files in test/IntelliGenetics, lines with a ';;' clearly have a different interpretation from the sequence-specific comments starting with ';'. I am fine with skipping the ';;' lines, but if we'd include them with the sequence-specific comments we'd be misrepresenting the file.

--Michiel.

--- On Wed, 7/2/08, Peter wrote:

> From: Peter
> Subject: Re: [Biopython-dev] Bio.IntelliGenetics
> To: mjldehoon at yahoo.com
> Date: Wednesday, July 2, 2008, 12:11 PM
> > It may be that the link in Bio/IntelliGenetics/__init__.py actually
> > does not pertain to the IntelliGenetics format. Except for this link
> > (which as you point out actually talks about the MASE format, not the
> > IntelliGenetics format), I have seen no description elsewhere of these
> > file-wide comments preceded by a double semi-colon in the
> > IntelliGenetics format. Even Biopython doesn't treat these
> > consistently: The tests for Bio.IntelliGenetics include comments with
> > the double semi-colon, but the parser doesn't treat them differently
> > from sequence-specific comments.
>
> Maybe we should ask BioPerl if they distinguish between the
> IntelliGenetics and MASE formats?
>
> Looking back over the old mailing list, at the time they did think the
> two were the same:
> http://lists.open-bio.org/pipermail/biopython-dev/2001-October/000626.html
>
> > So let's do the following:
> > For the IntelliGenetics parser, do not look for double semi-colon
> > comments. Only check if the first character in a line is a semi-colon,
> > and if so, treat it as a sequence-specific comment. This is what
> > Bio.IntelliGenetics currently does anyway.
> > Replace the parser class in Bio.IntelliGenetics by a generator
> > function, and integrate it with Bio.SeqIO.
>
> I'm assuming we'd put the new IntelliGenetics to SeqRecord parser in
> Bio/SeqIO/IgIO.py (based on the format name of "ig" used in EMBOSS).
> Would we then also deprecate Bio.IntelliGenetics?
>
> Do you want to make these changes, or should I?
>
> > Then, let's replace the IntelliGenetics tests by files that do not
> > contain the double semi-colon comments.
>
> Why not just leave the double colon lines alone? The parser should be
> able to cope.
>
> Peter

From biopython at maubp.freeserve.co.uk Fri Jul 4 10:31:55 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 4 Jul 2008 15:31:55 +0100
Subject: [Biopython-dev] Bio.IntelliGenetics
In-Reply-To: <36286.77119.qm@web62412.mail.re1.yahoo.com>
References: <320fb6e00807020911w2bec03a6w5ec8b50f60a50238@mail.gmail.com>
 <36286.77119.qm@web62412.mail.re1.yahoo.com>
Message-ID: <320fb6e00807040731h787c66e6t10a4edd31dffdbc2@mail.gmail.com>

>> Do you want to make these changes, or should I?
>
> Either way is fine with me. If you want to include this in Bio.SeqIO, go ahead.

OK. I've added a simple parser to CVS as Bio/SeqIO/IgIO.py for IntelliGenetics/MASE files using the format name "ig" to match EMBOSS. The existing three sample files are now being used in test_SeqIO.py and one of them also in test_AlignIO.py as well.

If anyone wants to scan over the code, I'd be delighted to have feedback.

Adding support for writing these files should be easy. Do you think this is worth implementing?

Before we deprecate Bio.IntelliGenetics I suggest we ask on the mailing list if anyone is using it.

> In the example files in test/IntelliGenetics, lines with a ';;' clearly have a different interpretation
> from the sequence-specific comments starting with ';'. I am fine with skipping the ';;' lines, but
> if we'd include them with the sequence-specific comments we'd be misrepresenting the file.

I am ignoring the ";;" lines at the start of the file.

Peter

From mjldehoon at yahoo.com Sat Jul 5 04:24:41 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 5 Jul 2008 01:24:41 -0700 (PDT)
Subject: [Biopython-dev] CVS freeze for release 1.47
Message-ID: <223850.14172.qm@web62404.mail.re1.yahoo.com>

Hi everybody,

I'll start on release 1.47 from now, so please don't make any commits to CVS until the release is out. Thanks!

--Michiel.

From mjldehoon at yahoo.com Sat Jul 5 20:00:17 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 5 Jul 2008 17:00:17 -0700 (PDT)
Subject: [Biopython-dev] Biopython release 1.47
Message-ID: <287726.364.qm@web62412.mail.re1.yahoo.com>

We are pleased to announce the release of Biopython 1.47. This release includes a new Bio.AlignIO module, updates to Bio.Blast, parsers for NCBI's Entrez E-Utilities, numerous other code improvements and fixes, and extended and updated documentation. In particular, if you use Biopython to access NCBI's E-Utilities, we encourage you to download and install this release to ensure full compliance with NCBI's access rules.

Source distributions and Windows installers are available from the Biopython website at http://biopython.org.

My thanks to all code contributors who made this new release possible.

--Michiel on behalf of the Biopython developers

From sbassi at gmail.com Sun Jul 6 15:53:54 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Sun, 6 Jul 2008 16:53:54 -0300
Subject: [Biopython-dev] NCBIStandalon not compatible with previous versions, is this a bug?
Message-ID:

NCBIStandalone changed in 1.46 due to bug #2508.
So this code that was working before, no longer works:

result, err = NCBIStandalone.blastall(b_exe, "blastn",
    b_db, f_name, expectation=1e-10, descriptions=1)

The error trace is:

File "/mnt/hda2/bio/biopython-1.46/build/lib.linux-i686-2.5/Bio/Blast/NCBIStandalone.py", line 1986, in _security_check_parameters
    if ";" in value or "&&" in value :
TypeError: argument of type 'float' is not iterable

So I had to rewrite the code as:

result, err = NCBIStandalone.blastall(b_exe, "blastn",
    b_db, f_name, expectation="1e-10", descriptions="1")

The problem is the function "_security_check_parameters", that assumes that all arguments are strings.

Proposed solutions:

1) Leave it as is (this is not a bug). Some tutorial should be changed (?)
2) Modify line 1986 from:
if ";" in value or "&&" in value :
To this:
if ";" in value or "&&" in str(value) :

From bugzilla-daemon at portal.open-bio.org Mon Jul 7 06:47:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Jul 2008 06:47:48 -0400
Subject: [Biopython-dev] [Bug 2447] EUtils cannot parse PubMed XML for ACS journals
In-Reply-To:
Message-ID: <200807071047.m67Almjb027271@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2447

------- Comment #3 from mdehoon at ims.u-tokyo.ac.jp 2008-07-07 06:47 EST -------
Using Biopython release 1.47; Bio.Entrez can parse the XML for this PMID:

>>> from Bio import Entrez
>>> PMID = "17238260"
>>> handle = Entrez.efetch(db='pubmed', id=PMID, retmode='xml')
>>> record = Entrez.read(handle)
>>>

Noel, can you use Bio.Entrez instead of Bio.EUtils?
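
On the _security_check_parameters report from Sebastian earlier in this digest: in Python, `";" in value` raises TypeError when value is a float or an int, so converting only the second operand is not enough; the value needs converting before either test. The sketch below shows the idea in a self-contained form. It is not the code in Bio/Blast/NCBIStandalone.py; check_parameters is a hypothetical stand-in for the real function, whose exact signature is not shown in this thread.

```python
def check_parameters(**keywords):
    """Reject keyword values that contain shell separators (hypothetical helper).

    Each value is converted to a string once, so floats and integers
    such as expectation=1e-10 or descriptions=1 are handled.
    """
    for name, value in keywords.items():
        text = str(value)
        if ";" in text or "&&" in text:
            raise ValueError("Rejecting suspicious argument for %s" % name)


if __name__ == "__main__":
    # Numeric arguments pass without needing to be quoted by the caller.
    check_parameters(expectation=1e-10, descriptions=1)
    try:
        check_parameters(database="nr; rm -rf /")
    except ValueError as error:
        print(error)
```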

From bugzilla-daemon at portal.open-bio.org Mon Jul 7 06:55:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Jul 2008 06:55:10 -0400
Subject: [Biopython-dev] [Bug 2448] Bio.EUtils can't handle accented author names
In-Reply-To:
Message-ID: <200807071055.m67AtAWu027543@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2448

------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2008-07-07 06:55 EST -------
Using Bio.Entrez in Biopython release 1.47:

>>> from Bio import Entrez
>>> handle = Entrez.efetch(db='pubmed', id=pmids, retmode='xml')
>>> records = Entrez.read(handle)
>>> records[0]['MedlineCitation']['Article']['AuthorList']
[{u'LastName': 'Matamala', u'Initials': 'AR', u'ForeName': 'Adelio R'},
 {u'LastName': 'Almonacid', u'Initials': 'DE', u'ForeName': 'Daniel E'},
 {u'LastName': 'Figueroa', u'Initials': 'MF', u'ForeName': 'Maximiliano F'},
 {u'LastName': u'Mart\xednez-Oyanedel', u'Initials': 'J', u'ForeName': u'Jos\xe9'},
 {u'LastName': 'Bunster', u'Initials': 'MC', u'ForeName': 'Marta C'}]

Noel, is this sufficient for your needs?

From bugzilla-daemon at portal.open-bio.org Mon Jul 7 07:12:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Jul 2008 07:12:26 -0400
Subject: [Biopython-dev] [Bug 2448] Bio.EUtils can't handle accented author names
In-Reply-To:
Message-ID: <200807071112.m67BCQAB028433@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2448

------- Comment #3 from baoilleach at gmail.com 2008-07-07 07:12 EST -------
Thanks Michiel, but I found a workaround a day later so don't worry about me. I was just letting you know about the bug...

Noel

From biopython at maubp.freeserve.co.uk Mon Jul 7 09:07:24 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 7 Jul 2008 14:07:24 +0100
Subject: [Biopython-dev] NCBIStandalon not compatible with previous versions, is this a bug?
In-Reply-To:
References:
Message-ID: <320fb6e00807070607m2cee88b1n9b2b2194d96c3c12@mail.gmail.com>

On Sun, Jul 6, 2008 at 8:53 PM, Sebastian Bassi wrote:
> NCBIStandalone changed in 1.46 due to bug #2508.
> So this code that was working before, no longer works:
>
> result, err = NCBIStandalone.blastall(b_exe, "blastn",
> b_db, f_name, expectation=1e-10, descriptions=1)
>
> The error trace is:
>
> File "/mnt/hda2/bio/biopython-1.46/build/lib.linux-i686-2.5/Bio/Blast/NCBIStandalone.py",
> line 1986, in _security_check_parameters
> if ";" in value or "&&" in value :
> TypeError: argument of type 'float' is not iterable
>
> So I had to rewrite the code as:
>
> result, err = NCBIStandalone.blastall(b_exe, "blastn",
> b_db, f_name, expectation="1e-10", descriptions="1")
>
> The problem is the function "_security_check_parameters", that assumes
> that all arguments are strings.
>
> Proposed solutions:
>
> 1) Leave it as is (this is not a bug). Some tutorial should be changed (?)
> 2) Modify line 1986 from:
> if ";" in value or "&&" in value :
> To this:
> if ";" in value or "&&" in str(value) :

I would say it's a bug, and casting into a string on line 1986 looks like the best fix.
I won't be able to do this until tomorrow afternoon at the latest - if you could file a bug that would be helpful in case I forget ;)

Thanks

Peter

From bugzilla-daemon at portal.open-bio.org Mon Jul 7 13:08:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Jul 2008 13:08:40 -0400
Subject: [Biopython-dev] [Bug 2538] New: _security_check_parameters assumes all arguments are strings
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2538

Summary: _security_check_parameters assumes all arguments are strings
Product: Biopython
Version: 1.46
Platform: PC
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: sbassi at gmail.com

This code no longer works:

result, err = NCBIStandalone.blastall(b_exe, "blastn",
    b_db, f_name, expectation=1e-10, descriptions=1)

Because the new _security_check_parameters function assumes all blastall parameters are strings, but expectation and descriptions are float and int.

Proposed fix:

Modify line 1986 from:
if ";" in value or "&&" in value :
To this:
if ";" in value or "&&" in str(value) :

From sbassi at gmail.com Mon Jul 7 16:30:14 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Mon, 7 Jul 2008 17:30:14 -0300
Subject: [Biopython-dev] Alignment problem. bug?
Message-ID:

I would like to confirm whether this is a bug or not. If I get confirmation, I will file it in bugzilla.
With this code:

from Bio import Clustalw
from Bio.Clustalw import MultipleAlignCL

cline = MultipleAlignCL('foralig.txt')
cline.set_output("alig.aln")
alignment = Clustalw.do_alignment(cline)

I get:

Traceback (most recent call last):
  File "/mnt/hda2/py252/bin/ii.py", line 112, in
    alignment = Clustalw.do_alignment(cline)
  File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Clustalw/__init__.py", line 125, in do_alignment
    return parse_file(out_file, alphabet)
  File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Clustalw/__init__.py", line 47, in parse_file
    generic_alignment = AlignIO.read(handle, "clustal")
  File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/AlignIO/__init__.py", line 299, in read
    first = iterator.next()
  File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/AlignIO/ClustalIO.py", line 169, in next
    raise ValueError("Could not parse line:\n%s" % line)
ValueError: Could not parse line:

I tested with Biopython 1.47 and 1.46 with the input file:
http://www.pastecode.com.ar/f44f28b41 (download at
http://www.pastecode.com.ar/pastebin.php?dl=f44f28b41)
The clustal program is running, because I can see its output on disk
(posted here: http://www.pastecode.com.ar/f275a5475). It seems that
Biopython fails to parse it.

I also tested an older version (I guess it is 1.44) and it works
OK. So I think the problem was introduced in 1.46.
--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Tutorial libre de Python: http://tinyurl.com/2az5d5

From bugzilla-daemon at portal.open-bio.org Tue Jul 8 04:41:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 8 Jul 2008 04:41:02 -0400
Subject: [Biopython-dev] [Bug 2538] _security_check_parameters assumes all arguments are strings
In-Reply-To:
Message-ID: <200807080841.m688f2VL020100@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2538

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-08 04:41 EST -------
Included a float in the unit test for _security_check_parameters() added in
Bug 2508:
Tests/test_NCBIStandalone.py revision: 1.15;

Fixed the string assumption in:
Bio/Blast/NCBIStandalone.py revision: 1.74;

Note that in your suggested fix Sebastian, both the "in" expressions need
casting to a string.

Thanks for reporting this!

Peter

From biopython at maubp.freeserve.co.uk Tue Jul 8 04:51:31 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Jul 2008 09:51:31 +0100
Subject: [Biopython-dev] Alignment problem. bug?
In-Reply-To:
References:
Message-ID: <320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com>

On Mon, Jul 7, 2008 at 9:30 PM, Sebastian Bassi wrote:
> I would like to confirm that this is a bug ot not. If I get
> confirmation, I would fill it in bugzilla.

It does look like a bug to me...
> With this code:
>
> from Bio import Clustalw
> from Bio.Clustalw import MultipleAlignCL
>
> cline = MultipleAlignCL('foralig.txt')
> cline.set_output("alig.aln")
> alignment = Clustalw.do_alignment(cline)
>
> I get:
>
> Traceback (most recent call last):
>   File "/mnt/hda2/py252/bin/ii.py", line 112, in
>     alignment = Clustalw.do_alignment(cline)
>   File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Clustalw/__init__.py",
> line 125, in do_alignment
>     return parse_file(out_file, alphabet)
>   File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Clustalw/__init__.py",
> line 47, in parse_file
>     generic_alignment = AlignIO.read(handle, "clustal")
>   File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/AlignIO/__init__.py",
> line 299, in read
>     first = iterator.next()
>   File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/AlignIO/ClustalIO.py",
> line 169, in next
>     raise ValueError("Could not parse line:\n%s" % line)
> ValueError: Could not parse line:
>
>
> I tested with Biopython 1.47 and 1.46 with the input file:
> http://www.pastecode.com.ar/f44f28b41 (download at
> http://www.pastecode.com.ar/pastebin.php?dl=f44f28b41)
> The clustal program is running because I see in the disk its output
> (posted here: http://www.pastecode.com.ar/f275a5475). It seems it
> fails to parse it.
>
> I also tested in an older version (I guess it is 1.44) and it works
> OK. So I think the problem was introduced in 1.46.

For Biopython 1.46+ I switched the Bio.Clustalw parser to internally call
Bio.AlignIO, so one thing you could try is reverting
Bio/Clustalw/__init__.py to the older version (e.g. that shipped with
Biopython 1.45).

You haven't said which version of the ClustalW tool you are using - maybe
2.0? If so, there could be a subtle change in the output format since
1.83. If you could run the tool by hand and share the output that would
be helpful to try and track down this issue.
I don't seem to have any version of ClustalW installed on my current
machine, so it will take me a little longer to reproduce this here.

Could you file a bug please, and attach the example input and the output
when run by hand at the command line.

Thanks,

Peter

From bugzilla-daemon at portal.open-bio.org Tue Jul 8 04:52:06 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 8 Jul 2008 04:52:06 -0400
Subject: [Biopython-dev] [Bug 2538] _security_check_parameters assumes all arguments are strings
In-Reply-To:
Message-ID: <200807080852.m688q6Ce020588@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2538

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-08 04:52 EST -------
Forgot to mark this as fixed - sorry for the extra email!

From biopython at maubp.freeserve.co.uk Tue Jul 8 07:02:37 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Jul 2008 12:02:37 +0100
Subject: [Biopython-dev] Alignment problem. bug?
In-Reply-To: <320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com>
References: <320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com>
Message-ID: <320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com>

On Tue, Jul 8, 2008 at 9:51 AM, Peter wrote:
> On Mon, Jul 7, 2008 at 9:30 PM, Sebastian Bassi wrote:
>> I would like to confirm that this is a bug ot not. If I get
>> confirmation, I would fill it in bugzilla.
>
> It does look like a bug to me...

I've reproduced this using Clustalw 2.0.8 (online at the EBI) and with
Clustalw 2.0.9 (installed locally).
It was a problem parsing Clustal files where the first line of the
consensus was blank (and would probably affect Clustal W 1.83 too).

I think I have fixed this with Bio/AlignIO/ClustalIO.py revision: 1.12
Could you update this file and re-test please Sebastian? Also, may I
add a test alignment file based on your example to CVS please?

Thanks,

Peter

From mjldehoon at yahoo.com Tue Jul 8 08:47:48 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 8 Jul 2008 05:47:48 -0700 (PDT)
Subject: [Biopython-dev] Bio.Sequencing
Message-ID: <570915.67657.qm@web62415.mail.re1.yahoo.com>

Hi everybody,

Is somebody currently actively maintaining Bio.Sequencing? Frank perhaps?
I'd like to make some changes to Bio.Sequencing with regards to bug #2454:

http://bugzilla.open-bio.org/show_bug.cgi?id=2454

Just to make sure that I am not treading on other people's work.

--Michiel

From fkauff at biologie.uni-kl.de Tue Jul 8 09:12:39 2008
From: fkauff at biologie.uni-kl.de (Frank Kauff)
Date: Tue, 08 Jul 2008 15:12:39 +0200
Subject: [Biopython-dev] Bio.Sequencing
In-Reply-To: <570915.67657.qm@web62415.mail.re1.yahoo.com>
References: <570915.67657.qm@web62415.mail.re1.yahoo.com>
Message-ID: <487367C7.2050702@biologie.uni-kl.de>

Hi all,

Michiel de Hoon wrote:
> Hi everybody,
>
> Is somebody currently actively maintaining Bio.Sequencing? Frank perhaps?
>

Not me. Green lights from my side.

Frank

> I'd like to make some changes to Bio.Sequencing with regards to bug #2454:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2454
>
> Just to make sure that I am not treading on other people's work.
>
>
> --Michiel
>
>
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>
>

From biopython at maubp.freeserve.co.uk Tue Jul 8 10:36:43 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Jul 2008 15:36:43 +0100
Subject: [Biopython-dev] Bio.Sequencing
In-Reply-To: <570915.67657.qm@web62415.mail.re1.yahoo.com>
References: <570915.67657.qm@web62415.mail.re1.yahoo.com>
Message-ID: <320fb6e00807080736v26388f2ake12303c5b752c5e9@mail.gmail.com>

On Tue, Jul 8, 2008 at 1:47 PM, Michiel de Hoon wrote:
> Hi everybody,
>
> Is somebody currently actively maintaining Bio.Sequencing? Frank perhaps?
> I'd like to make some changes to Bio.Sequencing with regards to bug #2454:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2454
>
> Just to make sure that I am not treading on other people's work.

My only comment is watch out for the fact that Bio.SeqIO is now
calling Bio.Sequencing for the "ace" and "phd" formats.

On a related note, I'd had some ideas for making the Ace parser more
user friendly by further extending the doc strings and defining
__str__ or __repr__ methods for some of the "line type classes" which
otherwise must be explored by using dir() to discover the properties.
I haven't actually done any work on this yet though.

Peter

From sbassi at gmail.com Tue Jul 8 11:38:29 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Tue, 8 Jul 2008 12:38:29 -0300
Subject: [Biopython-dev] Alignment problem. bug?
In-Reply-To: <320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com>
References: <320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com> <320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com>
Message-ID:

On Tue, Jul 8, 2008 at 8:02 AM, Peter wrote:
> I've reproduced this using Clustalw 2.0.8 (online at the EBI) and with
> Clustalw 2.0.9 (installed locally).
> It was a problem parsing Clustal
> files where the first line of the consensus was blank (and would
> probably affect both Clustal W 1.83 too).

Yes, I used ClustalW 1.83

> I think I have fixed this with Bio/AlignIO/ClustalIO.py revision: 1.12
> Could you update this file and re-test please Sebastian? Also, may I
> add a test alignment file based on your example to CVS please?

Ok, I will test it today. You can use my file or any possible derivation of it.
Best,
SB.

--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Tutorial libre de Python: http://tinyurl.com/2az5d5

From biopython at maubp.freeserve.co.uk Tue Jul 8 11:56:20 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Jul 2008 16:56:20 +0100
Subject: [Biopython-dev] Alignment problem. bug?
In-Reply-To:
References: <320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com> <320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com>
Message-ID: <320fb6e00807080856s55d77962h9ceedd160ca8002b@mail.gmail.com>

>> I think I have fixed this with Bio/AlignIO/ClustalIO.py revision: 1.12
>> Could you update this file and re-test please Sebastian? Also, may I
>> add a test alignment file based on your example to CVS please?
>
> Ok, I will test it today. You can use my file or any possible derivation of it.

Thanks - I have added a two sequence version of your example as
Tests/Clustalw/odd_consensus.aln

Peter

From sbassi at gmail.com Tue Jul 8 12:52:13 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Tue, 8 Jul 2008 13:52:13 -0300
Subject: [Biopython-dev] Alignment problem. bug?
In-Reply-To: <320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com>
References: <320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com> <320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com>
Message-ID:

On Tue, Jul 8, 2008 at 8:02 AM, Peter wrote:
> I think I have fixed this with Bio/AlignIO/ClustalIO.py revision: 1.12

Just to confirm that it works now. Thank you!
Best,
SB.

From biopython at maubp.freeserve.co.uk Wed Jul 9 07:11:16 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 9 Jul 2008 12:11:16 +0100
Subject: [Biopython-dev] Bug 2533 - Support for simple "tab" format in Bio.SeqIO
In-Reply-To: <320fb6e00807020603s63f8339ag5b8140f1943ceb47@mail.gmail.com>
References: <320fb6e00807020603s63f8339ag5b8140f1943ceb47@mail.gmail.com>
Message-ID: <320fb6e00807090411o44530c46wc1ffdc8cdc5442fe@mail.gmail.com>

Now that Biopython 1.47 is out, are there any comments/objections to
my committing this to CVS?

Bug 2533 - Support for simple "tab" format in Bio.SeqIO
http://bugzilla.open-bio.org/show_bug.cgi?id=2533

Thanks,

Peter

P.S. Any real world example files would be good for the test suite.

From lpritc at scri.ac.uk Wed Jul 9 08:14:04 2008
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Wed, 09 Jul 2008 13:14:04 +0100
Subject: [Biopython-dev] Bug 2533 - Support for simple "tab" format in Bio.SeqIO
In-Reply-To: <320fb6e00807090411o44530c46wc1ffdc8cdc5442fe@mail.gmail.com>
Message-ID:

Only that you might want to consider Axon Text File format as a
self-describing tab-separated format which would facilitate storage and
recovery of all attributes of a sequence. There's a spec here:

http://www.moleculardevices.com/pages/software/gn_genepix_file_formats.html

On 09/07/2008 12:11, "Peter" wrote:

> Now that Biopython 1.47 is out, are there any comments/objections to
> my committing this to CVS?
>
> Bug 2533 - Support for simple "tab" format in Bio.SeqIO
> http://bugzilla.open-bio.org/show_bug.cgi?id=2533
>
> Thanks,
>
> Peter
>
> P.S.
> Any real world example files would be good for the test suite.

--
Dr Leighton Pritchard B.Sc.(Hons) MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk
w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C
tel:+44(0)1382 562731 x2405
From biopython at maubp.freeserve.co.uk Wed Jul 9 08:30:26 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 9 Jul 2008 13:30:26 +0100
Subject: [Biopython-dev] Bug 2533 - Support for simple "tab" format in Bio.SeqIO
In-Reply-To:
References: <320fb6e00807090411o44530c46wc1ffdc8cdc5442fe@mail.gmail.com>
Message-ID: <320fb6e00807090530j43a3e2c9y48bef4993587881f@mail.gmail.com>

On Wed, Jul 9, 2008 at 1:14 PM, Leighton Pritchard wrote:
> Only that you might want to consider Axon Text File format as a
> self-describing tab-separated format which would facilitate storage and
> recovery of all attributes of a sequence. There's a spec here:
>
> http://www.moleculardevices.com/pages/software/gn_genepix_file_formats.html
>

It's an interesting and flexible file format, but I don't see any
standard column name for "sequence" which would be of particular
interest from the point of view of the Bio.SeqIO module.

If there is a de-facto convention for storing sequence data in an Axon
Text File, then we could adopt this within Bio.SeqIO. Otherwise, I
think any Axon Text File parser added to Biopython would have to be of
a much more general nature (and not part of Bio.SeqIO).

Peter

From biopython at maubp.freeserve.co.uk Wed Jul 9 09:03:16 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 9 Jul 2008 14:03:16 +0100
Subject: [Biopython-dev] Simple __getitem__ for Alignments
Message-ID: <320fb6e00807090603o6b087ceeuce0b87c13627552a@mail.gmail.com>

Now that the latest release is out (Biopython 1.47), Bio.AlignIO
should start to get used more. I anticipate more people getting
frustrated with the current Alignment object, and would like to make
another baby-step in improving it.

I'd like to add a minimal __getitem__ method, as described in Bug 1944
comment 15,
http://bugzilla.open-bio.org/show_bug.cgi?id=1944#c15

> def __getitem__(self, index) :
>     """Access part of the alignment.
>
>     You can access a row of the alignment as a SeqRecord using an integer
>     index (think of the alignment as a list of SeqRecord objects here):
>
>     first_record = my_alignment[0]
>     last_record = my_alignment[-1]
>
>     Right now, this is the ONLY indexing operation supported. The
>     use of two indices and slice notation to extract a sub-alignment,
>     row, column or letter is under discussion for a future update."""
>     if isinstance(index, int) :
>         #e.g. result = align[x]
>         #Return a SeqRecord
>         return self._records[index]
>     else :
>         raise TypeError, "Not currently supported, but may be in future."

From the discussion on Bug 1944, this doesn't seem to be contentious -
the debate is about more advanced slicing operations.

I'd also like to add a __len__ method which would return the number of
SeqRecord objects (i.e. the number of rows). This would then let the
alignment be treated very much like a read-only list of SeqRecord
objects. Remember, we can already iterate over the rows in the
alignment as SeqRecord objects.

Any comments?

Peter

From bugzilla-daemon at portal.open-bio.org Wed Jul 9 09:21:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 09:21:13 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807091321.m69DLD9g031282@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454

------- Comment #20 from mdehoon at ims.u-tokyo.ac.jp 2008-07-09 09:21 EST -------
(In reply to comment #16)
I have uploaded an alternative parser in Bio.Sequencing.Phd to CVS. Feel free
to have a look and comment. If everybody is OK, I'll add a DeprecationWarning
to the previous parser.
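
The `__getitem__`/`__len__` proposal in the "Simple __getitem__ for Alignments" message above can be sketched in isolation as follows. `MiniAlignment` is a hypothetical stand-in, not Bio.Align.Generic.Alignment, and plain strings take the place of SeqRecord objects; the sketch only illustrates integer indexing plus `__len__`, written for current Python rather than the Python 2.5 of the thread.

```python
class MiniAlignment:
    """Hypothetical read-only container mimicking the proposed behaviour."""

    def __init__(self, records):
        # Stand-in for the alignment's internal list of SeqRecord objects.
        self._records = list(records)

    def __getitem__(self, index):
        if isinstance(index, int):
            # e.g. align[0] or align[-1]: return a single row
            return self._records[index]
        # Slices and two-index access are deliberately left unsupported.
        raise TypeError("Not currently supported, but may be in future.")

    def __len__(self):
        # Number of rows (records) in the alignment.
        return len(self._records)

    def __iter__(self):
        # Iterating over rows was already possible, per the message above.
        return iter(self._records)


align = MiniAlignment(["ACGT-", "AC-TT", "ACGTT"])
print(len(align), align[0], align[-1])
```

Rejecting anything other than an integer keeps the door open for the slicing semantics still being debated on Bug 1944, since no behaviour is promised that might later have to change.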
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 09:37:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 09:37:44 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807091337.m69DbiM5031944@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454

------- Comment #21 from fkauff at biologie.uni-kl.de 2008-07-09 09:37 EST -------
Michiel,

while you're at it - could you update my email in the source as well? And
Cymon's email is now cy at cymon.org.

Thanks!
Frank

(In reply to comment #20)
> (In reply to comment #16)
> I have uploaded an alternative parser in Bio.Sequencing.Phd to CVS. Feel free
> to have a look and comment. If everybody is OK, I'll add a DeprecationWarning
> to the previous parser.
>

From bugzilla-daemon at portal.open-bio.org Wed Jul 9 09:38:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 09:38:18 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807091338.m69DcIDu031986@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454

------- Comment #22 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-09 09:38 EST -------
In reply to comment 20 about the updates to Bio.Sequencing.Phd

I see you've also updated Bio.SeqIO.PhdIO in CVS (good). I would suggest you
add yourself to the copyright statement for this module, and add some doc
string entries to the new read and parse functions.

I haven't looked over the details of the new code (other than confirming
test_Phd.py and test_SeqIO.py seem happy).
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 10:28:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 10:28:36 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807091428.m69ESaGm001621@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454

------- Comment #23 from mdehoon at ims.u-tokyo.ac.jp 2008-07-09 10:28 EST -------
(In reply to comment #21)
> Michiel,
>
> while you're at it - could you update my email in the source as well? And
> Cymon's email is now

I have updated your address, but I'd prefer to hold off on Cymon's without his
direct permission -- spammers are watching too, you know.
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 14:33:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 14:33:42 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support for '-F F' and make it safe
In-Reply-To:
Message-ID: <200807091833.m69IXgcV013783@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2508

mmokrejs at ribosome.natur.cuni.cz changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |

------- Comment #10 from mmokrejs at ribosome.natur.cuni.cz 2008-07-09 14:33 EST -------
OK, so my old code not yet converted to biopython-1.47 gives me:

    _textframe = blast.blast_and_htmlize(_query_sequence, _usermode, upload_temp_path, blast_path, uri, _align_view, _matrix)
  File "/home/mmokrejs/public_html/IRES2/blast.py", line 548, in blast_and_htmlize
    _blast_out, _error_info, _blast_file = blastall(blast_path + targetdb, query_sequence, upload_temp_path, mode='sequence', align_view=align_view, matrix=matrix)
  File "/home/mmokrejs/public_html/IRES2/blast.py", line 506, in blastall
    _blast_out, _error_info = NCBIStandalone.blastall('/usr/bin/blastall', 'blastn', blast_db, _blast_file, matrix=matrix + ' -F 0', wordsize=_wordsize, gap_open=_gap_open, gap_extend=_gap_extend, strands=_strands, alignments=_alignments, descriptions=_descriptions, expectation=_expectation, align_view=align_view)
  File "/usr/lib/python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line 1620, in blastall
    _security_check_parameters(keywds)
  File "/usr/lib/python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line 1986, in _security_check_parameters
    if ";" in value or "&&" in value :
TypeError: argument of type 'int' is not iterable

It turns out I am passing in:

{'matrix': 'NUC.4.4 -F 0', 'strands': 3, 'expectation': 100, 'wordsize': 4,
 'gap_extend': 1, 'gap_open': 1, 'alignments': 99999, 'descriptions': 9999}

I don't think it makes sense to require users to pass strings instead of
numbers to the function.

While looking into the _security_check_parameters() I think you should also
check for "||" - the logical OR as interpreted by shell and redirections ">"
and "<".

FIX:
-if ";" in value or "&&" in value:
+if ";" in str(value) or "&&" in str(value) or "||" in str(value) or ">" in str(value) or "<" in str(value):

My apologies that I did not test earlier.

From bugzilla-daemon at portal.open-bio.org Wed Jul 9 14:38:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 14:38:08 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support for '-F F' and make it safe
In-Reply-To:
Message-ID: <200807091838.m69Ic82k014070@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2508

------- Comment #11 from mmokrejs at ribosome.natur.cuni.cz 2008-07-09 14:38 EST -------
Don't know if you want to leave in the back-door to pass in another argument
with its value. If not, prevent spaces as well. Values never contain spaces
unless wrapped by single or double-quotes. I find it perfectly legal to tell
blastall:

-d "/some/db /another/db /yet/another"

to search over three databases at once. It seems it does not reflect -d
specified 3 times on its command-line.
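
The check discussed in Bugs 2538 and 2508 can be sketched as below. This is not the code committed to Bio/Blast/NCBIStandalone.py: the name `check_parameters` and the exact list of rejected tokens are assumptions for illustration. The thread only settles on ";" and "&&", with pipes and redirection still under discussion, and spaces are left alone because of the multi-database `-d` case described above.

```python
# Hypothetical stand-in for _security_check_parameters(); not Biopython's code.
# Assumed token list: "|" also catches "||".
FORBIDDEN = (";", "&&", "|", "<", ">")


def check_parameters(param_dict):
    """Raise ValueError if any value contains a shell control token."""
    for key, value in param_dict.items():
        # Cast first, so numeric values such as 1e-10 or 99999 are handled
        # instead of raising "argument of type 'float' is not iterable".
        text = str(value)
        for token in FORBIDDEN:
            if token in text:
                raise ValueError(
                    "Rejecting suspicious argument for %s: %r" % (key, value)
                )


# Numeric values, as in the reports above, pass without a TypeError.
check_parameters({"expectation": 1e-10, "descriptions": 1})
# Spaces are deliberately allowed (e.g. several databases for -d).
check_parameters({"database": "/some/db /another/db /yet/another"})
```

Casting once per value, rather than inside each `in` test, avoids the pitfall Peter points out in Bug 2538 comment 1, where only one of the two expressions was wrapped in `str()`.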
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 16:12:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 16:12:40 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support for '-F F' and make it safe
In-Reply-To:
Message-ID: <200807092012.m69KCeO2018087@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2508

------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-09 16:12 EST -------
The issue with non-string arguments (e.g. float or integers) was reported by
Sebastian Bassi (Bug 2538) and has since been fixed in CVS - sadly this was
after the release of Biopython 1.47.

As you've demonstrated there are valid reasons to want to include spaces. I
would rather not add a check which requires lots of special casing.

I'm leaving this bug open to consider extending _security_check_parameters() to
prevent the use of pipes and redirection (i.e. "|", "<" and ">") which sounds
reasonable. A third opinion wouldn't hurt of course!

From bugzilla-daemon at portal.open-bio.org Thu Jul 10 06:30:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Jul 2008 06:30:28 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807101030.m6AAUSew025300@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454

------- Comment #24 from fkauff at biologie.uni-kl.de 2008-07-10 06:30 EST -------
> (In reply to comment #21)
> > Michiel,
> >
> > while you're at it - could you update my email in the source as well? And
> > Cymon's email is now
>
> I have updated your address, but I'd prefer hold off on Cymon's without his
> direct permission -- spammers are watching too, you know.
>

Contacted Cymon, reply below:

Hi Frank,
...
>
> Do you want your email address updated in the ace/phd parser code? Or
> removed (just the email, not the name, of course)? Don't know if you follow
> biopython-dev lately.

I don't actually follow the -dev list but perhaps I should, as I think I'm
going to be using and doing far more diverse bioinformatics stuff (now that I'm
employed as a bioinformatician :)

Anyway, the email can be changed to cymon.cox at gmail.com - best to go
through google I think as their spam filters tend to be pretty good.

Cheers, C.

(In reply to comment #23)

From bugzilla-daemon at portal.open-bio.org Thu Jul 10 12:24:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Jul 2008 12:24:27 -0400
Subject: [Biopython-dev] [Bug 2533] Support for simple "tab" format in Bio.SeqIO
In-Reply-To:
Message-ID: <200807101624.m6AGORlL012526@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2533

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-10 12:24 EST -------
Checked in, marking as fixed.

Bio/SeqIO/TabIO.py initial revision: 1.1
Bio/SeqIO/__init__.py new revision: 1.33
Tests/output/test_SeqIO new revision: 1.25
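
For readers unfamiliar with the "tab" format from Bug 2533, here is a rough standard-library sketch. It assumes one record per line consisting of an identifier, a tab, and the sequence; the thread itself does not spell out the layout, and this is not the code in Bio/SeqIO/TabIO.py. The function name `parse_tab` is hypothetical.

```python
import io


def parse_tab(handle):
    """Yield (identifier, sequence) pairs from a tab-separated handle."""
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            # Assumed layout: exactly two columns per record.
            raise ValueError(
                "Line %d: expected 2 tab-separated fields, found %d"
                % (line_number, len(fields))
            )
        identifier, sequence = fields
        yield identifier.strip(), sequence.strip()


example = io.StringIO("seq1\tACGTACGT\nseq2\tTTGACA\n")
for identifier, sequence in parse_tab(example):
    print(identifier, len(sequence))
```

A two-column layout like this carries no annotation, which is the limitation Leighton's Axon Text File suggestion was aimed at.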
From bugzilla-daemon at portal.open-bio.org Thu Jul 10 23:20:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Jul 2008 23:20:11 -0400
Subject: [Biopython-dev] [Bug 2542] New: AlignInfo.py fails a test
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2542

           Summary: AlignInfo.py fails a test
           Product: Biopython
           Version: 1.46
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: sbassi at gmail.com

When I run:
$ python2.5 /mnt/hda2/bio/biopython-1.47/Bio/Align/AlignInfo.py

The first 2 tests pass, but then I get:

Traceback (most recent call last):
  File "/mnt/hda2/bio/biopython-1.47/Bio/Align/AlignInfo.py", line 723, in
    print summary.information_content()
  File "/mnt/hda2/bio/biopython-1.47/Bio/Align/AlignInfo.py", line 508, in information_content
    raise ValueError, errstr
ValueError: Error in alphabet: not Nucleotide or Protein, supply expected frequencies

I've also tried without the AlignIO:

from Bio import Alphabet
from Bio.Align.Generic import Alignment
from Bio.Seq import Seq
from Bio.Align.AlignInfo import SummaryInfo

seq1 = 'MHQAIFIYQIGYPLKSGYIQSIRSPEYDNW'
seq2 = 'MH--IFIYQIGYALKSGYIQSIRSPEY-NW'

a = Alignment(Alphabet.ProteinAlphabet)
a.add_sequence("asp",seq1)
a.add_sequence("unk",seq2)
summary = SummaryInfo(a)
summary.information_content()

Traceback (most recent call last):
  File "/mnt/hda2/py252/bin/align.py", line 16, in
    summary.information_content()
  File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Align/AlignInfo.py", line 508, in information_content
    raise ValueError, errstr
ValueError: Error in alphabet: not Nucleotide or Protein, supply expected frequencies
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 04:49:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 04:49:08 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To:
Message-ID: <200807110849.m6B8n8Xg022720@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2542

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 04:49 EST -------
Going over your example code:

>>> from Bio import Alphabet
>>> from Bio.Align.Generic import Alignment
>>> from Bio.Align.AlignInfo import SummaryInfo
>>> seq1 = 'MHQAIFIYQIGYPLKSGYIQSIRSPEYDNW'
>>> seq2 = 'MH--IFIYQIGYALKSGYIQSIRSPEY-NW'
>>> a = Alignment(Alphabet.ProteinAlphabet)

First problem, you gave the Alignment object an Alphabet class, rather than an
instance of the class. I guess we should add an explicit check to the Alignment
object... You should have used:

>>> a = Alignment(Alphabet.ProteinAlphabet())

Or, if you prefer perhaps:

>>> a = Alignment(Alphabet.generic_protein)

Then when we get to the information_content, there is another issue:

>>> a.add_sequence("asp",seq1)
>>> a.add_sequence("unk",seq2)
>>> summary = SummaryInfo(a)
>>> summary.information_content()
Traceback (most recent call last):
...
AttributeError: ProteinAlphabet instance has no attribute 'gap_char'

The trouble here is that the SummaryInfo class is looking for a declared gap
character in the protein alphabet - and none has been declared. Your example
sequences appear to use "-" as a gap, but you haven't declared this.

Try this:

from Bio import Alphabet
from Bio.Align.Generic import Alignment
from Bio.Seq import Seq
from Bio.Align.AlignInfo import SummaryInfo
seq1 = 'MHQAIFIYQIGYPLKSGYIQSIRSPEYDNW'
seq2 = 'MH--IFIYQIGYALKSGYIQSIRSPEY-NW'
a = Alignment(Alphabet.Gapped(Alphabet.generic_protein, "-"))
a.add_sequence("asp",seq1)
a.add_sequence("unk",seq2)
summary = SummaryInfo(a)
print summary.information_content()

You mentioned having a similar issue with Bio.AlignIO - could you attach the
example file to this bug with some trivial code showing your problem?

Thanks,

Peter.

P.S. Please update to Biopython 1.47 rather than using 1.46

From bugzilla-daemon at portal.open-bio.org Fri Jul 11 05:50:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 05:50:49 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To:
Message-ID: <200807110950.m6B9on7t025902@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2542

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 05:50 EST -------
I think I've fixed the "Quick test" failure when running
Bio/Align/AlignInfo.py directly. I don't know how I missed that before...

/home/repository/biopython/biopython/Bio/Align/AlignInfo.py,v <-- AlignInfo.py
new revision: 1.15; previous revision: 1.14
done

My opinion from looking at the AlignInfo code, and scanning back over the CVS
history, is that it was never used much with generic alphabets (as tend to be
returned by Bio.AlignIO). There may be other issues here - for example I've
spotted another problem case, doubly extended alphabets like a protein alphabet
with declared Gapped and WithStopCodon (which you *might* want in an alignment).
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Fri Jul 11 06:33:22 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 11 Jul 2008 11:33:22 +0100 Subject: [Biopython-dev] Checking alphabet argument in alignments Message-ID: <320fb6e00807110333r1938510bne7e24d1ce7b5c0b@mail.gmail.com> I'd like to add the following check to the __init__ method of the Bio.Align.Generic.Alignment object (our base alignment class), > if not (isinstance(alphabet, Alphabet.Alphabet) \ > or isinstance(alphabet, Alphabet.AlphabetEncoder)): > raise ValueError("Invalid alphabet argument") This will prevent subtle user errors like this: from Bio import Alphabet from Bio.Align.Generic import Alignment a = Alignment(Alphabet.ProteinAlphabet) which should be: from Bio import Alphabet from Bio.Align.Generic import Alignment a = Alignment(Alphabet.ProteinAlphabet()) The only downside I have thought of is if anyone has created their own alignment class which does NOT subclass the original Bio.Alphabet.Alphabet class. This same test could (should?) also be added to the Seq and MutableSeq objects. What do people think? Peter From bugzilla-daemon at portal.open-bio.org Fri Jul 11 06:39:48 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 11 Jul 2008 06:39:48 -0400 Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test In-Reply-To: Message-ID: <200807111039.m6BAdm05028072@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2542 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 06:39 EST ------- In comment 2 I wrote: > I've spotted another problem case, doubly extended alphabets like a > protein alphabet declared Gapped and WithStopCodon (which you *might* > want in an alignment). 
This alphabet issue is fixed in CVS, as is another corner case of a divide by zero error where an entire column consists of ignored characters. Please re-test with Bio/Align/AlignInfo.py revision 1.16 from CVS. Thanks -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jul 11 12:18:28 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 11 Jul 2008 12:18:28 -0400 Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects In-Reply-To: Message-ID: <200807111618.m6BGISQ3013553@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2454 ------- Comment #25 from mdehoon at ims.u-tokyo.ac.jp 2008-07-11 12:18 EST ------- (In reply to comment #24) OK, I updated Phd.py. The last module to consider is Ace.py; I'll upload a fixed version soon. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jul 11 17:00:10 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 11 Jul 2008 17:00:10 -0400 Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test In-Reply-To: Message-ID: <200807112100.m6BL0Aer026629@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2542 ------- Comment #4 from sbassi at gmail.com 2008-07-11 17:00 EST ------- (In reply to comment #1) > First problem, you gave the Alignment object an Alphabet class, rather than an > instance of the class. I guess we should an explicit check to the Alignment > object... Yes, that is my fault. 
> You mentioned having a similar issue with Bio.AlignIO - could you attached the > example file to this bug with some trivial code showing your problem? Yes, this code with Bio.AlignIO also fails (I tried right now with AlignInfo.py rev. 1.17): from Bio.Align import AlignInfo from Bio.Align.AlignInfo import SummaryInfo from Bio import AlignIO fn = open("secu3.aln") alignment = AlignIO.read(fn, "clustal") summary = SummaryInfo(alignment) print summary.information_content() And I got (and this time I am not supplying any alphabet, at least not explicit): Traceback (most recent call last): File "/mnt/hda2/py252/bin/2542.py", line 12, in print summary.information_content() File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Align/AlignInfo.py", line 499, in information_content raise ValueError, errstr ValueError: Error in alphabet: not Nucleotide or Protein, supply expected frequencies > P.S. Please update to Biopython 1.47 rather than using 1.46 I was using Biopython 1.47, but I reported as 1.46 just because 1.47 it is not available from the drop-down menu in bugzilla form. > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
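The ValueError above asks for expected frequencies because the information content of a column is measured relative to them. As a rough illustration only, here is a minimal sketch of that calculation in plain Python 3. It is not the Bio.Align.AlignInfo code: the function names are hypothetical, and the uniform DNA background and "-" as the only ignored character are assumptions made for the example.

```python
import math


def column_information(column, expected, ignore=("-",), log_base=2):
    """Relative entropy of one alignment column against expected frequencies.

    Hypothetical helper, not part of Biopython.
    """
    letters = [c for c in column if c not in ignore]
    if not letters:
        # Whole column consists of ignored characters: nothing to measure,
        # and returning early avoids a division by zero below.
        return 0.0
    total = 0.0
    for letter in set(letters):
        observed = letters.count(letter) / len(letters)
        total += observed * math.log(observed / expected[letter], log_base)
    return total


def information_content(rows, expected, ignore=("-",)):
    """Sum the per-column values over equal-length aligned rows."""
    return sum(column_information(col, expected, ignore) for col in zip(*rows))


# Assumed background: the four DNA letters are equally likely.
uniform_dna = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
print(information_content(["ACGT", "ACGA", "AC-T"], uniform_dna))
```

With a uniform four-letter background, a fully conserved column contributes 2 bits and a column showing all four letters equally contributes nothing, so the total depends directly on which background is supplied.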
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 17:02:24 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 11 Jul 2008 17:02:24 -0400 Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test In-Reply-To: Message-ID: <200807112102.m6BL2OvF026827@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2542 ------- Comment #5 from sbassi at gmail.com 2008-07-11 17:02 EST ------- Created an attachment (id=971) --> (http://bugzilla.open-bio.org/attachment.cgi?id=971&action=view) This file is used by my example were information_content() fails when sequences retrieved with AlignIO -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jul 11 17:16:03 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 11 Jul 2008 17:16:03 -0400 Subject: [Biopython-dev] [Bug 2443] Specifying the alphabet in Bio.SeqIO and Bio.AlignIO In-Reply-To: Message-ID: <200807112116.m6BLG3SJ027522@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2443 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|Specifying the alphabet in |Specifying the alphabet in |Bio.SeqIO.parse() |Bio.SeqIO and Bio.AlignIO ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 17:16 EST ------- I'm broadening the scope of this enhancement bug to cover Bio.SeqIO and Bio.AlignIO (both their read() and parse() functions). See also alphabet issues raised on Bug 2542. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 17:19:50 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 11 Jul 2008 17:19:50 -0400 Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test In-Reply-To: Message-ID: <200807112119.m6BLJoTt027660@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2542 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 17:19 EST ------- > Yes, this code with Bio.AlignIO also fails (I tried right now with > AlignInfo.py rev. 1.17): > > from Bio.Align import AlignInfo > from Bio.Align.AlignInfo import SummaryInfo > from Bio import AlignIO > fn = open("secu3.aln") > alignment = AlignIO.read(fn, "clustal") > summary = SummaryInfo(alignment) > print summary.information_content() > > And I got (and this time I am not supplying any alphabet, at least not > explicit): > > Traceback (most recent call last): > ... > ValueError: Error in alphabet: not Nucleotide or Protein, supply expected > frequencies Good. That seems to be working as intended - alignment formats like FASTA or Clustal do not specify the sequence type (unlike for example the Nexus format). Perhaps Bio.AlignIO.read() and parse() should be able to accept an optional alphabet argument? I had already been considering this for Bio.SeqIO so this is a natural extension. See Bug 2443. Unless information_content() can determine the sequence type (protein or nucleotide) from the alignment alphabet, you have to help it by supplying an appropriate e_freq_table argument. 
Perhaps: from Bio.Alphabet import IUPAC from Bio.SubsMat import FreqTable from Bio.Align.AlignInfo import SummaryInfo from Bio import AlignIO fn = open("secu3.aln") alignment = AlignIO.read(fn, "clustal") summary = SummaryInfo(alignment) #Have a generic alphabet, without a declared gap char, so must #provide the frequencies and chars to ignore explicitly: expected = FreqTable.FreqTable({"A":0.25,"G":0.25,"T":0.25,"C":0.25}, FreqTable.FREQ, IUPAC.unambiguous_dna) print summary.information_content(e_freq_table=expected, chars_to_ignore=['-']) This is probably safest. I'm doubtful that information_content() will choose wisely if given mixed case or lower case sequences... if that is the case it should be filed as a new bug. > > > P.S. Please update to Biopython 1.47 rather than using 1.46 > > I was using Biopython 1.47, but I reported as 1.46 just because 1.47 > it is not available from the drop-down menu in bugzilla form. Thanks for the reminder - I've added that to Bugzilla now :) I'm marking this bug as fixed now (after the updates to AlignInfo.py) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From peter at maubp.freeserve.co.uk Sat Jul 12 09:45:46 2008 From: peter at maubp.freeserve.co.uk (Peter) Date: Sat, 12 Jul 2008 14:45:46 +0100 Subject: [Biopython-dev] Deprecating the HTML parser in Bio.Blast.NCBIWWW Message-ID: <320fb6e00807120645u26321d71q30f72ed5808f700@mail.gmail.com> For some time now we've been discouraging the use of the HTML and plain text Blast parsers in favour of the XML format. I think it would be a good idea to now officially deprecate the HTML parser in Bio.Blast.NCBIWWW with warning messages when it is used. I don't even know if it still works with the recent big revision to the BLAST webpages, but I suspect not. However, the plain text parser in Bio.Blast.NCBIStandalone still has its uses. 
In particular, right now the PSI-BLAST output in XML format lacks some of the information found in the plain text output (new vs reused entries) so it would be premature to deprecate our plain text PSI parser. See Bug 2502 for details: http://bugzilla.open-bio.org/show_bug.cgi?id=2502#c18 Peter From bugzilla-daemon at portal.open-bio.org Sun Jul 13 12:23:57 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 13 Jul 2008 12:23:57 -0400 Subject: [Biopython-dev] [Bug 2543] New: Bio.Nexus.Trees can't handle named ancestors Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2543 Summary: Bio.Nexus.Trees can't handle named ancestors Product: Biopython Version: 1.46 Platform: PC OS/Version: FreeBSD Status: NEW Severity: normal Priority: P2 Component: Other AssignedTo: biopython-dev at biopython.org ReportedBy: markd at soe.ucsc.edu The following code produces: ValueError: invalid literal for float(): Ancestor1 from Bio.Nexus import Trees # from http://evolution.genetics.washington.edu/phylip/newicktree.html treeStr = "(B:6.0,(A:5.0,C:3.0,E:4.0)Ancestor1:5.0,D:11.0);" tree = Trees.Tree(treeStr) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jul 14 06:17:14 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 14 Jul 2008 06:17:14 -0400 Subject: [Biopython-dev] [Bug 2543] Bio.Nexus.Trees can't handle named ancestors In-Reply-To: Message-ID: <200807141017.m6EAHEhg019686@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2543 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-14 06:17 EST ------- This sounds like a job for Frank (the Bio.Nexus module author). 
Can I ask if you've actually come across trees with named ancestor nodes in
"real life"? That would make this bug more important. If so, the name of the
tool would be interesting, and an example tree file would be great to add to
Biopython as a test case.

If on the other hand the only named ancestor tree you've ever tried is the
example from the Newick documentation, this doesn't seem such a high priority
(but still worth fixing).

Peter

From bugzilla-daemon at portal.open-bio.org Tue Jul 15 16:07:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 15 Jul 2008 16:07:56 -0400
Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string, even subclass string?
In-Reply-To:
Message-ID: <200807152007.m6FK7umn009526@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2351

------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-15 16:07 EST -------
This is a suggested implementation of the split method for our Seq object,
modelled after the python string method which it calls internally. Note that I
have made the separator non-optional on the grounds that the string method's
default of white space isn't (usually) sensible for sequences. I'm happy to
change this if people think it's better to be as close as possible to the
string method.

def split(self, sep, maxsplit=None) :
    """Split method, like that of a python string.

    Return a list of the 'words' in the string (as Seq objects),
    using sep as the delimiter string. If maxsplit is given, at
    most maxsplit splits are done. Unlike the python string method,
    sep must be specified (as there shouldn't be any whitespace
    strings in a sequence).

    e.g. print my_seq.split("-")
    """
    if maxsplit is not None :
        #maxsplit=0 then means no splits, as for a python string
        parts = self.data.split(sep, maxsplit)
    else :
        parts = self.data.split(sep)
    return [Seq(chunk, self.alphabet) for chunk in parts]

From bugzilla-daemon at portal.open-bio.org Wed Jul 16 05:39:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 16 Jul 2008 05:39:01 -0400
Subject: [Biopython-dev] [Bug 2544] New: Bio.SeqIO improvements
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2544

Summary: Bio.SeqIO improvements
Product: Biopython
Version: 1.47
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: mmokrejs at ribosome.natur.cuni.cz

$ python
Python 2.5.2 (r252:60911, Jul 2 2008, 22:55:24)
[GCC 4.3.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import SeqIO
>>> handle = open("genbank-synthetic.gb")
>>> print seq_record
ID: EF452680.2
Name: EF452680
Description: Synthetic construct nitric oxide synthase (NOS) gene, partial cds.
/comment=On Feb 4, 2008 this sequence version replaced gi:145391444.
/sequence_version=2
/source=synthetic construct
/taxonomy=['other sequences', 'artificial sequences']
/keywords=['']
/references=[, , , ]
/accessions=['EF452680']
/data_file_division=SYN
/date=11-JUN-2008
/organism=synthetic construct
/gi=166831528
Seq('TAGGCCTCTGCTTGCCGTTTGTTTCGTCAGCGATTTTTATAGTCTCAGCCTCCT...GCC', IUPACAmbiguousDNA())
>>>

I do not see how I could access the value 'DNA' from the LOCUS line:

LOCUS EF452680 260 bp DNA linear SYN 11-JUN-2008

No, I do not want to read seq_record.features[0].qualifiers['mol_type'][0].

Could seq_record.features have a repr() function to give me something useful
instead of this?
>>> print seq_record.features [, , ] >>> I don't see documented anywhere in the biopython docs access the features, pasting something like the following into docs would give a user clue where to look for for values: >>> print seq_record.features[0].qualifiers {'db_xref': ['taxon:32630'], 'mol_type': ['other DNA'], 'organism': ['synthetic construct'], 'chromosome': ['Ib'], 'PCR_primers': ['fwd_seq: aggcctctgcttgccgtttgtttcg, rev_seq: cgccggcggcacacgctcaactaattac']} >>> print seq_record.features[1].qualifiers {'gene': ['NOS']} >>> print seq_record.features[2].qualifiers {'product': ['nitric oxide synthase'], 'codon_start': ['2'], 'EC_number': ['1.14.13.39'], 'transl_table': ['11'], 'note': ['derived from Toxoplasma gondii'], 'db_xref': ['GI:166831529'], 'translation': ['RPLLAVCFVSDFYSLSLLHFASVPFHESDGCVGRSHWLPGKHANYVKPAGARKRPEVGCRSSCLLRSVCCDILSPVRTRGN'], 'gene': ['NOS'], 'protein_id': ['ABP65329.2']} >>> print seq_record.features[3].qualifiers Traceback (most recent call last): File "", line 1, in IndexError: list index out of range >>> I wonder if I could access the above dicts as seq_record.features['source'] or seq_record.features['CDS']. Where is the 'source', 'gene', 'CDS' gone? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Jul 16 06:30:21 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 16 Jul 2008 06:30:21 -0400 Subject: [Biopython-dev] [Bug 2544] Bio.GenBank and SeqFeature improvements In-Reply-To: Message-ID: <200807161030.m6GAUL9x017920@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2544 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|Bio.SeqIO improvements |Bio.GenBank and SeqFeature | |improvements ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-16 06:30 EST ------- (In reply to comment #0) > $ python > > Python 2.5.2 (r252:60911, Jul 2 2008, 22:55:24) > [GCC 4.3.1] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> from Bio import SeqIO > >>> handle = open("genbank-synthetic.gb") I'm guessing the missing line here was something like: seq_record = SeqIO.read(handle, "genbank") > >>> print seq_record > ID: EF452680.2 > Name: EF452680 > Description: Synthetic construct nitric oxide synthase (NOS) gene, partial cds. > /comment=On Feb 4, 2008 this sequence version replaced gi:145391444. > /sequence_version=2 > /source=synthetic construct > /taxonomy=['other sequences', 'artificial sequences'] > /keywords=[''] > /references=[, > , instance at 0x834ceac>, ] > /accessions=['EF452680'] > /data_file_division=SYN > /date=11-JUN-2008 > /organism=synthetic construct > /gi=166831528 > Seq('TAGGCCTCTGCTTGCCGTTTGTTTCGTCAGCGATTTTTATAGTCTCAGCCTCCT...GCC', > IUPACAmbiguousDNA()) > >>> > > > I do not see how I could access the value 'DNA' from the LOCUS line: > LOCUS EF452680 260 bp DNA linear SYN 11-JUN-2008 Currently the sequence type (DNA, RNA, Protein) is used internally by the GenBank parser to determine the alphabet. It is not currently recorded in the SeqRecord object's annotation but could be. 
How about something like this?: seq_record.annotations["seq_type"] > No, I do not want to read seq_record.features[0].qualifiers['mol_type'][0]. Assuming that the first feature is the source (typically the case), and assuming it has a specified molecule type, then your suggestion is one work around. But I agree, its not nice. > Could seq_record.features have a repr() function to give me something useful > instead of this? > > >>> print seq_record.features > [, instance at 0x837b9cc>, ] Yes we could add that, but you wouldn't want to do that on a typical genome with thousands of features. Adding a repr method for the Reference object is also something I had wondered about doing. > I don't see documented anywhere in the biopython docs access the features, > pasting something like the following into docs would give a user clue where to > look for for values: > > >>> print seq_record.features[0].qualifiers > {'db_xref': ['taxon:32630'], 'mol_type': ['other DNA'], 'organism': ['synthetic > construct'], 'chromosome': ['Ib'], 'PCR_primers': ['fwd_seq: > aggcctctgcttgccgtttgtttcg, rev_seq: cgccggcggcacacgctcaactaattac']} > >>> print seq_record.features[1].qualifiers > {'gene': ['NOS']} > >>> print seq_record.features[2].qualifiers > {'product': ['nitric oxide synthase'], 'codon_start': ['2'], 'EC_number': > ['1.14.13.39'], 'transl_table': ['11'], 'note': ['derived from Toxoplasma > gondii'], 'db_xref': ['GI:166831529'], 'translation': > ['RPLLAVCFVSDFYSLSLLHFASVPFHESDGCVGRSHWLPGKHANYVKPAGARKRPEVGCRSSCLLRSVCCDILSPVRTRGN'], > 'gene': ['NOS'], 'protein_id': ['ABP65329.2']} There is a minimal bit of text in what is currently Chapter 10 of the tutorial on the SeqFeature object. I agree, this is an area that needs improvement. Perhaps a full example of parsing a simple GenBank file in the SeqIO chapter would help? 
> >>> print seq_record.features[3].qualifiers > Traceback (most recent call last): > File "", line 1, in > IndexError: list index out of range You must have only three features (indexed 0, 1 and 2) which explains the index error. > I wonder if I could access the above dicts as seq_record.features['source'] > or seq_record.features['CDS']. Where is the 'source', 'gene', 'CDS' gone? As the .type attribute, try this: for feature in seq_record.features: print feature.type You can't access the features by type (e.g. seq_record.features['CDS']) because there is generally more than one feature of each type. Peter P.S. Most of your comments are not on Bio.SeqIO itself, but actually about the underlying Bio.GenBank parser or the SeqFeature object. I have therefore changed the title. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Wed Jul 16 06:49:19 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 16 Jul 2008 11:49:19 +0100 Subject: [Biopython-dev] Dump file in Bio/Saf/__init__.py Message-ID: <320fb6e00807160349r105bda08x3cf5e31915896a9b@mail.gmail.com> Michiel, I just noticed your CVS revision to Bio/Saf/__init__.py removing this snippet of code: dumpfile = open( 'dump', 'w' ) dumpfile.write( data ) dumpfile.close() I recall seeing (and removing) a similar lump of diagnostics/debugging code from another of Katharine Lindner's parsers. There is still a similar bit of code in Bio/IntelliGenetics/__init__.py which we could remove, but as the whole module is now deprecated we could just wait for a few releases and then remove it entirely. 
Peter From bugzilla-daemon at portal.open-bio.org Wed Jul 16 07:40:53 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 16 Jul 2008 07:40:53 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200807161140.m6GBerMH021048@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-16 07:40 EST ------- (In reply to comment #8) > Whether or not to stop translating at the first stop codon could be an > argument to the translate method. As an alternative, it may be preferable > to have a split() method that splits the sequences at the stop codons. > Such a method could be applied to all protein sequences, not only those > created by translate(). Adding a split() method to the Seq object is a good idea in general (making the Seq object more like a python string), and using my_protein.split("*") is an nice example usage of this. I have posted a possible implementation of the split() method for the Seq object on Bug 2351 comment 15 http://bugzilla.open-bio.org/show_bug.cgi?id=2351#c15 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
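To make the my_protein.split("*") idea above concrete, here is a minimal sketch in Python 3. The Seq class below is a hypothetical stand-in that only holds a string and an alphabet tag; it is not Bio.Seq.Seq, and the example protein string is made up. It also assumes a maxsplit default of -1, as used by the python string method, so the value can be passed straight through.

```python
class Seq:
    """Hypothetical stand-in for Bio.Seq.Seq: a string plus an alphabet tag."""

    def __init__(self, data, alphabet=None):
        self.data = data
        self.alphabet = alphabet

    def __repr__(self):
        return "Seq(%r)" % self.data

    def split(self, sep, maxsplit=-1):
        """Split like a python string, but return Seq objects.

        sep is required, since whitespace is not a sensible default
        separator for a sequence.
        """
        parts = self.data.split(sep, maxsplit)
        # Each piece keeps the alphabet of the sequence it came from.
        return [Seq(chunk, self.alphabet) for chunk in parts]


# Made-up translation containing two stop codons, written as "*".
protein = Seq("MKT*AAL*GG", "protein")
print(protein.split("*"))     # [Seq('MKT'), Seq('AAL'), Seq('GG')]
print(protein.split("*", 1))  # [Seq('MKT'), Seq('AAL*GG')]
```

Splitting this way keeps every open reading frame fragment, whereas stopping translation at the first stop codon would keep only the first piece.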
From biopython at maubp.freeserve.co.uk Wed Jul 16 08:40:03 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 16 Jul 2008 13:40:03 +0100 Subject: [Biopython-dev] Modules to be removed from Biopython In-Reply-To: <320fb6e00806270950k479eda23ia96d3c2d36557510@mail.gmail.com> References: <492634.64872.qm@web62414.mail.re1.yahoo.com> <320fb6e00806270950k479eda23ia96d3c2d36557510@mail.gmail.com> Message-ID: <320fb6e00807160540w325fe995mea400b0014fd7c2e@mail.gmail.com> >> But, there is also a set of interconnected modules where it's not 100% >> clear if they can be removed without causing some surprises: >> Bio.builders >> Bio.config >> Bio.dbdefs >> Bio.formatdefs >> Bio.dbdefs >> Bio.expressions >> Bio.FormatIO >> Bio.Std >> Bio.StdHandler >> It is probably OK to remove these, since these were deprecated we did >> not get a barrage of complaints from our users. Personally, I think it is >> important to keep the code base clean, so I am in favor of removing >> these (and see if anybody complains; in that case, we can always put >> these modules back in and make a new release). But I can live with >> keeping these modules for another release round. If anybody thinks >> that that would be better, please let us know. > > Given some of these are very interconnected, I would be inclined to leave > them in for one more release. However I'm content to see them go. If no > one else has any qualms, then please carry on. Now that Biopython 1.47 is out, its probably time to remove Bio.expressions (deprecated in 1.44) and explicitly deprecate the rest: Bio.builders Bio.config Bio.dbdefs Bio.formatdefs Bio.Std Bio.StdHandler (plus Bio.Writer which is part this "Bioformat" code base?) The final entry from your list, Bio.FormatIO, has already been removed. 
Peter From mjldehoon at yahoo.com Wed Jul 16 10:14:07 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 16 Jul 2008 07:14:07 -0700 (PDT) Subject: [Biopython-dev] Dump file in Bio/Saf/__init__.py In-Reply-To: <320fb6e00807160349r105bda08x3cf5e31915896a9b@mail.gmail.com> Message-ID: <729090.76301.qm@web62408.mail.re1.yahoo.com> I removed a similar piece of code in one more module (I forgot which one). While we're on the subject: the functionality currently in Bio.Saf probably belongs in Bio.AlignIO. --Michiel. --- On Wed, 7/16/08, Peter wrote: > From: Peter > Subject: [Biopython-dev] Dump file in Bio/Saf/__init__.py > To: "BioPython-Dev Mailing List" > Date: Wednesday, July 16, 2008, 6:49 AM > Michiel, > > I just noticed your CVS revision to Bio/Saf/__init__.py > removing this > snippet of code: > > dumpfile = open( 'dump', > 'w' ) > dumpfile.write( data ) > dumpfile.close() > > I recall seeing (and removing) a similar lump of > diagnostics/debugging > code from another of Katharine Lindner's parsers. > > There is still a similar bit of code in > Bio/IntelliGenetics/__init__.py which we could remove, but > as the > whole module is now deprecated we could just wait for a few > releases > and then remove it entirely. 
> > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From biopython at maubp.freeserve.co.uk Wed Jul 16 10:44:28 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 16 Jul 2008 15:44:28 +0100 Subject: [Biopython-dev] Dump file in Bio/Saf/__init__.py In-Reply-To: <729090.76301.qm@web62408.mail.re1.yahoo.com> References: <320fb6e00807160349r105bda08x3cf5e31915896a9b@mail.gmail.com> <729090.76301.qm@web62408.mail.re1.yahoo.com> Message-ID: <320fb6e00807160744y7d809533sb5c9cdc82c907aa9@mail.gmail.com> On Wed, Jul 16, 2008 at 3:14 PM, Michiel de Hoon wrote: > I removed a similar piece of code in one more module (I forgot which one). Bio/MetaTool/__init__.py if anyone wanted to know. The CVS changes RSS feed is handy: http://biopython.org/wiki/Tracking_CVS_commits > While we're on the subject: the functionality currently in Bio.Saf probably belongs in Bio.AlignIO. Yes, it probably does - assuming anyone still uses the file format. I'll take a look at that at some point. Peter wrote: >> There is still a similar bit of code in >> Bio/IntelliGenetics/__init__.py which we could remove, but >> as the whole module is now deprecated we could just wait >> for a few releases and then remove it entirely. I've removed the Bio.IntelliGenetics dumpfile code in CVS. Peter From bugzilla-daemon at portal.open-bio.org Wed Jul 16 11:01:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 16 Jul 2008 11:01:41 -0400 Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects In-Reply-To: Message-ID: <200807161501.m6GF1fuG028930@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2454 ------- Comment #26 from mdehoon at ims.u-tokyo.ac.jp 2008-07-16 11:01 EST ------- I've uploaded a fixed parser in Bio.Sequencing.Ace to CVS; feel free to have a look and comment. 
From biopython at maubp.freeserve.co.uk Wed Jul 16 11:32:03 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 16 Jul 2008 16:32:03 +0100 Subject: [Biopython-dev] Dump file in Bio/Saf/__init__.py In-Reply-To: <320fb6e00807160744y7d809533sb5c9cdc82c907aa9@mail.gmail.com> References: <320fb6e00807160349r105bda08x3cf5e31915896a9b@mail.gmail.com> <729090.76301.qm@web62408.mail.re1.yahoo.com> <320fb6e00807160744y7d809533sb5c9cdc82c907aa9@mail.gmail.com> Message-ID: <320fb6e00807160832w4eef825ek3ed4cfde1cc92cd2@mail.gmail.com> >> While we're on the subject: the functionality currently in Bio.Saf probably belongs in Bio.AlignIO. > > Yes, it probably does - assuming anyone still uses the file format. > I'll take a look at that at some point. I've been looking at the PredictProtein site's SAF (Simple Alignment Format) specification, which as far as I know is the only definition (spelling errors and all). It's a free format somewhat like PHYLIP, and for "nice" input files parsing shouldn't be too difficult. However, some of the corner cases they give are frankly evil, and I wonder if Bio.Saf is actually compliant. See http://www.predictprotein.org/Dexa/optin_safDes.html I'd like to propose deprecating Bio.Saf on the main mailing list. If there are people wanting to use this SAF format, we can then worry about implementing a non-Martel parser for this file format in Bio.AlignIO instead - and explicitly test it can cope with all the examples given. Peter P.S. I updated Bio.Saf to use the new URL for the PredictProtein site.
From bugzilla-daemon at portal.open-bio.org Wed Jul 16 12:08:38 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 16 Jul 2008 12:08:38 -0400 Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects In-Reply-To: Message-ID: <200807161608.m6GG8c0s031867@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2454 ------- Comment #27 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-16 12:08 EST ------- Regarding the Bio.Sequencing.Ace changes (comment 26), some of it looks a bit repetitive. Might a sub-function help here? Also, I was wondering if you managed to fix Bug 2446 as a nice bonus. Regarding the Bio.Sequencing.Phd changes, Michiel has now deprecated Frank & Cymon's original scanner/consumer parser. I didn't think it made sense to leave the original header as it was (with their old version number etc. and the suggestion to contact them directly with bugs). They are of course still listed in the copyright header. New Bio.Sequencing.Phd docstring header text in CVS: """ Parser for PHD files output by PHRED and used by PHRAP and CONSED. This module can be used used directly which will return Record objects which should contain all the original data in the file. Alternatively, using Bio.SeqIO with the "phd" format will call this module internally. This will give SeqRecord objects for each contig sequence. """ Previous text: """ Parser for PHD files output by PHRED and used by PHRAP and CONSED. Works fine with PHRED 0.020425.c Version 1.1, 03/09/2004 written by Cymon J. Cox (cymon.cox at gmail.com ) and Frank Kauff (fkauff 'AT' biologie.uni-kl.de). Comments, bugs, problems, suggestions to one of us are welcome! """ Frank & Cymon - I should have asked first, but is this revised wording OK with you?
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Jul 16 16:35:13 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 16 Jul 2008 16:35:13 -0400 Subject: [Biopython-dev] [Bug 2544] Bio.GenBank and SeqFeature improvements In-Reply-To: Message-ID: <200807162035.m6GKZDOn012941@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2544 ------- Comment #2 from mmokrejs at ribosome.natur.cuni.cz 2008-07-16 16:35 EST ------- (In reply to comment #1) > > I'm guessing the missing line here was something like: > seq_record = SeqIO.read(handle, "genbank") Yes, I forgot to paste one line, sorry. > > I do not see how I could access the value 'DNA' from the LOCUS line: > > LOCUS EF452680 260 bp DNA linear SYN 11-JUN-2008 > > Currently the sequence type (DNA, RNA, Protein) is used internally by the > GenBank parser to determine the alphabet. It is not currently recorded in the > SeqRecord object's annotation but could be. How about something like this?: > > seq_record.annotations["seq_type"] I am not much familiar with the official naming of the fields in LOCUS line by Genbank but hope you are. Yes, it would be fine for me. I hope all other values from LOCUS line can be accessed similarly as well. > > Could seq_record.features have a repr() function to give me something useful > > instead of this? > > > > >>> print seq_record.features > > [, > instance at 0x837b9cc>, ] > > Yes we could add that, but you wouldn't want to do that on a typical genome > with thousands of features. Adding a repr method for the Reference object is > also something I had wondered about doing. I think it could be there even for large records. It not up to the programmer to use repr() or not, and while testing/learning it would be really useful. 
Or at least internally the routine could check for number of features and in case there would be thousands it could print some first and then stop with a clear message how to force for full listing. > > I don't see documented anywhere in the biopython docs access the features, > > pasting something like the following into docs would give a user clue where to > > look for for values: > > > > >>> print seq_record.features[0].qualifiers > > {'db_xref': ['taxon:32630'], 'mol_type': ['other DNA'], 'organism': ['synthetic > > construct'], 'chromosome': ['Ib'], 'PCR_primers': ['fwd_seq: > > aggcctctgcttgccgtttgtttcg, rev_seq: cgccggcggcacacgctcaactaattac']} > > >>> print seq_record.features[1].qualifiers > > {'gene': ['NOS']} > > >>> print seq_record.features[2].qualifiers > > {'product': ['nitric oxide synthase'], 'codon_start': ['2'], 'EC_number': > > ['1.14.13.39'], 'transl_table': ['11'], 'note': ['derived from Toxoplasma > > gondii'], 'db_xref': ['GI:166831529'], 'translation': > > ['RPLLAVCFVSDFYSLSLLHFASVPFHESDGCVGRSHWLPGKHANYVKPAGARKRPEVGCRSSCLLRSVCCDILSPVRTRGN'], > > 'gene': ['NOS'], 'protein_id': ['ABP65329.2']} > > There is a minimal bit of text in what is currently Chapter 10 of the tutorial > on the SeqFeature object. I agree, this is an area that needs improvement. Yes I read that before but it is too short, even after reading 2.4.2, 4.2.1, 9.2 and http://biopython.org/wiki/SeqIO. > > Perhaps a full example of parsing a simple GenBank file in the SeqIO chapter > would help? Definitely, you should pick some exceptional record having different fields, I think the one I have shown is quite OK. > > > >>> print seq_record.features[3].qualifiers > > Traceback (most recent call last): > > File "", line 1, in > > IndexError: list index out of range > > You must have only three features (indexed 0, 1 and 2) which explains the > index error. I knew, it was intentional. 
;-) > > > I wonder if I could access the above dicts as seq_record.features['source'] > > or seq_record.features['CDS']. Where is the 'source', 'gene', 'CDS' gone? > > As the .type attribute, try this: > > for feature in seq_record.features: > print feature.type >>> for feature in seq_record.features: ... print feature.type ... source gene CDS >>> > > You can't access the features by type (e.g. seq_record.features['CDS']) > because there is generally more than one feature of each type. Yes, but how about seq_record.features['CDS'][index]? Could that be provided? > P.S. Most of your comments are not on Bio.SeqIO itself, but actually about the > underlying Bio.GenBank parser or the SeqFeature object. I have therefore > changed the title. Thanks! -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From tiagoantao at gmail.com Wed Jul 16 21:11:43 2008 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Thu, 17 Jul 2008 02:11:43 +0100 Subject: [Biopython-dev] Biopython presentation at BOSC2008 Message-ID: <6d941f120807161811n44eafa5ata705fa3189443681@mail.gmail.com> Hi all, This year I will be delivering the Biopython presentation at BOSC 2008. The current draft is attached to this email (ppt format - yuck - but the easieast to edit). Comments, suggestions, changes are most welcome. Just one point, the presenation is this Saturday, so if you have any comments, please send them soon. There is one slide still to be completed and a few presentation/looks issues still to be edged out. Many thanks, Tiago -- "Data always beats theories. 'Look at data three times and then come to a conclusion,' versus 'coming to a conclusion and searching for some data.' The former will win every time." ?Matthew Simmons, http://www.tiago.org -------------- next part -------------- A non-text attachment was scrubbed... 
Name: bosc2008.ppt Type: application/vnd.ms-powerpoint Size: 482816 bytes Desc: not available URL: From biopython at maubp.freeserve.co.uk Thu Jul 17 09:07:53 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 17 Jul 2008 14:07:53 +0100 Subject: [Biopython-dev] Biopython presentation at BOSC2008 In-Reply-To: <6d941f120807161811n44eafa5ata705fa3189443681@mail.gmail.com> References: <6d941f120807161811n44eafa5ata705fa3189443681@mail.gmail.com> Message-ID: <320fb6e00807170607s32af2744j479eb2b2e545f454@mail.gmail.com> > Comments, suggestions, changes are most welcome. Just one point, the > presentation is this Saturday, so if you have any comments, please send > them soon. I've sent Tiago some specific comments directly (little things). One issue which might deserve wider discussion is the project's short term goals. I would suggest putting: * Moving from CVS to Subversion * Make Sequence objects more OO and string-like * More file formats in Bio.SeqIO and Bio.AlignIO And also perhaps the Numeric to numpy move, Bug 2251 http://bugzilla.open-bio.org/show_bug.cgi?id=2251 I subscribe to the numpy mailing list and they seem to have been making big strides in the documentation. Also it looks like they plan to make Travis Oliphant's "Guide to NumPy" book free after "SciPy 2008" - which probably means the August 2008 SciPy conference at Caltech rather than EuroSciPy 2008 in July in Germany. Peter From tiagoantao at gmail.com Thu Jul 17 17:45:56 2008 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Thu, 17 Jul 2008 22:45:56 +0100 Subject: [Biopython-dev] Biopython presentation at BOSC2008 In-Reply-To: <6d941f120807161811n44eafa5ata705fa3189443681@mail.gmail.com> References: <6d941f120807161811n44eafa5ata705fa3189443681@mail.gmail.com> Message-ID: <6d941f120807171445t32178835n6f5dd77f11f3f004@mail.gmail.com> Hi all, I would like to thank all that sent comments. I used the vast majority of comments sent, so feedback was really useful. 
Tiago On Thu, Jul 17, 2008 at 2:11 AM, Tiago Ant?o wrote: > Hi all, > > This year I will be delivering the Biopython presentation at BOSC > 2008. The current draft is attached to this email (ppt format - yuck - > but the easieast to edit). > Comments, suggestions, changes are most welcome. Just one point, the > presenation is this Saturday, so if you have any comments, please send > them soon. > > There is one slide still to be completed and a few presentation/looks > issues still to be edged out. > > Many thanks, > Tiago > > -- > "Data always beats theories. 'Look at data three times and then come > to a conclusion,' versus 'coming to a conclusion and searching for > some data.' The former will win every time." > ?Matthew Simmons, > http://www.tiago.org > -- "Data always beats theories. 'Look at data three times and then come to a conclusion,' versus 'coming to a conclusion and searching for some data.' The former will win every time." ?Matthew Simmons, http://www.tiago.org From bugzilla-daemon at portal.open-bio.org Fri Jul 18 20:07:02 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 18 Jul 2008 20:07:02 -0400 Subject: [Biopython-dev] [Bug 1999] new frame translation method In-Reply-To: Message-ID: <200807190007.m6J0721C023043@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1999 mmokrejs at ribosome.natur.cuni.cz changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |mmokrejs at ribosome.natur.cuni | |.cz -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Jul 18 20:09:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 18 Jul 2008 20:09:26 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200807190009.m6J09Qm2023188@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 mmokrejs at ribosome.natur.cuni.cz changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |mmokrejs at ribosome.natur.cuni | |.cz -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jul 18 20:30:36 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 18 Jul 2008 20:30:36 -0400 Subject: [Biopython-dev] [Bug 2448] Bio.EUtils can't handle accented author names In-Reply-To: Message-ID: <200807190030.m6J0Ua27024398@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2448 mmokrejs at ribosome.natur.cuni.cz changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |mmokrejs at ribosome.natur.cuni | |.cz ------- Comment #4 from mmokrejs at ribosome.natur.cuni.cz 2008-07-18 20:30 EST ------- (In reply to comment #2) > {u'LastName': u'Mart\xednez-Oyanedel', u'Initials': 'J', u'ForeName': > u'Jos\xe9'}, If I remember right this is the string-ified representation of utf8 data when you call str() or repr() on them. One could then in upper code try to convert it back but one has to invent the magic code. In my programs I avoid unicode but stick to utf8 and pass it back to the user. But as I say, you may never use print(), str(), repr() because they are not utf8/unicode safe. 
That should be one of the things to be fixed in python-3. So in summary when I do raise an exception these values will get always printed in the above escaped form, but it is the only exception. I believe as long as you return the values the current code is ok. But, haven't tested. grep-ing related stuff from my programs use e.g.: self._connection = connect(unix_socket=unix_socket, db=dbname, user=username, passwd=password, init_command='SET AUTOCOMMIT=0', charset='utf8', use_unicode=False) if self._connection.character_set_name() != 'utf8': # test whether we really have utf8 connection raise RuntimeError, "Connection to mysql not in utf8 mode: %s" % self._connection.character_set_name() value = unicode(value).encode('utf8') http://evanjones.ca/python-utf8.html http://www.idealliance.org/proceedings/xtech05/papers/02-08-01/ http://www.amk.ca/python/howto/unicode http://diveintopython.org/xml_processing/unicode.html http://www.jorendorff.com/articles/unicode/python.html from elementtree.ElementTree import parse, Element, SubElement, ElementTree # use 'utf8' and not 'utf-8' for Element.write() !!! # We must supply unicode values to ElementTree and not just utf8 encoded strings. _value_node.text = _value.decode('utf8') -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
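Editorial illustration for comment #4 on Bug 2448 above: the point that the \xed form is only an escaped display of intact text can be sketched as follows. This is a minimal sketch in Python 3, not the Python 2 used in the thread, and it is not Bio.EUtils code. The dictionary copies the author record quoted from comment #2, and the variable names are made up for the example.

```python
# Python 3 sketch; the thread itself was written against Python 2.
# The record below copies the values quoted from comment #2.
author = {
    "LastName": "Mart\xednez-Oyanedel",
    "Initials": "J",
    "ForeName": "Jos\xe9",
}

# ascii() gives an ASCII-only display form with escapes, similar to what
# the bug report shows. The underlying text is unchanged.
escaped = ascii(author["LastName"])

# Encoding to UTF-8 bytes and decoding again returns the same text.
utf8_bytes = author["ForeName"].encode("utf8")
round_trip = utf8_bytes.decode("utf8")

print(escaped)
print(utf8_bytes)
print(round_trip == author["ForeName"])
```

The escaped form only appears when a representation is requested; the stored value still holds the accented characters and survives a UTF-8 round trip.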
From bugzilla-daemon at portal.open-bio.org Fri Jul 18 20:37:36 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 18 Jul 2008 20:37:36 -0400 Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid codons as stops In-Reply-To: Message-ID: <200807190037.m6J0baGc024748@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2530 mmokrejs at ribosome.natur.cuni.cz changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |mmokrejs at ribosome.natur.cuni | |.cz ------- Comment #6 from mmokrejs at ribosome.natur.cuni.cz 2008-07-18 20:37 EST ------- I was just about to report this bug. I use biopython to translate EST sequences. They are full of sequencing errors although one knows the CDS region, still it is often interrupted by N's or by literal STOP codons. The current implementation in biopython-1.47 broke it for me. I haven't tested the attached patches but would propose to make this strict check optional. Currently it seems there is no way to pass down the code some variable not to barf in such cases. Will attach my current hack. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Jul 18 20:38:48 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 18 Jul 2008 20:38:48 -0400 Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq objects In-Reply-To: Message-ID: <200807190038.m6J0cmLK024884@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2532 ------- Comment #3 from mmokrejs at ribosome.natur.cuni.cz 2008-07-18 20:38 EST ------- Created an attachment (id=972) --> (http://bugzilla.open-bio.org/attachment.cgi?id=972&action=view) Hack not to break on Ns for unknown bases in ESTs -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Jul 19 08:47:34 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 19 Jul 2008 08:47:34 -0400 Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid codons as stops In-Reply-To: Message-ID: <200807191247.m6JClYEO004649@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2530 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #955 is|0 |1 obsolete| | ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-19 08:47 EST ------- (From update of attachment 955) I've checked this code in, with the most of the assertions moved into a new unit test. This patch is now obsolete. 
Checking in Bio/Data/CodonTable.py; /home/repository/biopython/biopython/Bio/Data/CodonTable.py,v <-- CodonTable.py new revision: 1.9; previous revision: 1.8 done RCS file: /home/repository/biopython/biopython/Tests/test_CodonTable.py,v done Checking in Tests/test_CodonTable.py; /home/repository/biopython/biopython/Tests/test_CodonTable.py,v <-- test_CodonTable.py initial revision: 1.1 done RCS file: /home/repository/biopython/biopython/Tests/output/test_CodonTable,v done Checking in Tests/output/test_CodonTable; /home/repository/biopython/biopython/Tests/output/test_CodonTable,v <-- test_CodonTable initial revision: 1.1 done -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Jul 19 08:52:02 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 19 Jul 2008 08:52:02 -0400 Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid codons as stops In-Reply-To: Message-ID: <200807191252.m6JCq26c004896@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2530 ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-19 08:52 EST ------- (In reply to comment #6) > I was just about to report this bug. I use biopython to translate EST > sequences. They are full of sequencing errors although one knows the CDS > region, still it is often interrupted by N's or by literal STOP codons. The > current implementation in biopython-1.47 broke it for me. I haven't tested the > attached patches but would propose to make this strict check optional. > Currently it seems there is no way to pass down the code some variable not to > barf in such cases. Will attach my current hack. Do you have an example which "worked" in an older version of Biopython, but is "broken" in Biopython 1.47? 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Jul 19 12:40:58 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 19 Jul 2008 12:40:58 -0400 Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid codons as stops In-Reply-To: Message-ID: <200807191640.m6JGew57014127@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2530 ------- Comment #9 from mmokrejs at ribosome.natur.cuni.cz 2008-07-19 12:40 EST ------- >gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence AAGAAAACGAGAAGGACGGGGTTATATAGTAAGGTACAAACAGGGCANNNNNNCCATTACACGACCAACT TCTTCGCCTTGCCCTTTTTCTCAGAGTCCTTGTGCGACAGGAACTCGACCTCGGTCGCAAGAGGCCCAGC AAGTCGCGCTCCCTCGGGGTACCCAAGCACACTCATCTTGAAATGCTTCCCAACTCCCTCAATCCTTTCC CGCAGCCCCGCATCCTCCTCGGTCGGTGCAAGTCGCGTCCATATCGACAATCGATAAAACTGCGGCCGCG TCGACACGATCACACCTGTAATCAGCGACGCAGACCCACTTCCACCCGTCAGCGTCGGCGATGGGTCAAA TGTTTCCCCGATCGCAGCCAGCATCGTATACAGCCACATCTTGTCTACGTTGGGTCGGTTTTTATCTTTG GGCAGTTGGATACTCCATTTTCCTCCAAGCTTGTTCGCCTCGTCCTCCCATGCGGGAATAATTCCCTCCT TGAAAAGGTAATAATTTGCCTTCTGGGGCAGTTGAGATGGCGGTATGATGTTGTTATATAACCCCCAAAA CTCCNNNNNGCTATCAAAGNNNNNGACCCGCNNNNNGTCCNCCANNNACCCTTNNNCCNNNNNANNNCCG GNNNNNNNNNNNNTGNGGGTCNNNNNNNNNGCTNNNNNNNNNNTNNNNNG resulted as of Aug 5 2007 in a six-frame translation >gi|45741280|gb|CK993509.1|CK993509 gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence frame:+1 KKTRRTGLYSKVQTG***HYTTNFFALPFFSESLCDRNSTSVARGPASRAPSGYPSTLILKCFPTPSILSRSPASSSVGASRVHIDNR*NCGRVDTITPVISDADPLPPVSVGDGSNVSPIAASIVYSHILSTLGRFLSLGSWILHFPPSLFASSSHAGIIPS LKR**FAFWGS*DGGMMLLYNPQNS**LSK**TR**S***P*****P******V***A***** >gi|45741280|gb|CK993509.1|CK993509 
gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence frame:+2 RKREGRGYIVRYKQG***ITRPTSSPCPFSQSPCATGTRPRSQEAQQVALPRGTQAHSS*NASQLPQSFPAAPHPPRSVQVASISTIDKTAAASTRSHL*SATQTHFHPSASAMGQMFPRSQPASYTATSCLRWVGFYLWAVGYSIFLQACSPRPPMRE*FPP *KGNNLPSGAVEMAV*CCYITPKT***YQ***P****P*TL*****R*****G********** >gi|45741280|gb|CK993509.1|CK993509 gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence frame:+3 ENEKDGVI**GTNRA**PLHDQLLRLALFLRVLVRQELDLGRKRPSKSRSLGVPKHTHLEMLPNSLNPFPQPRILLGRCKSRPYRQSIKLRPRRHDHTCNQRRRPTSTRQRRRWVKCFPDRSQHRIQPHLVYVGSVFIFGQLDTPFSSKLVRLVLPCGNNSLLEKVIICLLGQLRWRYDVVI*PPKL**AIK**DP**V***P************G********** >gi|45741280|gb|CK993509.1|CK993509 gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence frame:-1 **********W************P***L**AQ**KLS**LKTPNILL*YGGRVDGVFRLIMEKFLP**GRTLLLRLFEPPFTS*VDGFLFLAGLHLFYTDICYDRR*PLCKLGSGCDCPPSPRRSD*CPH*HSCAGVKIANSYTCAERGWLLLRPDALS*LPQPFVKFYSHEPMGLPRAERPGERWLQLKDSVFLRLFFPFRFFNQHIT**TGQTWNDILGQEEQK >gi|45741280|gb|CK993509.1|CK993509 gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence frame:-2 **********G*****G*****FP*T****P***NY***SKPPIYCCSMAVELTGSSV**WKSSSLNKGVPSCSACSNLLLPHRLTGFYFWLGCICSTPTYATTDASPFVNWVAAATAHLHPDAATNVHTSTAAPASK*LTAIPALNVAGSSYAPTPFPNSLNPS*SSTHTNPWGSLALNDPENAGSSSRTACS*DSFSRSASSTSTL***RDKHGMIYWGRKSKR >gi|45741280|gb|CK993509.1|CK993509 gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence frame:-3 *****S***L******A*****S***P**RP**ETI**PQNPQYIVVVWR*S*RGLPFNNGKVPPLIRAYPPAPLVRTSFYLIG*RVSIFGWVASVLHRHMLRPTLAPL*TG*RLRLPTFTQTQRLMSTLAQLRRRQNS*QLYLR*TWLAPPTPRRPFLTPSTLRKVLLTRTHGAPSR*TTRRTLAPAQGQRVPETLFPVPLLQPAHY***GTNME*YIGAGRAKE -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email 
------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Jul 19 12:44:36 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 19 Jul 2008 12:44:36 -0400 Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid codons as stops In-Reply-To: Message-ID: <200807191644.m6JGiahx014350@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2530 ------- Comment #10 from mmokrejs at ribosome.natur.cuni.cz 2008-07-19 12:44 EST ------- BTW, formatdb silently ignores asterisks so you have to replace them with X yourself otherwise alignment outputs from blast do not reflect reality. Don't know if I would prefer biopython to give me 'X' instead of '*', maybe for codons with 'N', 'R' would prefer X while for true STOP codons would prefer '*'. In PIR database is nice that proteins really ending at a STOP codon have a trailing '*'. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Jul 19 16:24:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 19 Jul 2008 16:24:41 -0400 Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid codons as stops In-Reply-To: Message-ID: <200807192024.m6JKOffD023599@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2530 ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-19 16:24 EST ------- How did you do the six translations in comment 9? Using Bio.Seq.translate() would have failed with a TranslationError on any "NNN" codon or similar. By common agreement "*" is used for a stop symbol. 
While "X" generally means any amino acid, I have sometimes seen it used to mean any amino acid OR a stop codon (in the NCBI translations in certain GenBank files). Personally I think it would be nice if there was an agreed character for an amino acid OR stop codon (e.g. "!"). However, as far as I know no such convention exists, so we shouldn't invent one as the default in Biopython. P.S. The nicest way to handle translate("NNN") isn't what I filed this bug about. It's the fact that translate("{@}") or anything else like that returns "*" and not an error. From bugzilla-daemon at portal.open-bio.org Sat Jul 19 17:40:35 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 19 Jul 2008 17:40:35 -0400 Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid codons as stops In-Reply-To: Message-ID: <200807192140.m6JLeZQR025907@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2530 ------- Comment #12 from mmokrejs at ribosome.natur.cuni.cz 2008-07-19 17:40 EST ------- Created an attachment (id=973) --> (http://bugzilla.open-bio.org/attachment.cgi?id=973&action=view) translate_ESTs.py
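Editorial illustration for comments #10 and #11 on Bug 2530 above: the distinction between a true stop codon, an ambiguous codon, and an invalid codon can be sketched in plain Python. This is a minimal sketch and not Biopython's implementation. The names translate_codon and translate_sketch are hypothetical. Returning "X" for a codon with more than one possible outcome follows the preference voiced in comment #10 and is an assumption, as is silently dropping a trailing partial codon. Only DNA letters are handled.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide ambiguity codes (DNA only in this sketch).
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon):
    """Translate one codon; hypothetical helper, not a Biopython function."""
    codon = codon.upper()
    if len(codon) != 3 or any(base not in IUPAC_DNA for base in codon):
        # Invalid input such as "{@}" is an error, not a stop.
        raise ValueError("Invalid codon: %r" % codon)
    # Expand the ambiguity codes and collect every possible outcome.
    outcomes = {
        CODON_TABLE["".join(bases)]
        for bases in product(*(IUPAC_DNA[base] for base in codon))
    }
    if len(outcomes) == 1:
        # Unambiguous result: one amino acid, or "*" for a certain stop.
        return outcomes.pop()
    # Assumption: anything with more than one outcome becomes "X".
    return "X"


def translate_sketch(sequence):
    """Translate frame +1; a trailing partial codon is dropped (assumption)."""
    usable = len(sequence) - len(sequence) % 3
    return "".join(
        translate_codon(sequence[i:i + 3]) for i in range(0, usable, 3)
    )


print(translate_sketch("TAYTARTANNNN"))
```

With this rule TAY gives Y and TAR gives "*", because every expansion agrees, while TAN and NNN give X. A string such as "{@}" raises ValueError instead of being reported as a stop.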
From bugzilla-daemon at portal.open-bio.org Sun Jul 20 10:46:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 20 Jul 2008 10:46:23 -0400 Subject: [Biopython-dev] [Bug 2547] New: Translation of ambiguous codons like NNN and TAN Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2547 Summary: Translation of ambiguous codons like NNN and TAN Product: Biopython Version: 1.47 Platform: PC OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk It is often useful to translate ambiguous nucleotide sequences (e.g. EST sequences), and these may contain codons which could code for an amino acid OR a stop codon (e.g. NNN, TNN or TAN). See for example Bug 2530 comment 6 and comment 9. Currently Bio.Seq.translate() will not translate such sequences and raises an exception. The following example shows correct translation of ambiguous codons which only encode valid amino acid(s) OR valid stop codons (but not both):
    from Bio.Seq import translate
    assert translate("TAA") == "*"
    assert translate("TAG") == "*"
    assert translate("TAT") == "Y"
    assert translate("TAC") == "Y"
    # Recall ambiguous nucleotide Y means T or C (pYrimidine)
    # so TAY = TAT or TAC which both code for Y (Tyr, Tyrosine)
    assert translate("TAY") == "Y"
    # Recall ambiguous nucleotide R means G or A (puRine)
    # so TAR = TAG or TAA which both code for a stop codon
    assert translate("TAR") == "*"
However, in Biopython 1.47 the following all raise an exception:
    translate("TAN")
    translate("TAM")
    translate("TAK")
    translate("TRR")
    translate("TNN")
    translate("NNN")
TAN, TAM, TAK, ... can code for Y or stop. More generally, "TRR" and "TNN" can code multiple amino acids or a stop codon, and "NNN" can code for any amino acid or a stop codon.
According to IUPAC, the single letter protein code X is an "unknown or 'other' amino acid" (ignoring its historic and obsolete usage for selenocysteine, now U). http://www.chem.qmul.ac.uk/iupac/AminoAcid/A2021.html This document does NOT cover the idea of stop codons, and I am not aware of any additional symbol to mean "any amino acid OR a stop codon" which would be ideal for this situation. For comparison, the EMBOSS transeq tool will use X when given a codon which could be either an amino acid OR a stop codon:
    $ transeq -filter asis:NNNTANTARTAGTAYTAC
    XX**YY
Therefore one solution would be to follow EMBOSS and return X for codons which could be an amino acid OR a stop codon. See also Bug 2530 on the related issue that Bio.Seq.translate() currently translates invalid codons as "*" (presumably an accidental side effect of the implementation). From bugzilla-daemon at portal.open-bio.org Sun Jul 20 10:50:22 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 20 Jul 2008 10:50:22 -0400 Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid codons as stops In-Reply-To: Message-ID: <200807201450.m6KEoMVZ017607@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2530 ------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-20 10:50 EST ------- Martin, I've filed Bug 2547 ("Translation of ambiguous codons like NNN and TAN") on the separate issue of wanting to translate ambiguous codons as found in EST sequences. This bug (Bug 2530) is only for the mis-translation of invalid codons as stop characters. If there is agreement that changing the behaviour of Bio.Seq.translate() as described in Bug 2547 is desirable, then we end up fixing both issues at the same time.
Peter

From biopython at maubp.freeserve.co.uk Sun Jul 20 11:03:48 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 20 Jul 2008 16:03:48 +0100
Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid codons as stops
In-Reply-To: <200807192140.m6JLeZQR025907@portal.open-bio.org>
References: <200807192140.m6JLeZQR025907@portal.open-bio.org>
Message-ID: <320fb6e00807200803v57820ab8v2502d6e5671933cc@mail.gmail.com>

> ------- Comment #12 from mmokrejs -------
> Created an attachment (id=973)
> --> (http://bugzilla.open-bio.org/attachment.cgi?id=973&action=view)
> translate_ESTs.py

Martin,

I had some general comments on your code which you might find helpful.

Most of your variable names start with an underscore - this is very unusual. There is a convention in Python that a single leading underscore is used for private properties or methods of an object.

You used the following code to reverse a string by turning it into a list and back again:

_reversed = list(_record.sequence)
_reversed.reverse()
_reversed = ''.join(_reversed)

For simply reversing a string, I would suggest using a stride of minus one instead,

reversed_string = old_string[::-1]

You then go on to take the reverse complement (without worrying about ambiguous characters which could be present, e.g. R -> Y):

_reversed = list(_record.sequence)
_reversed.reverse()
_reversed = ''.join(_reversed)
_reversed = _reversed.translate(string.maketrans('AaTtGgCcUu', 'TtAaCcGgAa'), '')

I would suggest using the Bio.Seq.reverse_complement() function here instead.

Finally are you aware of the string formatting operator (%) in python? The following code:

_outprothandle.write(''.join(('>', _record.gi, ' ', _record.definition, ' frame:-3', '\n', translate(_reversed[2:]).replace('*','X'), '\n')))

might typically be written as:

_outprothandle.write('>%s %s frame:-3\n%s\n' % (_record.gi, _record.definition, translate(_reversed[2:]).replace('*','X')))

See http://docs.python.org/lib/typesseq-strings.html for more details (and how to use named insertion points).

Peter

From bugzilla-daemon at portal.open-bio.org Sun Jul 20 12:08:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 20 Jul 2008 12:08:22 -0400
Subject: [Biopython-dev] [Bug 2548] New: Updating IUPACData and ExtendedIUPACProtein for U and O
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2548

Summary: Updating IUPACData and ExtendedIUPACProtein for U and O
Product: Biopython
Version: 1.47
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk

The IUPAC data in Biopython has not been updated to officially use X for any amino acid and U for selenocysteine (Sec). Nor do we support O for pyrrolysine (Pyl).

I haven't found an official statement from the IUPAC-IUBMB Joint Commission on Biochemical Nomenclature via Google, but several major resources confirm this:

http://www.ebi.ac.uk/RESID/faq.html
http://www.uniprot.org/news/2008/02/26/release
http://doc.bioperl.org/bioperl-live/Bio/Tools/IUPAC.html

Patch to follow.
From bugzilla-daemon at portal.open-bio.org Sun Jul 20 12:26:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 20 Jul 2008 12:26:10 -0400
Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and ExtendedIUPACProtein for U and O
In-Reply-To:
Message-ID: <200807201626.m6KGQAQZ021741@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2548

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-20 12:26 EST -------
See also: http://www.chem.qmul.ac.uk/iupac/AminoAcid/A2021.html

Taking the following as the current IUPAC standard, there is no direct mention of the use of J in NMR as designation for signals assigned either to leucine (L) or to isoleucine (I) which cannot be distinguished from each other. I am therefore NOT intending to add J to Biopython's IUPAC extended protein alphabet.

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Sun Jul 20 12:54:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 20 Jul 2008 12:54:51 -0400
Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and ExtendedIUPACProtein for U and O
In-Reply-To:
Message-ID: <200807201654.m6KGsp7L022759@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2548

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-20 12:54 EST -------
Created an attachment (id=974)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=974&action=view)
Adds U and O, clearly defines X, but does not add J

Does anyone have any definitive sources on the MW of these "new" amino acids?

Also I'd like to confirm if IUPAC have officially accepted "J" or not.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Jul 20 14:30:17 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 20 Jul 2008 14:30:17 -0400 Subject: [Biopython-dev] [Bug 2547] Translation of ambiguous codons like NNN and TAN In-Reply-To: Message-ID: <200807201830.m6KIUHMb028714@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2547 mmokrejs at ribosome.natur.cuni.cz changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |mmokrejs at ribosome.natur.cuni | |.cz ------- Comment #1 from mmokrejs at ribosome.natur.cuni.cz 2008-07-20 14:30 EST ------- Regarding the selenocystein issue, expect "inconsistencies" between data files released from NCBI. I haven't check now but in 2002 I had the following communication with NCBI staff: GenBank format requires official IUPAC amino acid code that doesn't include Selenocystein and therefore it uses 'X'. FASTA format uses the NCBI extended amino acid code that does include Selenecystein 'U'. > >gi_2983532 formate dehydrogenase alpha subunit [Aquifex aeolicus] > MNYMDISRRGFLKLSVGSVGAGILGGLGFDLTPAYARVRDLKITKAKVTKSICPYCSVSCGILAYSLSDG > AMNVKERIIHVEGNPDDPINRGTLCPKGATLRDFVNAPDRLTKPLYRPAGSTEWKEISWDEAIEKFARWV > KDTRDRTFIHKDKAGRVVNRCDSIVWAVGSPLGNEEGWLMVKIGIALGLSARETQATIUHAPTVASLAPT > ------------------------^ [cut] > > It seems there's buggy version in > ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Aquifex_aeolicus/AE000657.faa > although the .gbk flatfile says "X" in case of "U". -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Jul 20 17:16:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 20 Jul 2008 17:16:48 -0400
Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and ExtendedIUPACProtein for U and O
In-Reply-To:
Message-ID: <200807202116.m6KLGmdb005982@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2548

biopython-bugzilla at maubp.freeserve.co.uk changed:

              What |Removed |Added
----------------------------------------------------------------------------
Attachment #974 is |0       |1
          obsolete |        |

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-20 17:16 EST -------
Created an attachment (id=975)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=975&action=view)
Tested version of previous patch

This revision includes a workaround for missing molecular weights in the _make_ambiguous_ranges() function, and has been tested with the full test suite on Linux.

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Mon Jul 21 06:55:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 21 Jul 2008 06:55:13 -0400
Subject: [Biopython-dev] [Bug 2547] Translation of ambiguous codons like NNN and TAN
In-Reply-To:
Message-ID: <200807211055.m6LAtDHp009314@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2547

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 06:55 EST -------
(In reply to comment #1)
> Regarding the selenocystein issue, expect "inconsistencies" between data files
> released from NCBI. I haven't check now but in 2002 I had the following
> communication with NCBI staff ...

I think you meant to post this on Bug 2548.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jul 21 07:04:14 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 21 Jul 2008 07:04:14 -0400 Subject: [Biopython-dev] [Bug 2547] Translation of ambiguous codons like NNN and TAN In-Reply-To: Message-ID: <200807211104.m6LB4E0w009769@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2547 ------- Comment #3 from mmokrejs at ribosome.natur.cuni.cz 2008-07-21 07:04 EST ------- Yes, sorry. :( -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jul 21 07:10:02 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 21 Jul 2008 07:10:02 -0400 Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and ExtendedIUPACProtein for U and O In-Reply-To: Message-ID: <200807211110.m6LBA2H8010005@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2548 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 07:10 EST ------- I've gone over the GenBank release notes on this issue... Quoting ftp://ftp.ncbi.nih.gov/genbank/release.notes/gb131.release.notes (Dated August 15 2002, similar text appears in earlier files too as a warning of intended changes) ============================================================== 1.3.3 Selenocysteine representation At the May 1999 DDBJ/EMBL/GenBank collaborative meeting, it was learned that IUPAC plans to adopt the letter 'U' for selenocysteine. 
With this August 2002 release, selenocysteine residues are now presented via residue abbreviation 'U', in both /translation and /transl_except qualifiers. ============================================================== By now they SHOULD have fixed any sequences which were using X for selenocysteine to use U instead. Quoting ftp://ftp.ncbi.nih.gov/genbank/release.notes/gb156.release.notes (Dated October 15 2006, similar text appears in earlier files too as a warning of intended changes) ============================================================== 1.3.4 New protein residue abbreviation for Pyrrolysine Sequence databases use single-letter amino acid abbreviations to record the primary structure (sequence) of amino acids in a polypeptide. The table of abbreviations includes only those amino acids that are encoded in the genetic code and directly inserted by a tRNA during the process of protein translation. Post-translational modifications are not represented in the sequence data itself, but may be described by features annotated on the sequence. The discovery of the 22nd naturally encoded amino acid, pyrrolysine, and the recent submission of sequence records that should contain this residue, require the adoption of a new amino acid abbreviation. Because several letters are assigned to represent different experimental ambiguities, the only letter still available for use is O (uppercase letter o). Scientists working in the field have independently suggested use of this letter, and it has a reasonable mnemonic, pyrrOlysine. The IUPAC-IUBMB Joint Commission on Biochemical Nomenclature has agreed that Pyl/O will be recommended for this amino acid. The consequences for flatfile users are that O can now appear in CDS /translation qualifiers, and that Pyl (the three-letter abbreviation) can appear in CDS /transl_except qualifiers and in the /product and /anticodon qualifiers of tRNA features. These changes are legal as of this October 2006 GenBank Release. 
Sample ASN.1, FASTA, GenBank flatfile, and INSDSeq XML files for CP000099, which has a protein with a pyrrolysine residue, are available for testing purposes at the NCBI FTP site: ftp://ftp.ncbi.nih.gov/genbank/Pyrrolysine_Samples Files: CP000099.pse (print-form ASN.1 Seq-entry) CP000099.gbff (GenBank flatfile) CP000099.aa_fsa (protein FASTA) CP000099.isx (INSDSeq XML) ============================================================== And later on in the same file, ============================================================== 1.3.5 Protein residue J for leucine/isoleucine ambiguities The residue abbreviation J is reserved for mass spectrometry experiments that cannot distinguish leucine from isoleucine. Although this abbreviation has been part of the IUPAC recommendations for some time, it has not previously appeared in protein sequences in the GenBank database. As of October 2006, abbreviation J is legal in CDS /translation qualifiers, and Xle (the three-letter abbreviation) will be allowed in CDS /transl_except qualifiers and in the /product and /anticodon qualifiers of tRNA features. J will also be mapped to unknown (X) for the purpose of BLAST and other sequence similarity search tools. ============================================================== So, according to GenBank, "The residue abbreviation J is reserved for mass spectrometry experiments that cannot distinguish leucine from isoleucine ... this abbreviation has been part of the IUPAC recommendations for some time". I would prefer a direct citation, but that seems good enough evidence to me to include J in the Biopython IUPAC extended protein alphabet. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Jul 21 07:18:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 21 Jul 2008 07:18:12 -0400 Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and ExtendedIUPACProtein for U and O In-Reply-To: Message-ID: <200807211118.m6LBICM8010531@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2548 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 07:18 EST ------- Regarding Martin's example (erroneously added to Bugzilla as Bug 2547 comment 1), the protein GI:2983532 Martin wrote "GenBank format requires official IUPAC amino acid code that doesn't include Selenocystein and therefore it uses 'X'." That is out of date - IUPAC and GenBank both accept U for selenocysteine now (see my notes in comment 4 of this bug). Looking at these files: ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Aquifex_aeolicus/AE000657.faa ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Aquifex_aeolicus/AE000657.gbk (feature translation) They both give the same amino acid sequence for GI:2983532, which includes "U" but not "X" as I had expected. 
>gi|2983532|gb|AAC07107.1| formate dehydrogenase alpha subunit [Aquifex aeolicus VF5] MNYMDISRRGFLKLSVGSVGAGILGGLGFDLTPAYARVRDLKITKAKVTKSICPYCSVSCGILAYSLSDG AMNVKERIIHVEGNPDDPINRGTLCPKGATLRDFVNAPDRLTKPLYRPAGSTEWKEISWDEAIEKFARWV KDTRDRTFIHKDKAGRVVNRCDSIVWAVGSPLGNEEGWLMVKIGIALGLSARETQATIUHAPTVASLAPT FGRGAMTNNWVDISNSDLVFVMGGNPAENHPCGFKWAIKAREKRGAKIICIDPRFNRTAAVADIFVQIRP GTDIAFLGGLINYVLQNEKYQKEYVRLHTTGPFIVREDFGFKDGLFTGYDPKTRSYDTTTWDYEFDPATG YPKMDPEMKHPRCVLNILKEHYSRYTPEVVSQICGCSKEDFLRVAEEVAKCGAPNKFMTILYALGWTHHS YGTQLIRTACMLQLLLGNIGCPGGGINALRGHSNVQGMTDLAGQNKNLPTYIKPPKPEEQTLAQHLKNRT PRKLHPTSLNYWANYPKFFISFLKCMWGDAATPENDFAYDYLYKPEGGYNSWDKFIDDMYKGKIEGVVTA ALNFLNNTPNAKKTVRALKNLKWMVVMDPFMIETAQFWKAEGLDPKEVKTEILVLPTAVFLEKEGSFTNS ARWVKWKYKATDPPGDAKDEFWIFGRFFMKLKEFYEKEGGAFPEPILNLVWPYKNPYYPTAEEILTEING YYTRDVDGHKKGERVRLFTDLRDDGSTACGGWLYCGVFPPEGNLAKRTDLSDPLGLGTYPNYAWNWPANR RVLYNRASCDEKGRPWDPERPLLRWDPERDMWVGDIPDYPATAPPEKGIGAFIMLPEGKGRLFAAKSYVT FKDGPLPEHYEPYESPVTNILHPNVPHNPVAKVYKSDLDLLGTPDKFPHVATTYRLTEHYHFWTKHLYGP SLLAPVMFIEIPEELAKEKGIQNGDLVRVSTARASIEAIALVTKRIKPLKVAGKTVYTIGIPIHWGFEGL VKGAITNFITPNVWDPNSRTPEFKGFLANIEKVKT It is quite possible that during the transition from X to U for selenocysteine there were inconsistencies in GenBank - but I hope/expect the NCBI have fixed them all by now. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Jul 21 07:49:21 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 21 Jul 2008 07:49:21 -0400 Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and ExtendedIUPACProtein for U and O In-Reply-To: Message-ID: <200807211149.m6LBnLli012323@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2548 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #975 is|0 |1 obsolete| | ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 07:49 EST ------- Created an attachment (id=976) --> (http://bugzilla.open-bio.org/attachment.cgi?id=976&action=view) Adds J, U and O, and clearly defines X as an unknown amino acid Based on the GenBank release notes indirect confirmation that J is now an IUPAC recommendation, I have updated my patch to include J as well. Note that this requires a trivial update to test_seq.py (included in this patch). Still ideally needs the MW filled in for U and O. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Jul 21 11:25:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 21 Jul 2008 11:25:59 -0400
Subject: [Biopython-dev] [Bug 2547] Translation of ambiguous codons like NNN and TAN
In-Reply-To:
Message-ID: <200807211525.m6LFPxgs022821@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2547

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 11:25 EST -------
I've managed to cobble together my first ever Perl program from scratch, and established that BioPerl does the same as EMBOSS - they use an "X" when the codon could be either an amino acid OR a stop codon.

My quick BioPerl script,
================================================
use Bio::Seq;
$nuc_str = 'NNNTANTARTAGTAYTAC';
print "BioPerl translation of:\n";
$seq_obj = Bio::Seq->new(-seq => $nuc_str);
print $seq_obj->seq();
print "\n\n";
print "Sequence object's translation method:\n";
print $seq_obj->translate()->seq();
print "\n\n";
use Bio::Perl;
print "translate_as_string:\n";
print translate_as_string($nuc_str);
print "\n";
================================================

And the output:
================================================
BioPerl translation of:
NNNTANTARTAGTAYTAC

Sequence object's translation method:
XX**YY

translate_as_string:
XX**YY
================================================

There does seem to be a consensus building here!
From bugzilla-daemon at portal.open-bio.org Mon Jul 21 11:38:03 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 21 Jul 2008 11:38:03 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200807211538.m6LFc327023466@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 11:38 EST ------- For comparison, the following is copied from the BioPerl documentation about their sequence object's translate method. It would be nice to follow some of the same naming conventions for any optional arguments. http://www.bioperl.org/Core/Latest/bptutorial.html#iii_3_1_manipulating_sequence_data_with_seq_methods If we want to translate full coding regions (CDS) the way major nucleotide databanks EMBL, GenBank and DDBJ do it, the translate() method has to perform more checks. Specifically, translate() needs to confirm that the sequence has appropriate start and terminator codons at the very beginning and the very end of the sequence and that there are no terminator codons present within the sequence in frame 0. In addition, if the genetic code being used has an atypical (non-ATG) start codon, the translate() method needs to convert the initial amino acid to methionine. These checks and conversions are triggered by setting ``complete'' to 1: $prot_obj = $my_seq_object->translate(-complete => 1); -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Jul 21 11:41:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 21 Jul 2008 11:41:47 -0400
Subject: [Biopython-dev] [Bug 2547] Translation of ambiguous codons like NNN and TAN
In-Reply-To:
Message-ID: <200807211541.m6LFflk5023670@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2547

------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 11:41 EST -------
For reference, using the older Bio.Translate approach suffers the same limitation (which is not surprising if you consider they both use the same CodonTable objects internally):

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> from Bio import Translate
>>> standard_translator = Translate.ambiguous_dna_by_id[1]

The clear cut cases are fine,

>>> standard_translator.translate(Seq("TAR", IUPAC.ambiguous_dna))
Seq('*', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> standard_translator.translate(Seq("TAY", IUPAC.ambiguous_dna))
Seq('Y', HasStopCodon(ExtendedIUPACProtein(), '*'))

When the codon could be an amino acid or a stop, we raise an exception:

>>> standard_translator.translate(Seq("NNN", IUPAC.ambiguous_dna))
Traceback (most recent call last):
...
Bio.Data.CodonTable.TranslationError: NNN
>>> standard_translator.translate(Seq("TAN", IUPAC.ambiguous_dna))
Traceback (most recent call last):
...
Bio.Data.CodonTable.TranslationError: TAN
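
[Editorial sketch, not part of the original thread.] The behaviour that Bug 2547 proposes, matching the EMBOSS and BioPerl output quoted above, might be sketched in plain Python roughly as follows. This is only an illustration under stated assumptions: it does not use Biopython's CodonTable objects, the names AMBIGUOUS_DNA, STANDARD_TABLE, translate_codon and translate_ambiguous are hypothetical, and it returns X for every codon whose expansions disagree (including cases where only amino acids are mixed), which may be coarser than what Biopython eventually implemented.

```python
from itertools import product

# IUPAC ambiguous nucleotide codes mapped to the bases they stand for.
AMBIGUOUS_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}

# Standard genetic code (NCBI table 1) with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon(codon):
    """Translate one codon; return 'X' if its expansions disagree."""
    codon = codon.upper()
    # Expand the ambiguous codon into every unambiguous codon it could be
    # and collect the distinct translations.
    outcomes = {
        STANDARD_TABLE["".join(bases)]
        for bases in product(*(AMBIGUOUS_DNA[letter] for letter in codon))
    }
    if len(outcomes) == 1:
        return outcomes.pop()  # all expansions agree (e.g. TAY -> Y, TAR -> *)
    return "X"  # mixed result, e.g. amino acid OR stop (TAN, NNN)


def translate_ambiguous(sequence):
    """Translate a nucleotide string codon by codon, ignoring a partial tail."""
    usable = len(sequence) - len(sequence) % 3
    return "".join(
        translate_codon(sequence[i:i + 3]) for i in range(0, usable, 3)
    )


print(translate_ambiguous("NNNTANTARTAGTAYTAC"))
```

Letters outside the IUPAC nucleotide set raise a KeyError in this sketch rather than being translated as a stop, which is the separate concern of Bug 2530.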
From bugzilla-daemon at portal.open-bio.org Tue Jul 22 07:32:10 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 22 Jul 2008 07:32:10 -0400 Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects In-Reply-To: Message-ID: <200807221132.m6MBWAAF016950@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2454 ------- Comment #28 from mdehoon at ims.u-tokyo.ac.jp 2008-07-22 07:32 EST ------- (In reply to comment #27) > Regarding the Bio.Sequencing.Ace changes (comment 26), some of it looks a bit > repetitive. Might a sub-function help here? I thought about that, but each time the repetitive code is slightly different, and I wonder if the end result will be clearer than what we have now. > Also, I was wondering if you managed to fix Bug 2446 as a nice bonus. I am planning to do so. I am checking with the polyphred people if the COMMENT blocks are really intended and are here to stay (note that the polyphred version that writes these COMMENT blocks is a beta version). Will update the code once I hear back from them. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Tue Jul 22 07:38:13 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 22 Jul 2008 04:38:13 -0700 (PDT) Subject: [Biopython-dev] Bio.KDTree Message-ID: <108429.69921.qm@web62404.mail.re1.yahoo.com> Hi everybody, Bio.KDTree is currently the only piece of C++ code in Biopython. C++ doesn't compile cleanly on all platforms (for example it is missing in the Biopython installer for Python 2.3 on Windows); some platforms don't even have a C++ compiler. For this reason, setup.py asks the user each time if Bio.KDTree should be compiled. Does anybody (Thomas?) mind if I convert this code to plain C? That would be a nice weekend project. 
Then we can get rid of the question in setup.py, and Bio.KDTree can be made available on all platforms. --Michiel. From biopython at maubp.freeserve.co.uk Tue Jul 22 12:13:34 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 22 Jul 2008 17:13:34 +0100 Subject: [Biopython-dev] Modules to be removed from Biopython In-Reply-To: <320fb6e00807160540w325fe995mea400b0014fd7c2e@mail.gmail.com> References: <492634.64872.qm@web62414.mail.re1.yahoo.com> <320fb6e00806270950k479eda23ia96d3c2d36557510@mail.gmail.com> <320fb6e00807160540w325fe995mea400b0014fd7c2e@mail.gmail.com> Message-ID: <320fb6e00807220913g64613854j7a1deb5b4357f726@mail.gmail.com> On June 27, Michiel wrote: > ..., there is also a set of interconnected modules where it's not 100% > clear if they can be removed without causing some surprises: > Bio.builders > Bio.config > Bio.dbdefs > Bio.formatdefs > Bio.dbdefs > Bio.expressions > Bio.FormatIO [already deprecated and removed] > Bio.Std > Bio.StdHandler > It is probably OK to remove these, since these were deprecated we did > not get a barrage of complaints from our users. Personally, I think it is > important to keep the code base clean, so I am in favor of removing > these (and see if anybody complains; in that case, we can always put > these modules back in and make a new release). But I can live with > keeping these modules for another release round. If anybody thinks > that that would be better, please let us know. Bio.expressions was already deprecated, and seems to be a dependency of the following modules, which I have now explicitly deprecated in CVS: Bio.expressions (deprecated in Biopython 1.44) Bio.config Bio.dbdefs Bio.formatdefs Bio.dbdefs It probably would be fine to remove these five modules now (Bio.expressions, Bio.config, Bio.dbdefs, Bio.formatdefs and Bio.dbdefs), since the indirect warning from Bio.expressions should have alerted anyone who was using them. Or we can ship one more release with them included? 
Moving on, Bio.Std and Bio.StdHandler appear to be used by: - Bio.expressions (deprecated in Biopython 1.44) - Bio.config (now deprecated in CVS) - Bio.builders (used by Mindy) - Bio.Mindy (used by Bio.config which is now deprecated) As far as I can tell, other historic usage of Mindy (e.g. in Bio.Fasta and Bio.GenBank) has already been deprecated and removed. I think it would therefore also be safe to deprecate these four together (Bio.expressions, Bio.config, Bio.builders and Bio.Mindy), or start by deprecating Bio.Mindy on its own. Peter From bugzilla-daemon at portal.open-bio.org Tue Jul 22 12:29:27 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 22 Jul 2008 12:29:27 -0400 Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO In-Reply-To: Message-ID: <200807221629.m6MGTRuo002799@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2227 ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-22 12:29 EST ------- Frank, Would you mind if I removed this print statement from the add_sequence() method?: print "WARNING: Sequence name %s is already present. Sequence was added as %s." % (name,unique_name) I'd like to be able to call this method in code from Bio.SeqIO / Bio.AlignIO to write alignments, without getting warnings printed out. Thanks Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Tue Jul 22 12:33:53 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 22 Jul 2008 17:33:53 +0100 Subject: [Biopython-dev] Bio.KDTree In-Reply-To: <108429.69921.qm@web62404.mail.re1.yahoo.com> References: <108429.69921.qm@web62404.mail.re1.yahoo.com> Message-ID: <320fb6e00807220933v1e6125a7lcb91b963a5dd5195@mail.gmail.com> On Tue, Jul 22, 2008 at 12:38 PM, Michiel de Hoon wrote: > Hi everybody, > > Bio.KDTree is currently the only piece of C++ code in Biopython. C++ doesn't > compile cleanly on all platforms (for example it is missing in the Biopython > installer for Python 2.3 on Windows); some platforms don't even have a C++ > compiler. For this reason, setup.py asks the user each time if Bio.KDTree > should be compiled. Does anybody (Thomas?) mind if I convert this code to > plain C? That would be a nice weekend project. Then we can get rid of the > question in setup.py, and Bio.KDTree can be made available on all platforms. If you want to spend your weekend doing this, it does sounds like a worthwhile incremental improvement to Biopython - and should simplify the build process which is great. Peter P.S. Have you noticed Bug 2489 "KDTree NN search without specifying radius"? http://bugzilla.open-bio.org/show_bug.cgi?id=2489 From bugzilla-daemon at portal.open-bio.org Tue Jul 22 19:50:31 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 22 Jul 2008 19:50:31 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200807222350.m6MNoVXd024298@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #15 from mmokrejs at ribosome.natur.cuni.cz 2008-07-22 19:50 EST ------- (In reply to comment #5) > Another bonus for people who think OO, is doing dir(my_seq) would > list these useful methods. 
Right now the user has to know to go > looking in the Bio.Seq module for a function. I do this quite often and this is a weak point in current biopython. Good catch! Regarding the back_translate, I don't use it but people ask for it often so don't remove it. Otherwise I won't know where else to get this functionality. ;-) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Jul 22 20:05:09 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 22 Jul 2008 20:05:09 -0400 Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO In-Reply-To: Message-ID: <200807230005.m6N059QE025415@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2227 ------- Comment #12 from fkauff at biologie.uni-kl.de 2008-07-22 20:05 EST ------- Peter, No problem. Cheers, Frank (In reply to comment #11) > Frank, > > Would you mind if I removed this print statement from the add_sequence() > method?: > > print "WARNING: Sequence name %s is already present. Sequence was added as %s." > % (name,unique_name) > > I'd like to be able to call this method in code from Bio.SeqIO / Bio.AlignIO to > write alignments, without getting warnings printed out. > > Thanks > > Peter > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Jul 23 07:49:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 23 Jul 2008 07:49:33 -0400 Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO In-Reply-To: Message-ID: <200807231149.m6NBnX4P014410@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2227 ------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-23 07:49 EST ------- (In reply to comment #12) > Peter, > > No problem. > > Cheers, > Frank Great. I've removed that print statement (and tweaked a few doc strings) in CVS. Checking in Nexus.py; /home/repository/biopython/biopython/Bio/Nexus/Nexus.py,v <-- Nexus.py new revision: 1.19; previous revision: 1.18 done I'm just working on some alphabet stuff before adding support to write "nexus" format files with Bio.SeqIO and Bio.AlignIO -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Jul 23 08:33:10 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 23 Jul 2008 08:33:10 -0400 Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO In-Reply-To: Message-ID: <200807231233.m6NCXAk6018007@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2227 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED ------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-23 08:33 EST ------- Fixed in CVS - you can now write Nexus files using Bio.SeqIO or Bio.AlignIO, provided the alphabet is declared as DNA, RNA or protein. You cannot use generic alphabets or just nucleotide alphabets. 
Multiple files have been changed, so a complete CVS update is the best way to test this before the next release of Biopython. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Jul 23 10:12:38 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 23 Jul 2008 10:12:38 -0400 Subject: [Biopython-dev] [Bug 2543] Bio.Nexus.Trees can't handle named ancestors In-Reply-To: Message-ID: <200807231412.m6NECc33027073@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2543 fkauff at biologie.uni-kl.de changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED ------- Comment #2 from fkauff at biologie.uni-kl.de 2008-07-23 10:12 EST ------- I recently got some code that is supposed to be able to deal with labeled nodes (probably from the author of this bug - can't check now, as I'm traveling and don't have access to the files). I haven't looked at or tested the code yet, but will do so soon when I'm back. Frank (In reply to comment #1) > This sounds like a job for Frank (the Bio.Nexus module author). > > Can I ask if you've actually come across trees with named ancestor nodes in > "real life"? That would make this bug more important. If so, the name of the > tool would be interesting, and an example tree file would be great to add to > Biopython as a test case. > > If on the other hand the only named ancestor tree you've ever tried is the > example from the Newick documentation, this doesn't seem such a high priority > (but still worth fixing). > > Peter > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From biopython at maubp.freeserve.co.uk Thu Jul 24 07:41:41 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 24 Jul 2008 12:41:41 +0100 Subject: [Biopython-dev] Removing the deprecated Bio.WWW modules Message-ID: <320fb6e00807240441g5b21993dl7c84aebac0e2a988@mail.gmail.com> Hi all, We (Michiel) deprecated the Bio.WWW.* modules in Biopython 1.45, after relocating most of the functionality: Bio.WWW.ExPASy -> Bio.ExPASy Bio.WWW.InterPro -> Bio.InterPro Bio.WWW.NCBI -> Bio.Entrez Bio.WWW.SCOP -> Bio.SCOP Now that the deprecation warnings have been in place for a couple of releases, I'd like to remove the four Bio.WWW.* modules, and leave just Bio/WWW/__init__.py with a deprecation warning telling people where to look for the relocated code. Any comments or objections? Peter From mjldehoon at yahoo.com Thu Jul 24 20:19:33 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Thu, 24 Jul 2008 17:19:33 -0700 (PDT) Subject: [Biopython-dev] Removing the deprecated Bio.WWW modules In-Reply-To: <320fb6e00807240441g5b21993dl7c84aebac0e2a988@mail.gmail.com> Message-ID: <502434.4415.qm@web62406.mail.re1.yahoo.com> Note that Bio.WWW.__init__.py contains some code that is used in other modules. Most (but not all) of these modules are deprecated themselves. For the non-deprecated modules, it's probably easiest to just copy the code from Bio.WWW.__init__.py over to avoid having to import Bio.WWW. --Michiel. 
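As a rough illustration of the kind of stub discussed in this thread, the sketch below emits a deprecation warning that points at the relocated module. It is only a sketch in modern Python, not the code in CVS: the RELOCATED table copies the mapping from Peter's message, while warn_relocated is a hypothetical helper name assumed for this example.

```python
import warnings

# Old module name -> new location, as listed in Peter's message.
RELOCATED = {
    "Bio.WWW.ExPASy": "Bio.ExPASy",
    "Bio.WWW.InterPro": "Bio.InterPro",
    "Bio.WWW.NCBI": "Bio.Entrez",
    "Bio.WWW.SCOP": "Bio.SCOP",
}


def warn_relocated(old_name):
    """Hypothetical helper: warn about old_name and return the new name."""
    new_name = RELOCATED[old_name]
    warnings.warn(
        "%s is deprecated, please use %s instead" % (old_name, new_name),
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return new_name
```

A stub module could call such a helper once at import time, so existing scripts keep running but their users are told where to look.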
From biopython at maubp.freeserve.co.uk Fri Jul 25 06:31:49 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 25 Jul 2008 11:31:49 +0100 Subject: [Biopython-dev] Updating the installation instructions Message-ID: <320fb6e00807250331k47ec64dcoe246933f0d02682b@mail.gmail.com> As Nick Matzke has pointed out, http://biopython.org/DIST/docs/install/Installation.html and http://biopython.org/DIST/docs/install/Installation.pdf are somewhat out of date. I've updated the source LaTeX file in CVS to cover python 2.5 being the latest stable python, mxTextTools is now optional (but 2.0 is preferred over 3.0), and removed the bits about the "Classic" Mac (pre OS X). http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/install/Installation.tex?cvsroot=biopython The reportlab instructions probably need updating too - although we should double check if everything is happy with ReportLab 2 as part of this. If anyone wants to skim over the revised version and look for anything I've missed or other improvements that would be great.
Peter From biopython at maubp.freeserve.co.uk Fri Jul 25 07:21:31 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 25 Jul 2008 12:21:31 +0100 Subject: [Biopython-dev] Removing the deprecated Bio.WWW modules In-Reply-To: <502434.4415.qm@web62406.mail.re1.yahoo.com> References: <320fb6e00807240441g5b21993dl7c84aebac0e2a988@mail.gmail.com> <502434.4415.qm@web62406.mail.re1.yahoo.com> Message-ID: <320fb6e00807250421w15b1d8a9qe9d5d178c233ec7b@mail.gmail.com> On Fri, Jul 25, 2008 at 1:19 AM, Michiel de Hoon wrote: > Note that Bio.WWW.__init__.py contains some code that is used in other modules. > Most (but not all) of these modules are deprecated themselves. For the > non-deprecated modules, it's probably easiest to just copy the code from > Bio.WWW.__init__.py over to avoid having to import Bio.WWW. Good catch - I didn't do my recursive grep correctly. The file Bio/WWW/__init__.py just contains a RequestLimiter class, and this is currently used in: Bio/Blast/NCBIWWW.py (used in qblast, simple to recode as in Bio.Entrez) Bio/config/_support.py (completely deprecated) Bio/Prosite/__init__.py (in the deprecated ExPASyDictionary class) Bio/SwissProt/SProt.py (in the deprecated ExPASyDictionary class) Note I have just updated Bio.Prosite and Bio.SwissProt to use Bio.ExPASy rather than Bio.WWW.ExPASy which means we can delete the deprecated Bio/WWW/ExPASy.py, InterPro.py, NCBI.py and SCOP.py now. 
Peter From bugzilla-daemon at portal.open-bio.org Sat Jul 26 18:05:24 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 26 Jul 2008 18:05:24 -0400 Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and ExtendedIUPACProtein for U and O In-Reply-To: Message-ID: <200807262205.m6QM5Ow9021435@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2548 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-26 18:05 EST ------- Checking in Bio/Alphabet/IUPAC.py; /home/repository/biopython/biopython/Bio/Alphabet/IUPAC.py,v <-- IUPAC.py new revision: 1.3; previous revision: 1.2 done Checking in Bio/Data/IUPACData.py; /home/repository/biopython/biopython/Bio/Data/IUPACData.py,v <-- IUPACData.py new revision: 1.5; previous revision: 1.4 done Checking in Tests/test_seq.py; /home/repository/biopython/biopython/Tests/test_seq.py,v <-- test_seq.py new revision: 1.15; previous revision: 1.14 done Marking as fixed, although still ideally needs the MW filled in for U and O. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Jul 27 11:30:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 27 Jul 2008 11:30:37 -0400 Subject: [Biopython-dev] [Bug 2550] New: Alphabet problems when adding sequences Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2550 Summary: Alphabet problems when adding sequences Product: Biopython Version: 1.47 Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk #Create three sequences as Seq objects, >>> from Bio import Alphabet >>> from Bio.Alphabet import IUPAC >>> from Bio.Seq import Seq >>> a = Seq("ACTG", Alphabet.generic_dna) >>> b = Seq("AC-TG", Alphabet.Gapped(Alphabet.generic_dna, "-")) >>> c = Seq("AC-TG", Alphabet.Gapped(IUPAC.unambiguous_dna, "-")) >>> a Seq('ACTG', DNAAlphabet()) >>> b Seq('AC-TG', Gapped(DNAAlphabet(), '-')) >>> c Seq('AC-TG', Gapped(IUPACUnambiguousDNA(), '-')) #Now try adding them together... >>> b+c Seq('AC-TGAC-TG', Gapped(DNAAlphabet(), '-')) >>> a+b Traceback (most recent call last): File "", line 1, in ? File "/home/maubp/repository/biopython/build/lib.linux-x86_64-2.4/Bio/Seq.py", line 77, in __add__ elif other.alphabet.contains(self.alphabet): File "/home/maubp/lib/python2.4/site-packages/Bio/Alphabet/__init__.py", line 95, in contains return other.gap_char == self.gap_char and \ AttributeError: DNAAlphabet instance has no attribute 'gap_char' I would expect to get: Seq('ACTGAC-TG', Gapped(DNAAlphabet(), '-')) Similar example, but using proteins >>> p = Seq("ACDEFG", Alphabet.generic_protein) >>> q = Seq("ACDEFG", IUPAC.protein) >>> r = Seq("ACDEFG*", Alphabet.HasStopCodon(IUPAC.protein, "*")) >>> p Seq('ACDEFG', ProteinAlphabet()) >>> q Seq('ACDEFG', IUPACProtein()) >>> r Seq('ACDEFG*', HasStopCodon(IUPACProtein(), '*')) #Now try adding these together... 
>>> p+q Seq('ACDEFGACDEFG', ProteinAlphabet()) >>> p+r Traceback (most recent call last): File "", line 1, in ? File "/home/maubp/repository/biopython/build/lib.linux-x86_64-2.4/Bio/Seq.py", line 77, in __add__ elif other.alphabet.contains(self.alphabet): File "/home/maubp/lib/python2.4/site-packages/Bio/Alphabet/__init__.py", line 110, in contains return other.stop_symbol == self.stop_symbol and \ AttributeError: ProteinAlphabet instance has no attribute 'stop_symbol' Here is an example of a more reasonable failure, >>> c = Seq("AC-TG", Alphabet.Gapped(IUPAC.unambiguous_dna, "-")) >>> d = Seq('AC.TG', Alphabet.Gapped(IUPAC.unambiguous_dna, '.')) >>> c Seq('AC-TG', Gapped(IUPACUnambiguousDNA(), '-')) >>> d Seq('AC.TG', Gapped(IUPACUnambiguousDNA(), '.')) >>> c+d Traceback (most recent call last): File "", line 1, in ? File "/home/maubp/repository/biopython/build/lib.linux-x86_64-2.4/Bio/Seq.py", line 80, in __add__ raise TypeError, ("incompatable alphabets", str(self.alphabet), TypeError: ('incompatable alphabets', "Gapped(IUPACUnambiguousDNA(), '-')", "Gapped(IUPACUnambiguousDNA(), '.')") I am OK with this failing with a TypeError. However, one might argue that reverting to a generic DNA alphabet with no declared alphabet was desirable: Seq("AC-TGAC.TG", DNAAlphabet())) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Jul 27 11:59:50 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 27 Jul 2008 11:59:50 -0400 Subject: [Biopython-dev] [Bug 2550] Alphabet problems when adding sequences In-Reply-To: Message-ID: <200807271559.m6RFxoej018165@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2550 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-27 11:59 EST ------- Trying to fix this by changing only the contains method of the Alphabet and AlphabetEncoder classes is nasty, and wouldn't cover situations like this: p = Seq("PKL-PAK", Gapped(generic_protein,"-")) q = Seq("ADKS*", HasStopCodon(generic_protein,"*")) where you might expect something like: p+q == Seq("PKL-PAKADKS*", HasStopCodon(Gapped(generic_protein,"-"),"*")) Taken literally, neither of these two alphabets contains the other - but there is a fairly obvious consensus alphabet! I think the best solution would require changes to the Seq object's add method to pick a consensus alphabet in the non-simple cases where one alphabet is clearly a sub-set of the other. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
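To make the consensus idea in this comment concrete, here is a minimal standalone sketch in modern Python. It does not use Bio.Alphabet and is not the attached patch: the Alpha tuple and the consensus() function are hypothetical names assumed for illustration, with a gap character and stop symbol modelled as optional fields rather than wrapper classes.

```python
from collections import namedtuple

# Hypothetical stand-in for an alphabet: a base type plus an optional gap
# character and stop symbol (None means "not declared").
Alpha = namedtuple("Alpha", ["base", "gap", "stop"])


def consensus(alphabets):
    """Merge alphabets, failing if base types, gaps or stop symbols conflict."""
    bases = {a.base for a in alphabets}
    gaps = {a.gap for a in alphabets if a.gap is not None}
    stops = {a.stop for a in alphabets if a.stop is not None}
    if len(bases) != 1:
        raise TypeError("Incompatible base alphabets")
    if len(gaps) > 1:
        raise ValueError("More than one gap character present")
    if len(stops) > 1:
        raise ValueError("More than one stop symbol present")
    return Alpha(bases.pop(),
                 gaps.pop() if gaps else None,
                 stops.pop() if stops else None)


p = Alpha("protein", "-", None)  # plays the role of Gapped(generic_protein, "-")
q = Alpha("protein", None, "*")  # plays the role of HasStopCodon(generic_protein, "*")
print(consensus([p, q]))
```

Neither input "contains" the other, yet the merged result carries both the gap character and the stop symbol, which is the behaviour the comment asks for.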
From bugzilla-daemon at portal.open-bio.org Sun Jul 27 14:54:01 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 27 Jul 2008 14:54:01 -0400 Subject: [Biopython-dev] [Bug 2550] Alphabet problems when adding sequences In-Reply-To: Message-ID: <200807271854.m6RIs1wZ025718@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2550 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-27 14:54 EST ------- Created an attachment (id=977) --> (http://bugzilla.open-bio.org/attachment.cgi?id=977&action=view) Patch to Bio/Seq.py and Bio/Alphabet/__init__.py This uses some (private) alphabet functions in Bio/Alphabet/__init__.py (where I have already put a few bits extracted from or used by Bio.Align and Bio.AlignIO), and makes the old Alphabet .contains method effectively obsolete. Test case update to follow. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Jul 27 14:56:47 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 27 Jul 2008 14:56:47 -0400 Subject: [Biopython-dev] [Bug 2550] Alphabet problems when adding sequences In-Reply-To: Message-ID: <200807271856.m6RIulpl025828@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2550 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-27 14:56 EST ------- Created an attachment (id=978) --> (http://bugzilla.open-bio.org/attachment.cgi?id=978&action=view) Patches for test_seq.py and test_GACrossover.py Adds a new block of tests to test_seq.py to explicitly check a number of different alphabet combinations. 
Also tweaks test_GACrossover.py to define its test alphabet as a subclass of a suitable generic class in Bio.Alphabet, as otherwise it is not recognised as a valid alphabet. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Jul 27 15:06:22 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 27 Jul 2008 15:06:22 -0400 Subject: [Biopython-dev] [Bug 2550] Alphabet problems when adding sequences In-Reply-To: Message-ID: <200807271906.m6RJ6MBk026364@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2550 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-27 15:06 EST ------- With the patch, repeating the example in my comment 0, >>> from Bio import Alphabet >>> from Bio.Alphabet import IUPAC >>> from Bio.Seq import Seq >>> a = Seq("ACTG", Alphabet.generic_dna) >>> b = Seq("AC-TG", Alphabet.Gapped(Alphabet.generic_dna, "-")) >>> c = Seq("AC-TG", Alphabet.Gapped(IUPAC.unambiguous_dna, "-")) >>> a Seq('ACTG', DNAAlphabet()) >>> b Seq('AC-TG', Gapped(DNAAlphabet(), '-')) >>> c Seq('AC-TG', Gapped(IUPACUnambiguousDNA(), '-')) >>> b+c Seq('AC-TGAC-TG', Gapped(DNAAlphabet(), '-')) >>> a+b Seq('ACTGAC-TG', Gapped(DNAAlphabet(), '-')) >>> a+c Seq('ACTGAC-TG', Gapped(DNAAlphabet(), '-')) i.e. All the above additions work now. >>> p = Seq("ACDEFG", Alphabet.generic_protein) >>> q = Seq("ACDEFG", IUPAC.protein) >>> r = Seq("ACDEFG*", Alphabet.HasStopCodon(IUPAC.protein, "*")) >>> p Seq('ACDEFG', ProteinAlphabet()) >>> q Seq('ACDEFG', IUPACProtein()) >>> r Seq('ACDEFG*', HasStopCodon(IUPACProtein(), '*')) >>> p+q Seq('ACDEFGACDEFG', ProteinAlphabet()) >>> p+r Seq('ACDEFGACDEFG*', HasStopCodon(ProteinAlphabet(), '*')) These work too. 
>>> c = Seq("AC-TG", Alphabet.Gapped(IUPAC.unambiguous_dna, "-")) >>> d = Seq('AC.TG', Alphabet.Gapped(IUPAC.unambiguous_dna, '.')) >>> c Seq('AC-TG', Gapped(IUPACUnambiguousDNA(), '-')) >>> d Seq('AC.TG', Gapped(IUPACUnambiguousDNA(), '.')) >>> c+d Traceback (most recent call last): File "", line 1, in ? File "Bio/Seq.py", line 78, in __add__ a = Alphabet._consensus_alphabet([self.alphabet, other.alphabet]) File "/home/maubp/repository/biopython/Bio/Alphabet/__init__.py", line 199, in _consensus_alphabet raise ValueError("More than one gap character present") ValueError: More than one gap character present The error message has changed (and is more explicit), but I think this is a real failure case. Then based on the example in my comment 1, >>> p = Seq("PKL-PAK", Alphabet.Gapped(Alphabet.generic_protein,"-")) >>> q = Seq("ADKS*", Alphabet.HasStopCodon(Alphabet.generic_protein,"*")) >>> p+q Seq('PKL-PAKADKS*', HasStopCodon(Gapped(ProteinAlphabet(), '-'), '*')) This works now too. One final example of a valid failure: >>> q = Seq("ADKS*", Alphabet.HasStopCodon(Alphabet.generic_protein,"*")) >>> r = Seq("SRFG@", Alphabet.HasStopCodon(Alphabet.generic_protein,"@")) >>> q+r Traceback (most recent call last): File "", line 1, in ? File "Bio/Seq.py", line 78, in __add__ a = Alphabet._consensus_alphabet([self.alphabet, other.alphabet]) File "/home/maubp/repository/biopython/Bio/Alphabet/__init__.py", line 208, in _consensus_alphabet raise ValueError("More than one stop symbol present") ValueError: More than one stop symbol present I'd be grateful if anyone could test this, or comment on the code. While adding private functions to Bio.Alphabet is a reasonable short term solution (and means we can change arguments and names without breaking people's scripts!), some of this functionality might be best exposed publically. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jul 28 05:26:03 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 28 Jul 2008 05:26:03 -0400 Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more In-Reply-To: Message-ID: <200807280926.m6S9Q3Cn032456@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1944 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #943 is|0 |1 obsolete| | ------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-28 05:26 EST ------- (From update of attachment 943) Checked in as part of Bio/Align/Generic.py revision 1.10 Adding __len__ would also be sensible, and perhaps __nonzero__ (which could check the number of rows AND columns). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jul 28 05:37:27 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 28 Jul 2008 05:37:27 -0400 Subject: [Biopython-dev] [Bug 2551] New: Adding advanced __getitem__ to generic alignment, e.g. align[1:2, 5:-5] Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2551 Summary: Adding advanced __getitem__ to generic alignment, e.g. 
align[1:2,5:-5] Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk BugsThisDependsOn: 2507

I'm filing this as a separate sub-issue from Bug 1944. The idea is to enhance the minimal __getitem__ method now in CVS to allow accessing of rows (sequences), columns, or sub-alignments.

A possible __getitem__ doc string:

Depending on the indices, you can get a SeqRecord object (representing a single row), a Seq object (for a single column), a string (for a single character) or another alignment (representing some part or all of the alignment).

align[r,c] gives a single character as a string
align[r] gives a row as a SeqRecord
align[r,:] gives a row as a SeqRecord
align[:,c] gives a column as a Seq (using the alignment's alphabet)
align[:] and align[:,:] give a copy of the alignment

Anything else gives a sub alignment, e.g.
align[0:2] or align[0:2,:] uses only rows 0 and 1
align[:,1:3] uses only columns 1 and 2
align[0:2,1:3] uses only rows 0 & 1 and only cols 1 & 2

Doing this nicely will build on adding annotation aware slicing support to the SeqRecord, which is Bug 2507. There is some __getitem__ code on Bug 1944 Attachment 732 and Bug 1944 Attachment 770.

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
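The indexing rules proposed above can be sketched with a toy class in modern Python. This is only an illustration under simplifying assumptions: rows are plain strings rather than SeqRecord objects, and GridAlignment is a hypothetical name, not the Bio.Align.Generic class.

```python
class GridAlignment:
    """Hypothetical stand-in: rows are plain strings of equal length."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __getitem__(self, index):
        if isinstance(index, int):
            return self.rows[index]                 # align[r] gives one row
        if isinstance(index, slice):
            return GridAlignment(self.rows[index])  # align[a:b] gives a sub-alignment
        row_idx, col_idx = index                    # align[rows, cols] arrives as a tuple
        if isinstance(row_idx, int):
            # align[r,c] is a character; align[r,:] is the whole row
            return self.rows[row_idx][col_idx]
        picked = self.rows[row_idx]
        if isinstance(col_idx, int):
            return "".join(row[col_idx] for row in picked)  # align[:,c] is a column
        return GridAlignment(row[col_idx] for row in picked)


align = GridAlignment(["ACGT", "CCGT", "ACGA"])
print(align[1, 0])            # single character
print(align[:, 0])            # first column
print(align[0:2, 1:3].rows)   # rows 0 & 1, columns 1 & 2
```

Python passes align[r,c] to __getitem__ as a tuple, so one method can dispatch on whether each part is an integer or a slice.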
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 05:37:29 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 28 Jul 2008 05:37:29 -0400 Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for element access and slicing In-Reply-To: Message-ID: <200807280937.m6S9bTY8000615@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2507 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- OtherBugsDependingO| |2551 nThis| | -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jul 28 05:48:56 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 28 Jul 2008 05:48:56 -0400 Subject: [Biopython-dev] [Bug 2552] New: Adding alignments Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2552 Summary: Adding alignments Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk This is related to the very broad alignment bug 1944. Given two alignments, it can make sense to talk about adding them together. However we can either add by row, or by column. e.g. 
Consider this alignment, a:

DNAAlphabet() alignment with 3 rows and 14 columns
ACGATCAGCTAGCT Alpha
CCGATCAGCTAGCT Beta
ACGATGAGCTAGCT Gamma

Doing a+a by column would give:

DNAAlphabet() alignment with 3 rows and 28 columns
ACGATCAGCTAGCTACGATCAGCTAGCT Alpha
CCGATCAGCTAGCTCCGATCAGCTAGCT Beta
ACGATGAGCTAGCTACGATGAGCTAGCT Gamma

This sort of operation is often done to combine alignments from multiple genes (after first sorting the rows to ensure the species names are in the same order). To implement this would ideally require the ability to add SeqRecord objects together, doing something sensible with the annotation and in particular the identifiers.

Doing a+a by row would give:

DNAAlphabet() alignment with 6 rows and 14 columns
ACGATCAGCTAGCT Alpha
CCGATCAGCTAGCT Beta
ACGATGAGCTAGCT Gamma
ACGATCAGCTAGCT Alpha
CCGATCAGCTAGCT Beta
ACGATGAGCTAGCT Gamma

This particular example, a+a, is perhaps unrealistic due to the repeated identifiers, but I imagine there are some real use cases for this operation.

More generally, suppose we have two alignments a and b. Treating each alignment as a list of SeqRecord objects, you might expect:

a.extend(b) -> addition by row
a+b -> addition by row

However, I would suggest for alignment objects:

a.extend(b) -> addition by row, requires the sequences all be the same length (same number of columns)
a+b -> addition by column, requires the same number of sequences (rows)

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
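The two kinds of addition described above can be sketched in a few lines of modern Python. This is an illustration only: an alignment is modelled as a list of (name, sequence) tuples, the function names add_by_row and add_by_column are hypothetical, and keeping the first alignment's names when joining columns is an assumption, since the report leaves the handling of identifiers open.

```python
def add_by_row(a, b):
    """Stack two alignments; both must have the same number of columns."""
    if a and b and len(a[0][1]) != len(b[0][1]):
        raise ValueError("Alignments have different numbers of columns")
    return a + b


def add_by_column(a, b):
    """Join two alignments side by side; both must have the same number of rows."""
    if len(a) != len(b):
        raise ValueError("Alignments have different numbers of rows")
    # Assumption: keep the names from the first alignment.
    return [(name_a, seq_a + seq_b)
            for (name_a, seq_a), (_name_b, seq_b) in zip(a, b)]


a = [("Alpha", "ACGATCAGCTAGCT"),
     ("Beta", "CCGATCAGCTAGCT"),
     ("Gamma", "ACGATGAGCTAGCT")]

for name, seq in add_by_column(a, a):
    print(seq, name)
```

With the example alignment from the report, add_by_column(a, a) yields 3 rows of 28 columns and add_by_row(a, a) yields 6 rows of 14 columns.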
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 05:53:34 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 28 Jul 2008 05:53:34 -0400 Subject: [Biopython-dev] [Bug 2553] New: Adding SeqRecord objects to an alignment (append or extend) Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2553 Summary: Adding SeqRecord objects to an alignment (append or extend) Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk Currently a Bio.Align.Generic.Alignment object stores the rows as SeqRecord objects, but only exposes a public API for adding row sequences as strings. As suggested on Bug 1944, it would make sense to treat the Alignment as a list of SeqRecord objects and therefore support the list methods .append() and .extend() for the addition of more rows as SeqRecord objects. I would make the .append() method enforce the expectation that all rows are the same length, and that the new sequence's alphabet is compatible with the declared alignment alphabet. See also Bug 2552 - Adding alignments -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
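A minimal sketch of the proposed list-like behaviour, in modern Python, might look like the following. It is not the Bio.Align.Generic code: RowCheckedAlignment is a hypothetical name, records are simplified to (identifier, sequence) pairs, and the alphabet compatibility check mentioned in the report is deliberately left out.

```python
class RowCheckedAlignment:
    """Hypothetical stand-in that only enforces equal row lengths."""

    def __init__(self):
        self._records = []  # list of (identifier, sequence) pairs

    def append(self, record):
        """Add one row; it must match the length of the existing rows."""
        identifier, sequence = record
        if self._records and len(sequence) != len(self._records[0][1]):
            raise ValueError("New sequence is not of length %i"
                             % len(self._records[0][1]))
        self._records.append((identifier, sequence))

    def extend(self, records):
        """Add several rows, applying the same check to each one."""
        for record in records:
            self.append(record)


align = RowCheckedAlignment()
align.extend([("Alpha", "ACGT"), ("Beta", "CCGT")])
print(len(align._records))
```

Routing extend() through append() keeps the length check in one place, so the first record fixes the alignment length for everything that follows.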
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 05:57:52 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 28 Jul 2008 05:57:52 -0400 Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more In-Reply-To: Message-ID: <200807280957.m6S9vqJd001617@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1944 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- BugsThisDependsOn|2507 | Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #17 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-28 05:57 EST ------- I've filed bugs on what I think are the remaining issues raised here (Bug 1944), and am now closing this issue (as it's getting very long and hard to follow): Bug 2551 - The __getitem__ method (accessing part of the alignment as a character string, row, column or sub-alignment). Bug 2552 - Adding alignments Bug 2553 - Adding SeqRecord objects to an alignment (append or extend) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 05:57:54 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 28 Jul 2008 05:57:54 -0400 Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for element access and slicing In-Reply-To: Message-ID: <200807280957.m6S9vspm001632@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2507 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- OtherBugsDependingO|1944 | nThis| | -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jul 28 06:13:38 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 28 Jul 2008 06:13:38 -0400 Subject: [Biopython-dev] [Bug 2554] New: Creating an Alignment from a list of SeqRecord objects Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2554 Summary: Creating an Alignment from a list of SeqRecord objects Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk BugsThisDependsOn: 2553

It would be nice to be able to supply a list (or iterator) of SeqRecord objects when creating an alignment object. This would also make the Bio.SeqIO.to_alignment() function obsolete.

Currently, the __init__ method takes just an alphabet:

    def __init__(self, alphabet):
        """Initialize a new Alignment object.

        Arguments:
        o alphabet - The alphabet to use for the sequence objects that are
        created. This alphabet must be a gapped type.
        """
        #...

My plan is to accept a list of SeqRecord objects (possibly empty) and an optional alphabet. If the alphabet is omitted, a consensus can be determined from the SeqRecord alphabets. This can be made backwards compatible:

    def __init__(self, records, alphabet=None):
        """Initialize a new Alignment object.

        Arguments:
        records - A list (or iterator) of SeqRecord objects, whose sequences
                  are all the same length. This can be an empty list.
        alphabet - The alphabet for the whole alignment, typically a gapped
                   alphabet, which should be a superset of the individual
                   record alphabets. If omitted, a consensus alphabet is used.
        """
        if not (isinstance(records, Alphabet.Alphabet) \
                or isinstance(records, Alphabet.AlphabetEncoder)):
            if alphabet is None :
                #Backwards compatible mode!
                alphabet = records
                records = []
            else :
                raise ValueError("Invalid records argument")
        #...

I would expect the implementation to depend on Bug 2553 - Adding SeqRecord objects to an alignment (append or extend).

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jul 28 06:13:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 28 Jul 2008 06:13:41 -0400 Subject: [Biopython-dev] [Bug 2553] Adding SeqRecord objects to an alignment (append or extend) In-Reply-To: Message-ID: <200807281013.m6SADf6o002429@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2553 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- OtherBugsDependingO| |2554 nThis| | -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 06:49:45 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 28 Jul 2008 06:49:45 -0400 Subject: [Biopython-dev] [Bug 2554] Creating an Alignment from a list of SeqRecord objects In-Reply-To: Message-ID: <200807281049.m6SAnjbE003984@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2554 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-28 06:49 EST -------

There is an unwanted "not" in the code snippet in comment 0. Here is a preliminary implementation of the revised __init__ method plus append and extend (Bug 2553):

    def __init__(self, records, alphabet=None):
        """Initialize a new Alignment object.

        Arguments:
        records - A list (or iterator) of SeqRecord objects, whose
                  sequences are all the same length. This can be an
                  empty list.
        alphabet - The alphabet for the whole alignment, typically a
                  gapped alphabet, which should be a super-set of the
                  individual record alphabets. If omitted, a consensus
                  alphabet is used.

        NOTE - Earlier versions of Biopython only accepted a single
        argument, an alphabet. This is still supported via a backwards
        compatible "hack" so as not to disrupt existing scripts and users.
        """
        if isinstance(records, Alphabet.Alphabet) \
        or isinstance(records, Alphabet.AlphabetEncoder):
            if alphabet is None :
                #Backwards compatible mode!
                alphabet = records
                records = []
            else :
                raise ValueError("Invalid records argument")
        if alphabet is not None :
            if not (isinstance(alphabet, Alphabet.Alphabet) \
            or isinstance(alphabet, Alphabet.AlphabetEncoder)):
                raise ValueError("Invalid alphabet argument")
            self._alphabet = alphabet
        else :
            #Default while we add sequences, will take a consensus later
            self._alphabet = Alphabet.single_letter_alphabet
        self._records = []
        self.extend(records)
        if alphabet is None :
            #No alphabet was given, take a consensus alphabet
            #TODO - Use a generator expression once we drop python 2.3:
            self._alphabet = Alphabet._consensus_alphabet([rec.seq.alphabet for \
                                                           rec in self._records])

    def extend(self, records) :
        """Add more SeqRecord objects to the alignment as rows.

        They must all have the same length as the original alignment, and have
        alphabets compatible with the alignment's alphabet."""
        for rec in records :
            self.append(rec)

    def append(self, record) :
        """Add one more SeqRecord object to the alignment as a new row.

        This must have the same length as the original alignment (unless this
        is the first record), and have an alphabet compatible with the
        alignment's alphabet."""
        if not isinstance(record, SeqRecord) :
            raise TypeError("New sequence is not a SeqRecord object")
        if self._records and len(record) != self.get_alignment_length() :
            raise ValueError("New sequence is not of length %i" \
                             % self.get_alignment_length())
        #Using not self._alphabet.contains(record.seq.alphabet) needs fixing
        #for AlphabetEncoders (e.g. gapped versus ungapped).
        if not Alphabet._are_alphabets_compatible(self._alphabet, \
                                                  record.seq.alphabet) :
            raise ValueError("New sequence's alphabet is incompatible")
        self._records.append(record)

The unit tests look fine with this addition. Of course, new tests to verify this functionality explicitly should then be added (and I could take advantage of this in Bio.AlignIO too).
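
For readers who want to try the backwards compatible constructor idea without a Biopython checkout, here is a minimal self-contained sketch. It is only an illustration under stated assumptions: the Alphabet and Record classes below are hypothetical stand-ins (not Bio.Alphabet or SeqRecord), and the consensus alphabet and alphabet compatibility checks are left out entirely.

```python
# Hypothetical stand-ins for illustration only; these are NOT the
# Bio.Alphabet or Bio.SeqRecord classes.
class Alphabet(object):
    """Placeholder for an alphabet object."""


class Record(object):
    """Placeholder for a SeqRecord: just an id and a sequence string."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq

    def __len__(self):
        return len(self.seq)


class Alignment(object):
    def __init__(self, records, alphabet=None):
        if isinstance(records, Alphabet):
            if alphabet is not None:
                raise ValueError("Invalid records argument")
            # Backwards compatible mode: Alignment(alphabet)
            alphabet = records
            records = []
        self._alphabet = alphabet
        self._records = []
        self.extend(records)

    def get_alignment_length(self):
        if not self._records:
            return 0
        return len(self._records[0])

    def append(self, record):
        """Add one row; it must match the current alignment length."""
        if not isinstance(record, Record):
            raise TypeError("New sequence is not a Record object")
        if self._records and len(record) != self.get_alignment_length():
            raise ValueError("New sequence is not of length %i"
                             % self.get_alignment_length())
        self._records.append(record)

    def extend(self, records):
        """Add several rows, accepting any iterable of records."""
        for rec in records:
            self.append(rec)

    def __len__(self):
        return len(self._records)


old_style = Alignment(Alphabet())  # old single-argument call
new_style = Alignment([Record("a", "ACG-T"), Record("b", "AC-GT")])
print(len(old_style), len(new_style))
```

Both calling conventions go through the same extend/append path, so the length check is applied however the rows arrive.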
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jul 28 06:54:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 28 Jul 2008 06:54:12 -0400 Subject: [Biopython-dev] [Bug 2554] Creating an Alignment from a list of SeqRecord objects In-Reply-To: Message-ID: <200807281054.m6SAsClZ004173@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2554 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-28 06:54 EST ------- Regarding the code in comment 1, the private function _are_alphabets_compatible() isn't in CVS; it's something I was playing with on Bug 2550 - Alphabet problems when adding sequences. However, I hope that this conveys my overall intention for the Alignment object. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Jul 29 22:22:59 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 29 Jul 2008 22:22:59 -0400 Subject: [Biopython-dev] [Bug 2557] New: AlignIO::write fails when delegating to SeqIO::write Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2557 Summary: AlignIO::write fails when delegating to SeqIO::write Product: Biopython Version: 1.47 Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: rsuri at cs.utexas.edu In line 185 of "biopython/Bio/AlignIO/__init__.py" in the current CVS version, there's a call to SeqIO::write with only 2 arguments instead of the required 3 ["SeqIO.write(alignment.get_all_seqs(), format)"] should be ["SeqIO.write(alignment.get_all_seqs(), handle, format)"] (i.e. pass the handle object). I know this happens when trying to output to FASTA format, and it appears to do so more generally whenever the SeqIO module can be used for output. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Jul 29 22:36:07 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 29 Jul 2008 22:36:07 -0400 Subject: [Biopython-dev] [Bug 2558] New: AlignIO nexus parsing chokes on superfluous comma Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2558 Summary: AlignIO nexus parsing chokes on superfluous comma Product: Biopython Version: 1.47 Platform: All URL: http://www.cs.utexas.edu/~rsuri/M3579.NX OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: rsuri at cs.utexas.edu The URL above points to a nexus file (also available from TreeBase with Matrix accession #M3579) that causes BioPython to raise an error when reading it with the AlignIO module. In the "Trees" section of the input file, the final taxon ("Lecanorales") has a trailing comma that causes BioPython to fail (search for the line beginning with "59"). I've verified that manually deleting the offending comma is a valid workaround. I don't know what the nexus format specification says, but this is poor form for BioPython, in my opinion. It seems reasonable enough to allow for some slack like this when reading formatted files. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Jul 30 04:55:42 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 30 Jul 2008 04:55:42 -0400 Subject: [Biopython-dev] [Bug 2557] AlignIO::write fails when delegating to SeqIO::write In-Reply-To: Message-ID: <200807300855.m6U8tgLU019854@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2557 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-30 04:55 EST ------- That's embarrassing for me! Bug confirmed and fixed in CVS. I've used a very slightly simpler fix, taking advantage of the fact that you can iterate for the SeqRecords within an alignment: SeqIO.write(alignment, handle, format) I've also updated the Bio.AlignIO unit test to cover writing a couple of the formats supported via Bio.SeqIO ("fasta" and "tab"), although it might make sense to try all of them... Checking in Bio/AlignIO/__init__.py; /home/repository/biopython/biopython/Bio/AlignIO/__init__.py,v <-- __init__.py new revision: 1.10; previous revision: 1.9 done Checking in Tests/test_AlignIO.py; /home/repository/biopython/biopython/Tests/test_AlignIO.py,v <-- test_AlignIO.py new revision: 1.12; previous revision: 1.11 done Checking in Tests/output/test_AlignIO; /home/repository/biopython/biopython/Tests/output/test_AlignIO,v <-- test_AlignIO new revision: 1.10; previous revision: 1.9 done Thank you for reporting this oversight, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Jul 30 05:23:59 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 30 Jul 2008 05:23:59 -0400 Subject: [Biopython-dev] [Bug 2558] Bio.Nexus chokes on TRANSLATE block with superfluous comma In-Reply-To: Message-ID: <200807300923.m6U9Nx8l021492@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2558 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|AlignIO nexus parsing chokes|Bio.Nexus chokes on |on superfluous comma |TRANSLATE block with | |superfluous comma ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-30 05:23 EST ------- This is an issue in the Bio.Nexus module, so it's a job for Frank. Do you know if this affects all the NEXUS files from www.treebase.org? I've tried downloading several trees, but their FTP site is just timing out for me. According to http://www.treebase.org/treebase/submit.html they request that trees be uploaded in the NEXUS file format, so it's possible that just a minority of their trees have this trailing comma. Note that this may be an invalid file (a TRANSLATE block with trailing comma), but as you say it looks relatively straightforward to cope with. However, I have had a quick look at the Bio.Nexus code, and I don't entirely understand what Frank's parser is doing here - so it's not going to be a quick fix from me.
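
To illustrate what coping with the trailing comma could look like in principle, here is a rough standalone sketch. It is not the Bio.Nexus parser and makes no claim about how Nexus.py works internally: parse_translate is a hypothetical helper, and it assumes taxon names never contain commas themselves.

```python
def parse_translate(options):
    """Parse 'number name, number name, ...' into a dictionary.

    Hypothetical helper for illustration; not the Bio.Nexus code.
    Empty entries (for example after a trailing comma) are skipped.
    Assumes taxon names do not themselves contain commas.
    """
    table = {}
    for entry in options.rstrip(";").split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a superfluous comma
        parts = entry.split(None, 1)
        if len(parts) != 2:
            raise ValueError("Format error in TRANSLATE entry %r" % entry)
        number, name = parts
        table[number] = name.strip("'")
    return table


example = "57 'Bacidia_rosella', 58 'Micarea_adnata', 59 'Lecanorales',"
print(parse_translate(example))
```

Whether silently accepting such input is desirable is a separate question; a stricter variant could emit a warning when it skips an empty entry.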
Quick bit of python to show the stack trace: >>> from Bio.Nexus import Nexus >>> n = Nexus(open("M3579.NX")) Traceback (most recent call last): File "", line 1, in TypeError: 'module' object is not callable >>> n = Nexus.Nexus(open("M3579.NX")) Traceback (most recent call last): File "", line 1, in File "/Users/XXX/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Nexus/Nexus.py", line 552, in __init__ self.read(input) File "/Users/XXX/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Nexus/Nexus.py", line 614, in read self._parse_nexus_block(title, contents) File "/Users/XXX/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Nexus/Nexus.py", line 655, in _parse_nexus_block getattr(self,'_'+line.command)(line.options) File "/Users/XXX/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Nexus/Nexus.py", line 922, in _translate raise NexusError,'Format error in line %s.' % options Bio.Nexus.Nexus.NexusError: Format error in line 1 'Rolfidium_coccocarpioides', 2 'Mycoblastus_sanguinarius', 3 'Protoblastenia_rupestris', 4 'Myxobilimbia_sabuletorum', 5 'Byssoloma_leucoblepharum', 6 'Stereocaulon_tomentosum', 7 'Scoliciosporum_umbrinum', 8 'Haematomma_ochroleucum', 9 'Glyphopeltis_ligustica', 10 'Catinaria_atropurpurea', 11 'Miriquidica_garovaglii', 12 'Sphaerophorus_globosus', 13 'Lecidea_atrosanguinea', 14 'Cladonia_peziziformis', 15 'Stereocaulon_pileatum', 16 'Frutidella_caesioatra', 17 'Fellhanera_bouteillei', 18 'Tonina_cinereovirens', 19 'Helocarpon_crassipes', 20 'Micarea_alabastrites', 21 'Squamarina_lentigera', 22 'Lecanora_intumescens', 23 'Bellemerea_diamarta', 24 'Lopadium_disciforme', 25 'Herteliana_taylorii', 26 'Lepraria_lobificans', 27 'Psilolechia_leprosa', 28 'Protomicarea_limosa', 29 'Calopadia_foliicola', 30 'Fellhanera_subtilis', 31 'Pyrrhospora_quernea', 32 'Lecidella_meiococca', 33 'Hypogymnia_physodes', 34 'Ramalina_fastigiata', 35 'Halecania_alpivaga', 36 'Platismatia_glauca', 37 'Lepraria_bergensis', 38 
'Micarea_micrococca', 39 'Lecania_atrynoides', 40 'Crocynia_gossypina', 41 'Psilolechia_lucida', 42 'Lecanora_allophana', 43 'Cladonia_digitata', 44 'Schadonia_fecunda', 45 'Psorula_rufonigra', 46 'Adelolecia_pilati', 47 'Lecidea_turgidula', 48 'Micarea_sylvicola', 49 'Lecidea_fuscoatra', 50 'Psora_rubiformis', 51 'Micarea_erratica', 52 'Megalaria_grossa', 53 'Lecidea_silacea', 54 'Micarea_intrusa', 55 'Psora_decipiens', 56 'Tephromela_atra', 57 'Bacidia_rosella', 58 'Micarea_adnata', 59 'Lecanorales',. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Jul 30 08:57:00 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 30 Jul 2008 08:57:00 -0400 Subject: [Biopython-dev] [Bug 2558] Bio.Nexus chokes on TRANSLATE block with superfluous comma In-Reply-To: Message-ID: <200807301257.m6UCv0co031445@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2558 fkauff at biologie.uni-kl.de changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED ------- Comment #2 from fkauff at biologie.uni-kl.de 2008-07-30 08:57 EST ------- I'm all for a little bit of slack in parsers, but this looks in my opinion like a straightforward syntax error in the nexus file. I work with nexus files daily, and have never encountered such a trailing comma. What really confuses me is that there are 58 taxa in the data set, and no. 59 Lecanorales is in addition, with no data and no occurence in the tree. I don't think this is proper nexus format. Frank (In reply to comment #1) > This is an issue in the Bio.Nexus module, so its a job for Frank. > > Do you know if this affects all the NEXUS files from www.treebase.org? 
I've > tried downloading several trees, but their FTP site is just timing out for me. > According to http://www.treebase.org/treebase/submit.html the request trees be > uploaded in the NEXUS file format so its possible that just a minority of their > trees have this trailing comma. > > Note that this may be an invalid file (a TRANSLATE block with trailing comma), > but as you say it looks relatively straight forward to cope with. However, I > have had a quick look at the Bio.Nexus code, and I don't entirely understand > what Frank's parser is doing here - so its not going to be a quick fix from me. > > > Quick bit of python to show the stack trace: > > >>> from Bio.Nexus import Nexus > >>> n = Nexus(open("M3579.NX")) > Traceback (most recent call last): > File "", line 1, in > TypeError: 'module' object is not callable > >>> n = Nexus.Nexus(open("M3579.NX")) > Traceback (most recent call last): > File "", line 1, in > File > "/Users/XXX/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Nexus/Nexus.py", > line 552, in __init__ > self.read(input) > File > "/Users/XXX/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Nexus/Nexus.py", > line 614, in read > self._parse_nexus_block(title, contents) > File > "/Users/XXX/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Nexus/Nexus.py", > line 655, in _parse_nexus_block > getattr(self,'_'+line.command)(line.options) > File > "/Users/XXX/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Nexus/Nexus.py", > line 922, in _translate > raise NexusError,'Format error in line %s.' 
% options > Bio.Nexus.Nexus.NexusError: Format error in line 1 'Rolfidium_coccocarpioides', > 2 'Mycoblastus_sanguinarius', 3 'Protoblastenia_rupestris', 4 > 'Myxobilimbia_sabuletorum', 5 'Byssoloma_leucoblepharum', 6 > 'Stereocaulon_tomentosum', 7 'Scoliciosporum_umbrinum', 8 > 'Haematomma_ochroleucum', 9 'Glyphopeltis_ligustica', 10 > 'Catinaria_atropurpurea', 11 'Miriquidica_garovaglii', 12 > 'Sphaerophorus_globosus', 13 'Lecidea_atrosanguinea', 14 > 'Cladonia_peziziformis', 15 'Stereocaulon_pileatum', 16 > 'Frutidella_caesioatra', 17 'Fellhanera_bouteillei', 18 'Tonina_cinereovirens', > 19 'Helocarpon_crassipes', 20 'Micarea_alabastrites', 21 > 'Squamarina_lentigera', 22 'Lecanora_intumescens', 23 'Bellemerea_diamarta', 24 > 'Lopadium_disciforme', 25 'Herteliana_taylorii', 26 'Lepraria_lobificans', 27 > 'Psilolechia_leprosa', 28 'Protomicarea_limosa', 29 'Calopadia_foliicola', 30 > 'Fellhanera_subtilis', 31 'Pyrrhospora_quernea', 32 'Lecidella_meiococca', 33 > 'Hypogymnia_physodes', 34 'Ramalina_fastigiata', 35 'Halecania_alpivaga', 36 > 'Platismatia_glauca', 37 'Lepraria_bergensis', 38 'Micarea_micrococca', 39 > 'Lecania_atrynoides', 40 'Crocynia_gossypina', 41 'Psilolechia_lucida', 42 > 'Lecanora_allophana', 43 'Cladonia_digitata', 44 'Schadonia_fecunda', 45 > 'Psorula_rufonigra', 46 'Adelolecia_pilati', 47 'Lecidea_turgidula', 48 > 'Micarea_sylvicola', 49 'Lecidea_fuscoatra', 50 'Psora_rubiformis', 51 > 'Micarea_erratica', 52 'Megalaria_grossa', 53 'Lecidea_silacea', 54 > 'Micarea_intrusa', 55 'Psora_decipiens', 56 'Tephromela_atra', 57 > 'Bacidia_rosella', 58 'Micarea_adnata', 59 'Lecanorales',. > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Jul 31 11:58:08 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 31 Jul 2008 11:58:08 -0400 Subject: [Biopython-dev] [Bug 2560] New: Adding BLAST support to Bio.AlignIO Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2560 Summary: Adding BLAST support to Bio.AlignIO Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk I think it can sometimes be useful to regard a BLAST output file as a series of pairwise alignments - and therefore it makes sense to add it to Bio.AlignIO as another input file format. http://biopython.org/wiki/AlignIO Note that the AlignIO API will not allow any "clumping" of the pairwise alignments (or HSPs in Blast terminology) according to the query or the target sequence - you just get them all one after the other. I will attach a rough Bio/AlignIO/BlastIO.py file which attempts to mimic the naming conventions in the fasta-m10 parser. This is currently using Bio.Blast to do the actual parsing, and then just using the Blast results to build alignment objects with two sequences each. I suggest using the format names "blast" and "blastxml" for the plain text and XML output formats following BioPerl (although I would prefer "blast-xml" to "blastxml"), see http://www.bioperl.org/wiki/HOWTO:SearchIO#Design -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Thu Jul 31 12:00:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 31 Jul 2008 12:00:23 -0400 Subject: [Biopython-dev] [Bug 2560] Adding BLAST support to Bio.AlignIO In-Reply-To: Message-ID: <200807311600.m6VG0NAq021299@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2560 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-31 12:00 EST ------- Created an attachment (id=980) --> (http://bugzilla.open-bio.org/attachment.cgi?id=980&action=view) New file Bio/AlignIO/BlastIO.py The included "self test" just parses all the unit tests (excluding the PSI-Blast and HTML files). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Jul 1 08:36:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 1 Jul 2008 04:36:33 -0400 Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem with identical taxa names In-Reply-To: Message-ID: <200807010836.m618aXO8014712@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2531 fkauff at biologie.uni-kl.de changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED ------- Comment #13 from fkauff at biologie.uni-kl.de 2008-07-01 04:36 EST ------- Just uploaded a new Nexus.py to CVS. First, the taxlabels command in a taxa block is now ignored. For a standard nexus file, taxon labels are in the matrix, and a taxon block is irrelevant. The only exception are transposed matrices, which are not supported by Nexus.py anyway. Without the added confusion of a separate taxlabels command, it is now fairly easy to deal with duplicate names. 
Both self.taxlabels and self.matrix now carry the same set of unique taxon names. All example files seem to work fine for me. unless I hear otherwise, I close this bug. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Jul 1 09:01:29 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 1 Jul 2008 05:01:29 -0400 Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem with identical taxa names In-Reply-To: Message-ID: <200807010901.m6191TxO015999@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2531 ------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-01 05:01 EST ------- Does this mean that there will be no way to see the original non-unique names from within Bio.Nexus? I agree they are a pain, but it would be nice to preserve them. I haven't read the Nexus specs (restricted article), but does this comment on the issue of repeated identifiers? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Jul 1 09:13:02 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 1 Jul 2008 05:13:02 -0400 Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem with identical taxa names In-Reply-To: Message-ID: <200807010913.m619D2vK016454@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2531 ------- Comment #15 from fkauff at biologie.uni-kl.de 2008-07-01 05:13 EST ------- Yes, the original non-unique names are currently not preserved. It would be fairly easy to keep them, if desired. The NEXUS specs (Maddison et al.) 
state that non-unique names "should be avoided if this might cause ambiguity", which imho they always do. But I experienced that sometimes names become identical due to truncation etc, so I needed a way to deal with it instead of just throwing an error. Frank -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Jul 1 13:16:57 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 1 Jul 2008 09:16:57 -0400 Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq objects In-Reply-To: Message-ID: <200807011316.m61DGvGS029051@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2532 ------- Comment #2 from fkauff at biologie.uni-kl.de 2008-07-01 09:16 EST ------- I opt for (b): an easy one-time addition to Bio.Alphabets, easy to use for everyone (instead of creating their own uppercase-lowercase variants of those terribly complicated biopython alphabet classes), and easy to change for all other modules if lowercase-uppercase is what they want (or need). Nexus.py and Phd.py certainly need to allow lowercase characters, as this is very common. Frank -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Jul 1 15:56:03 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 1 Jul 2008 11:56:03 -0400 Subject: [Biopython-dev] [Bug 2533] New: Support for simple "tab" format in Bio.SeqIO Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2533 Summary: Support for simple "tab" format in Bio.SeqIO Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk Requested on the mailing list by Giovanni Marco Dall'Olio: http://lists.open-bio.org/pipermail/biopython/2008-June/004312.html See BioPerl: http://www.bioperl.org/wiki/Tab_sequence_format Suggested implementation to follow. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Jul 1 15:57:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 1 Jul 2008 11:57:26 -0400 Subject: [Biopython-dev] [Bug 2533] Support for simple "tab" format in Bio.SeqIO In-Reply-To: Message-ID: <200807011557.m61FvQN5006042@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2533 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-01 11:57 EST ------- Created an attachment (id=962) --> (http://bugzilla.open-bio.org/attachment.cgi?id=962&action=view) New file Bio/SeqIO/TabIO.py Treats the first field as the record's .id (and .name) Treats the second field as the record's sequence. 
When writing, uses only record.id and record.seq -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Jul 1 16:00:59 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 1 Jul 2008 12:00:59 -0400 Subject: [Biopython-dev] [Bug 2533] Support for simple "tab" format in Bio.SeqIO In-Reply-To: Message-ID: <200807011600.m61G0xUp006217@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2533 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-01 12:00 EST ------- Created an attachment (id=963) --> (http://bugzilla.open-bio.org/attachment.cgi?id=963&action=view) Patch to add the "tab" format to Bio.SeqIO and update the unit test output The plumbing to make Bio.SeqIO (and Bio.AlignIO) aware of the new format. Adds the reader/writer mapping to Bio/SeqIO/__init__.py (trivial) and gives the updated output from test_SeqIO.py (trivial to regenerate with "python run_tests.py -g test_SeqIO.py"). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
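
As a concrete picture of the simple "tab" format discussed in Bug 2533 above (first field is the identifier, second field is the sequence), here is a small standalone sketch. It is not the attached TabIO.py: parse_tab and write_tab are hypothetical names, and plain (identifier, sequence) tuples stand in for SeqRecord objects.

```python
from io import StringIO


def parse_tab(handle):
    """Yield (identifier, sequence) pairs from tab separated lines.

    Hypothetical helper for illustration; not the Bio.SeqIO code.
    """
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError("Expected two tab separated fields, got %r" % line)
        yield fields[0].strip(), fields[1].strip()


def write_tab(pairs, handle):
    """Write (identifier, sequence) pairs, one per line; return the count."""
    count = 0
    for identifier, sequence in pairs:
        handle.write("%s\t%s\n" % (identifier, sequence))
        count += 1
    return count


text = "gi|1\tACGT\ngi|2\tTTGA\n"
records = list(parse_tab(StringIO(text)))
out = StringIO()
written = write_tab(records, out)
print(written, out.getvalue() == text)
```

Because each record is a single line, files in this layout work well with line oriented tools such as grep.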
From biopython at maubp.freeserve.co.uk Wed Jul 2 10:33:35 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 2 Jul 2008 11:33:35 +0100 Subject: [Biopython-dev] Catching more error conditions in Bio.Entrez Message-ID: <320fb6e00807020333n7902e452gac56e12f5d64a3ab@mail.gmail.com>

Hello Michiel et al., I've already added a few if statements to the end of Bio.Entrez._open() to catch a few errors I'd observed, and I've just found another example:

    >>> from Bio import Entrez
    >>> Entrez.efetch("nucleotide", id="fiction", rettype="fasta").read()
    '\n'
    >>> Entrez.efetch("nucleotide", id="fiction").read()
    '\n'

This seems to happen for any invalid identifier. Are you happy for me to check for this as an error too? Are there any valid reasons to get back an empty dataset like this? Also, I was wondering if we should raise a ValueError rather than IOError if we are fairly sure the problem is with the arguments rather than the network or the server being unavailable. Peter

From sdavis2 at mail.nih.gov Wed Jul 2 11:18:43 2008 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Wed, 2 Jul 2008 07:18:43 -0400 Subject: [Biopython-dev] Catching more error conditions in Bio.Entrez In-Reply-To: <320fb6e00807020333n7902e452gac56e12f5d64a3ab@mail.gmail.com> References: <320fb6e00807020333n7902e452gac56e12f5d64a3ab@mail.gmail.com> Message-ID: <264855a00807020418qc858370r4083f0db9db3197a@mail.gmail.com> On Wed, Jul 2, 2008 at 6:33 AM, Peter wrote: > Hello Michiel et al., > > I've already added a few if statements to the end of > Bio.Entrez._open() to catch a few errors I'd observed, and I've just > found another example: > >>>> from Bio import Entrez >>>> Entrez.efetch("nucleotide", id="fiction", rettype="fasta").read() > '\n' >>>> Entrez.efetch("nucleotide", id="fiction").read() > '\n' > > This seems to happen for any invalid identifier. Are you happy for me > to check for this as an error too? Are there any valid reasons to get > back an empty dataset like this?
If the ability to use history is added, then an empty dataset could be a valid return after an empty search. For id-based-searches, I'm not sure I would raise an error for an empty set being returned anyway. Just my $0.02. Sean > Also, I was wondering if we should raise a ValueError rather than > IOError if we are fairly sure the problem is with the arguments rather > than the network or the server being unavailable. > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From biopython at maubp.freeserve.co.uk Wed Jul 2 11:34:32 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 2 Jul 2008 12:34:32 +0100 Subject: [Biopython-dev] Catching more error conditions in Bio.Entrez In-Reply-To: <264855a00807020418qc858370r4083f0db9db3197a@mail.gmail.com> References: <320fb6e00807020333n7902e452gac56e12f5d64a3ab@mail.gmail.com> <264855a00807020418qc858370r4083f0db9db3197a@mail.gmail.com> Message-ID: <320fb6e00807020434p474cect7a7b0d51148d7760@mail.gmail.com> >> This seems to happen for any invalid identifier. Are you happy for me >> to check for this as an error too? Are there any valid reasons to get >> back an empty dataset like this? > > If the ability to use history is added, then an empty dataset could be > a valid return after an empty search. ... Bio.Entrez has always supported the history; it's just up to the user to take advantage of it.
I've included an example in the tutorial to explain how to do this, cut and pasted below:

    from Bio import Entrez
    search_handle = Entrez.esearch(db="nucleotide",term="Opuntia and rpl16",
                                   usehistory="y",
                                   email="history.user@example.com")
    search_results = Entrez.read(search_handle)
    search_handle.close()

    gi_list = search_results["IdList"]
    count = int(search_results["Count"])
    assert count == len(gi_list)

    session_cookie = search_results["WebEnv"]
    query_key = search_results["QueryKey"]

    #Now use the history session cookie and query key to download
    #the results in batches
    batch_size = 3
    out_handle = open("orchid_rpl16.fasta", "w")
    for start in range(0,count,batch_size) :
        end = min(count, start+batch_size)
        print("Going to download record %i to %i" % (start+1, end))
        fetch_handle = Entrez.efetch(db="nucleotide", rettype="fasta",
                                     retstart=start, retmax=batch_size,
                                     webenv=session_cookie,
                                     query_key=query_key,
                                     email="history.user@example.com")
        data = fetch_handle.read()
        fetch_handle.close()
        out_handle.write(data)
    out_handle.close()

Feedback on the tutorial or the example is of course welcome. > For id-based-searches, I'm not sure I would raise an error for an empty > set being returned anyway. > > Just my $0.02. I was wondering about this kind of thing... maybe some more testing of these kinds of examples would be in order. Peter

From biopython at maubp.freeserve.co.uk Wed Jul 2 13:03:36 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 2 Jul 2008 14:03:36 +0100 Subject: [Biopython-dev] Bug 2533 - Support for simple "tab" format in Bio.SeqIO Message-ID: <320fb6e00807020603s63f8339ag5b8140f1943ceb47@mail.gmail.com> Hi all, Do any of you have any comments or feedback on this suggested new "simple tab separated" format for Bio.SeqIO? To match BioPerl I plan on calling it the "tab" format - see below. Any real world example files would be good for the test suite.
One nice thing is it adds another output format, something we're a bit short of in Bio.SeqIO with only fasta and some alignment formats (now handled via Bio.AlignIO, i.e. pfam/stockholm, clustal and phylip). Peter ---------- Forwarded message ---------- From: Peter Date: Tue, Jul 1, 2008 at 5:06 PM Subject: Re: [BioPython] Sequence from Fasta To: dalloliogm at gmail.com Cc: biopython at biopython.org Giovanni wrote: > yes, I think it will be useful to implement. > I know of people who have written a customized fasta2tab script and > use it quite frequently, so it would be good to support such a task. > As you said before this format is commonly used in combination with > grep/gawk scripts. I've gone for the simple option about how to parse the first field: it is used as the record identifier (.id) and name only (nothing clever). Here is my suggested code, which you are welcome to download and try out. Bug 2533 - Support for simple "tab" format in Bio.SeqIO http://bugzilla.open-bio.org/show_bug.cgi?id=2533 If you want to try this yourself you'll need to download the new file TabIO.py into the Bio/SeqIO folder and update Bio/SeqIO/__init__.py to tell it about the new format (two new lines, see patch). Peter From biopython at maubp.freeserve.co.uk Wed Jul 2 13:21:29 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 2 Jul 2008 14:21:29 +0100 Subject: [Biopython-dev] Questions about the NEXUS format Message-ID: <320fb6e00807020621v6370c556g966f01a857f5c4e3@mail.gmail.com> Hello again Frank, As Biopython's NEXUS expert, I've got a couple of hopefully trivial questions about the format, which connect to how best to handle it in the Bio.SeqIO and Bio.AlignIO modules. http://biopython.org/wiki/SeqIO http://biopython.org/wiki/AlignIO My short questions are: Q1: Can a file contain more than one NEXUS record (i.e. concatenation, with more than one #NEXUS line)? Q2: Can a NEXUS record/file contain more than one alignment (matrix block)?
If the answer to either of those is a "yes", then any example files you could contribute would be very helpful. I have a more complicated question too, which would help me to resolve Bug 2227: http://bugzilla.open-bio.org/show_bug.cgi?id=2227 Q3: Given a generic Alignment object (e.g. from one of the other parsers), can I construct a corresponding Nexus object where the aligned sequences are used for the matrix? If so, how? Thank you, Peter From mjldehoon at yahoo.com Wed Jul 2 13:30:06 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 2 Jul 2008 06:30:06 -0700 (PDT) Subject: [Biopython-dev] Bio.IntelliGenetics Message-ID: <29487.55988.qm@web62410.mail.re1.yahoo.com> Bio.IntelliGenetics contains a parser for sequence data in the IntelliGenetics format. In this format, each sequence has a name and comments, and in addition there can also be an overall comment to the file. Currently the parser in Bio.IntelliGenetics stores this information in Bio.IntelliGenetics.Record.Record objects (one record per sequence; the overall comment is inadvertently added to the first sequence in the file). I think it makes more sense to use a SeqRecord for that, and to deprecate Bio.IntelliGenetics.Record.Record. In that case, Bio.SeqIO looks like a more suitable place for this parser. The user would see something like this: >>> from Bio import SeqIO >>> handle = open("mydatafile.txt") >>> records = SeqIO.parse(handle, "ig") >>> records.comment "This is the overall comment" >>> for record in records: # ... record is a SeqRecord. Because of the overall comment, SeqIO.parse cannot simply return a generator function. It must return a full-fledged class, but one with an iterator. Any objections, anybody? 
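For illustration, the "class with an iterator" idea could be sketched roughly as below. This is only a sketch under stated assumptions, not the Bio.SeqIO code: the name IgIterator is hypothetical, records are plain dictionaries rather than SeqRecord objects, and the file layout is assumed (file-level comment lines start with ";;", per-record comment lines start with ";", then comes a name line, then sequence lines with the last one ending in "1" or "2").

```python
import io


class IgIterator:
    """Hypothetical iterator that also exposes a file-level comment.

    Assumed layout: leading ';;' lines form the overall comment; each record
    is made of ';' comment lines, a name line, and sequence lines whose last
    line ends in '1' or '2'. The whole file is read into memory for brevity.
    """

    def __init__(self, handle):
        lines = [line.rstrip("\n") for line in handle]
        n = 0
        while n < len(lines) and lines[n].startswith(";;"):
            n += 1
        # The overall comment is available before any record is read
        self.comment = "\n".join(line[2:].strip() for line in lines[:n])
        self._records = self._parse(lines[n:])

    def __iter__(self):
        return self._records

    @staticmethod
    def _parse(lines):
        comments, name, seq = [], None, []
        for line in lines:
            line = line.strip()
            if not line:
                continue
            if name is None and line.startswith(";"):
                comments.append(line[1:].strip())
            elif name is None:
                name = line
            else:
                seq.append(line)
                if line[-1] in "12":  # assumed end-of-sequence marker
                    yield {"name": name, "comments": comments,
                           "seq": "".join(seq)[:-1]}
                    comments, name, seq = [], None, []


example = ";; overall comment\n; first record\nSEQ1\nACGT\nACGT1\n; second\nSEQ2\nGGCC2\n"
records = IgIterator(io.StringIO(example))
overall = records.comment
parsed = list(records)
```

The point of the design is that `records.comment` can be read before the loop starts, which a bare generator function cannot offer.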
--Michiel From biopython at maubp.freeserve.co.uk Wed Jul 2 13:48:31 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 2 Jul 2008 14:48:31 +0100 Subject: [Biopython-dev] Bio.IntelliGenetics In-Reply-To: <29487.55988.qm@web62410.mail.re1.yahoo.com> References: <29487.55988.qm@web62410.mail.re1.yahoo.com> Message-ID: <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com> On Wed, Jul 2, 2008 at 2:30 PM, Michiel de Hoon wrote: > Bio.IntelliGenetics contains a parser for sequence data in the IntelliGenetics format. Just to be upfront, I'm not familiar with this format, but I've had a look at the examples. > In this format, each sequence has a name and comments, and in addition there can > also be an overall comment to the file. OK. This is also the case in other file formats, for example GenBank files can have free format text file header at the start but we ignore this. How would you separate the file header comment from the first record comment? Some files include what looks like a file header but the lines all seem to start with "; ". Maybe look for "; LOCUS..."? Given the whole comment seems to be free format I don't think this is very nice. On the other hand, some of the sample inputs includes a number of lines starting ";; Modified by ..." which would be easy to separate (one semi colon versus two semi colons). These are clearly file-level header lines, rather than being part of the first record. > Currently the parser in Bio.IntelliGenetics stores this information in > Bio.IntelliGenetics.Record.Record objects (one record per sequence; the > overall comment is inadvertently added to the first sequence in the file). I > think it makes more sense to use a SeqRecord for that, and to deprecate > Bio.IntelliGenetics.Record.Record. If all the data extracted by the Bio.IntelliGenetics parser could be dealt with using the SeqRecord parser added to Bio.SeqIO, then yes deprecating Bio.IntelliGenetics sounds fine. 
> In that case, Bio.SeqIO looks like a more suitable place for this parser. > The user would see something like this: >>>> from Bio import SeqIO >>>> handle = open("mydatafile.txt") >>>> records = SeqIO.parse(handle, "ig") >>>> records.comment > "This is the overall comment" >>>> for record in records: > # ... record is a SeqRecord. I see you are using "ig" as the format name, matching EMBOSS. Good :) http://emboss.sourceforge.net/docs/themes/seqformats/ig > Because of the overall comment, SeqIO.parse cannot simply return a > generator function. It must return a full-fledged class, but one with an iterator. Not necessarily. We can still use a simple generator function and either throw away the header comment, or include it with the first record (or even with every record). If you did create an iterator class, would you make the header available as a property of the iterator? Given the apparently fuzzy boundary between the file header and the first record header, I would just opt to treat it all as a comment for the first record, and use a simple generator function. Peter From fkauff at biologie.uni-kl.de Wed Jul 2 14:01:01 2008 From: fkauff at biologie.uni-kl.de (Frank Kauff) Date: Wed, 02 Jul 2008 16:01:01 +0200 Subject: [Biopython-dev] Questions about the NEXUS format In-Reply-To: <320fb6e00807020621v6370c556g966f01a857f5c4e3@mail.gmail.com> References: <320fb6e00807020621v6370c556g966f01a857f5c4e3@mail.gmail.com> Message-ID: <486B8A1D.8090806@biologie.uni-kl.de> Hi Peter, Peter wrote: > Hello again Frank, > > As Biopython's NEXUS expert, I've got a couple of hopefully trivial > questions about the format, which connect to how best to handle it in the > Bio.SeqIO and Bio.AlignIO modules. http://biopython.org/wiki/SeqIO > http://biopython.org/wiki/AlignIO > > My short questions are: > > Q1: Can a file contain more than one NEXUS record (i.e. concatenation, > with more than one #NEXUS line)? > As far as I know: no.
#NEXUS just indicates the file being a NEXUS file, the concept of "records" is not part of a nexus file > Q2: Can a NEXUS record/file contain more than one alignment (matrix block)? > > I just had a quick look at the old Maddison et al. introductory paper of Nexus, and it says that "although the nexus standard does not impose constraints on the number of blocks, particular programs will". I don't know of any program that would read more than one data block and keep both of them. > If the answer to either of those is a "yes", then any example files > you could contribute would be very helpful. > > I have a more complicated question too, which would help me to resolve Bug 2227: > http://bugzilla.open-bio.org/show_bug.cgi?id=2227 > > Q3: Given a generic Alignment object (e.g. from one of the other > parsers), can I construct a corresponding Nexus object where the > aligned sequences are used for the matrix? If so, how? > Hmmm - not really. Nexus.py does not support "empty" nexus class objects that could be filled with data (just tried) . But it would actually be a nice thing to have. I'll put this on my to do list. 
Cheers, Frank > Thank you, > > Peter > > ' From biopython at maubp.freeserve.co.uk Wed Jul 2 14:01:13 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 2 Jul 2008 15:01:13 +0100 Subject: [Biopython-dev] Bio.IntelliGenetics In-Reply-To: <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com> References: <29487.55988.qm@web62410.mail.re1.yahoo.com> <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com> Message-ID: <320fb6e00807020701k2f5bf546j2d5ef3514a24e31a@mail.gmail.com> Hello again, Interestingly the IntelliGenetics looks the same as the MASE alignment file format: http://www.bioperl.org/wiki/Mase_multiple_alignment_format On the other hand, the EMBOSS example is clearly not a multiple sequence alignment: http://emboss.sourceforge.net/docs/themes/seqformats/ig Adding the parser to Bio.SeqIO would let us read in alignments too via Bio.AlignIO (which will offload the parsing to Bio.SeqIO and then try and convert the SeqRecords into an Alignment). Peter From biopython at maubp.freeserve.co.uk Wed Jul 2 14:06:40 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 2 Jul 2008 15:06:40 +0100 Subject: [Biopython-dev] Bio.IntelliGenetics In-Reply-To: <320fb6e00807020705qea5976j9a5e2cd0896f391d@mail.gmail.com> References: <29487.55988.qm@web62410.mail.re1.yahoo.com> <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com> <320fb6e00807020705qea5976j9a5e2cd0896f391d@mail.gmail.com> Message-ID: <320fb6e00807020706l28309346m57e7bd884a0b7b9b@mail.gmail.com> Forgot to send this to the list, another point about IntelliGenetics vs MASE ---------- Forwarded message ---------- From: Peter Date: Wed, Jul 2, 2008 at 3:05 PM Subject: Re: [Biopython-dev] Bio.IntelliGenetics To: mjldehoon at yahoo.com > How would you separate the file header comment from the first record > comment? Some files include what looks like a file header but the > lines all seem to start with "; ". Maybe look for "; LOCUS..."? 
> Given the whole comment seems to be free format I don't think this is > very nice. > > On the other hand, some of the sample inputs includes a number of > lines starting ";; Modified by ..." which would be easy to separate > (one semi colon versus two semi colons). These are clearly file-level > header lines, rather than being part of the first record. I found an old link I had added on the wiki page for SeqIO development, http://pbil.univ-lyon1.fr/help/formats.html This clearly describes the MASE format as having (optional) header lines starting with two semi colons. But are MASE and IntelliGenetics the same thing? Peter From biopython at maubp.freeserve.co.uk Wed Jul 2 14:12:48 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 2 Jul 2008 15:12:48 +0100 Subject: [Biopython-dev] Questions about the NEXUS format In-Reply-To: <486B8A1D.8090806@biologie.uni-kl.de> References: <320fb6e00807020621v6370c556g966f01a857f5c4e3@mail.gmail.com> <486B8A1D.8090806@biologie.uni-kl.de> Message-ID: <320fb6e00807020712y54874e02k6854b92e1711358d@mail.gmail.com> >> My short questions are: >> >> Q1: Can a file contain more than one NEXUS record (i.e. concatenation, >> with more than one #NEXUS line)? > > As far as I know: no. #NEXUS just indicates the file being a NEXUS file, the > concept of "records" is not part of a nexus file OK, thank you. >> Q2: Can a NEXUS record/file contain more than one alignment (matrix >> block)? > > I just had a quick look at the old Maddison et al. introductory paper of > Nexus, and it says that "although the nexus standard does not impose > constraints on the number of blocks, particular programs will". I don't know > of any program that would read more than one data block and keep both of > them. So that is a "yes in theory", but it doesn't sound worth worrying about. >> Q3: Given a generic Alignment object (e.g.
from one of the other >> parsers), can I construct a corresponding Nexus object where the >> aligned sequences are used for the matrix? If so, how? > > Hmmm - not really. Nexus.py does not support "empty" nexus class objects > that could be filled with data (just tried) . But it would actually be a > nice thing to have. I'll put this on my to do list. Thanks, Peter From mjldehoon at yahoo.com Wed Jul 2 14:15:16 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 2 Jul 2008 07:15:16 -0700 (PDT) Subject: [Biopython-dev] Bio.IntelliGenetics In-Reply-To: <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com> Message-ID: <529945.38158.qm@web62404.mail.re1.yahoo.com> > On the other hand, some of the sample inputs includes a number of > lines starting ";; Modified by ..." which would be easy to separate > (one semi colon versus two semi colons). These are clearly file-level > header lines, rather than being part of the first record. According to the website mentioned in Bio/IntelliGenetics/__init__.py, the file-level comments have two semi colons, as opposed to the sequence-specific comments, which have one semi colon. http://pbil.univ-lyon1.fr/help/formats.html > If you did create an iterator class, would you make the > header available as a property of the iterator? I am not sure what you mean by a property of the iterator. I was thinking to simply add a field to the class. ---Michiel. 
From bugzilla-daemon at portal.open-bio.org Wed Jul 2 14:38:52 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 2 Jul 2008 10:38:52 -0400 Subject: [Biopython-dev] [Bug 2524] Handle missing libraries like TextTools in run_tests.py In-Reply-To: Message-ID: <200807021438.m62Ecqma013815@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2524 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Component|Documentation |Unit Tests ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-02 10:38 EST ------- Filing under "Unit Tests". -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Jul 2 14:39:22 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 2 Jul 2008 10:39:22 -0400 Subject: [Biopython-dev] [Bug 2469] requires_wise.py fails on Windows (test suite) In-Reply-To: Message-ID: <200807021439.m62EdMM9013903@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2469 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Component|Main Distribution |Unit Tests ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-02 10:39 EST ------- Filing under "Unit Tests" -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Wed Jul 2 14:56:00 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 2 Jul 2008 15:56:00 +0100 Subject: [Biopython-dev] Bio.IntelliGenetics In-Reply-To: <529945.38158.qm@web62404.mail.re1.yahoo.com> References: <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com> <529945.38158.qm@web62404.mail.re1.yahoo.com> Message-ID: <320fb6e00807020756r4de8ed4bi3f8b409d75996a14@mail.gmail.com> >> If you did create an iterator class, would you make the >> header available as a property of the iterator? > > I am not sure what you mean by a property of the iterator. I was > thinking to simply add a field to the class. Adding the file header field to the iterator class? You could do I suppose. Right now all the Bio.SeqIO parsers use generator functions (although not in AlignIO), although I have no objection to returning iterator classes instead. I don't really like the idea of Bio.SeqIO parsers returning anything other than SeqRecord objects - even if it is indirectly via a richer iterator object. I see the Bio.SeqIO as a common unified API, and the downside is sometimes extra data doesn't really fit. If there really is some important meta-data in a file format that applies to all the records, then it cannot easily be represented in the Bio.SeqIO system except as annotation added to every single SeqRecord. e.g. Add the header to the annotations dictionary under "file-header" or something. 
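To make the "annotation added to every single SeqRecord" option concrete, here is a rough sketch. It is an illustration only: attach_file_header is a hypothetical helper, the "file-header" key is just the name floated above, and the Record class is a tiny stand-in so the example runs without Biopython (a real SeqRecord also carries an annotations dictionary).

```python
class Record:
    """Stand-in for Bio.SeqRecord.SeqRecord; only .id and .annotations are used."""

    def __init__(self, identifier):
        self.id = identifier
        self.annotations = {}


def attach_file_header(records, header, key="file-header"):
    """Yield each record with the file-level header copied into its annotations."""
    for record in records:
        record.annotations[key] = header
        yield record


# Still a plain generator, so callers only ever see record objects
annotated = list(attach_file_header([Record("seq1"), Record("seq2")],
                                    "Modified by ..."))
```

The cost is duplication: the same header string is referenced from every record, but the parser can remain a simple generator function.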
Peter From mjldehoon at yahoo.com Wed Jul 2 15:29:31 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 2 Jul 2008 08:29:31 -0700 (PDT) Subject: [Biopython-dev] Bio.IntelliGenetics In-Reply-To: <320fb6e00807020705qea5976j9a5e2cd0896f391d@mail.gmail.com> Message-ID: <318336.37817.qm@web62405.mail.re1.yahoo.com> --- On Wed, 7/2/08, Peter wrote: > I found an old link I had added on the wiki page for SeqIO > development, > http://pbil.univ-lyon1.fr/help/formats.html > > This clearly describes the MASE format as having > (optional) header > lines starting with two semi colons. But are MASE and > IntelliGenetics the same thing? It may be that the link in Bio/IntelliGenetics/__init__.py actually does not pertain to the IntelliGenetics format. Except for this link (which as you point out actually talks about the MASE format, not the IntelliGenetics format), I have seen no description elsewhere of these file-wide comments preceded by a double semi-colon in the IntelliGenetics format. Even Biopython doesn't treat these consistently: The tests for Bio.IntelliGenetics include comments with the double semi-colon, but the parser doesn't treat them differently from sequence-specific comments. So let's do the following: For the IntelliGenetics parser, do not look for double semi-colon comments. Only check if the first character in a line is a semi-colon, and if so, treat it as a sequence-specific comment. This is what Bio.IntelliGenetics currently does anyway. Replace the parser class in Bio.IntelliGenetics with a generator function, and integrate it with Bio.SeqIO. Then, let's replace the IntelliGenetics tests with files that do not contain the double semi-colon comments. Does that sound OK? --Michiel.
From bugzilla-daemon at portal.open-bio.org Wed Jul 2 16:28:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 2 Jul 2008 12:28:19 -0400 Subject: [Biopython-dev] [Bug 2535] New: Support for PIR / NBRF format in Bio.SeqIO Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2535 Summary: Support for PIR / NBRF format in Bio.SeqIO Product: Biopython Version: Not Applicable Platform: PC OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk BioPerl and EMBOSS both refer to this as the "pir" format, although EMBOSS also supports "nbrf" as an alternative. http://bioperl.org/wiki/PIR_sequence_format Patch to follow, a new parser and writer in plain python. The existing Martel based parser in Bio.NBRF could then be deprecated. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Jul 2 16:30:28 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 2 Jul 2008 12:30:28 -0400 Subject: [Biopython-dev] [Bug 2535] Support for PIR / NBRF format in Bio.SeqIO In-Reply-To: Message-ID: <200807021630.m62GUS5B025377@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2535 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-02 12:30 EST ------- Created an attachment (id=964) --> (http://bugzilla.open-bio.org/attachment.cgi?id=964&action=view) New file Bio/SeqIO/PirIO.py Note that the details of storing the sequence type may need tweaking for better agreement with the de-facto conventions from the GenBank parser. 
As part of this the following dictionary may be useful, from Bio/NBRF/ValSeq.py valid_sequence_dict = { "P1": "complete protein", "F1": "protein fragment", \ "DL": "linear DNA", "DC": "circular DNA", "RL": "linear RNA", \ "RC":"circular RNA", "N3": "transfer RNA", "N1": "other" } -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Jul 2 17:37:05 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 2 Jul 2008 13:37:05 -0400 Subject: [Biopython-dev] [Bug 2535] Support for PIR / NBRF format in Bio.SeqIO In-Reply-To: Message-ID: <200807021737.m62Hb5lX031417@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2535 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-02 13:37 EST ------- My patch doesn't accept the "N1" sequence type mentioned in Bio/NBRF/ValSeq.py Also when recording a SeqRecord from a non-PIR input, we could try and guess the sequence type. The alphabet itself is one clue. GenBank and EMBL files should also record if the sequence is linear or circular, as well as a sequence type. For proteins, I don't see how to decide between P1 and F1 though (complete protein vs protein fragment). Maybe default to F1? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
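As a rough sketch of the "guess the sequence type" idea, using the codes from the Bio/NBRF/ValSeq.py dictionary quoted above. The function guess_pir_code and its arguments are hypothetical, and falling back to "F1" for proteins is only the tentative default floated in the comment, not a settled decision.

```python
PIR_TYPES = {
    "P1": "complete protein", "F1": "protein fragment",
    "DL": "linear DNA", "DC": "circular DNA",
    "RL": "linear RNA", "RC": "circular RNA",
    "N3": "transfer RNA", "N1": "other",
}


def guess_pir_code(molecule, circular=False):
    """Guess a PIR/NBRF type code from a molecule type and topology.

    `molecule` is expected to be "protein", "dna" or "rna" (case insensitive).
    """
    molecule = molecule.lower()
    if molecule == "protein":
        # Complete vs fragment cannot be told apart here; tentative default
        return "F1"
    if molecule == "dna":
        return "DC" if circular else "DL"
    if molecule == "rna":
        return "RC" if circular else "RL"
    raise ValueError("Cannot guess a PIR type code for %r" % molecule)


code = guess_pir_code("DNA", circular=True)
description = PIR_TYPES[code]
```

In practice the molecule type would have to come from the alphabet, and the topology from whatever the GenBank or EMBL parser recorded.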
From bugzilla-daemon at portal.open-bio.org Wed Jul 2 19:51:49 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 2 Jul 2008 15:51:49 -0400 Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support for '-F F' and make it safe In-Reply-To: Message-ID: <200807021951.m62Jpnx3012202@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2508 ------- Comment #9 from mmokrejs at ribosome.natur.cuni.cz 2008-07-02 15:51 EST ------- Even better docs: http://blog.doughellmann.com/2007/07/pymotw-subprocess.html -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Jul 3 13:24:32 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 3 Jul 2008 09:24:32 -0400 Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem with identical taxa names In-Reply-To: Message-ID: <200807031324.m63DOWDA018278@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2531 ------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 09:24 EST ------- Hi Frank, I see you've updated Bio/Nexus/Nexus.py with CVS revision 1.16 to record the original taxon order with and without the name changes. n.unaltered_taxlabels = Original names in order with duplicates n.original_taxon_order = Modified names in order, suitable as keys to n.matrix I'll update Bio.SeqIO / Bio.AlignIO to take advantage of this shortly, storing the original name and the modified unique name as the SeqRecord's name and id properties. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Jul 3 13:52:08 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 3 Jul 2008 09:52:08 -0400 Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem with identical taxa names In-Reply-To: Message-ID: <200807031352.m63Dq8el021720@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2531 ------- Comment #17 from fkauff at biologie.uni-kl.de 2008-07-03 09:52 EST ------- Hi Peter, I'd strongly suggest using self.taxlabels instead of self.original_taxon_order. The latter is only for compatibility, and original_taxon_order just maps taxlabels. Actually it might make sense to give a deprecation warning if original_taxon_order is used, and it should be removed in some future release. Frank -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Jul 3 14:06:46 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 3 Jul 2008 10:06:46 -0400 Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO In-Reply-To: Message-ID: <200807031406.m63E6kct023377@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2227 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #584 is|0 |1 obsolete| | ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 10:06 EST ------- (From update of attachment 584) With Bio/Nexus/Nexus.py revision 1.16 from CVS, it is now possible to create an empty Nexus object and add sequences to it. This code is now obsolete.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Thu Jul 3 14:13:38 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Jul 2008 10:13:38 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807031413.m63EDcGj024034@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2227

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 10:13 EST -------
Created an attachment (id=965) --> (http://bugzilla.open-bio.org/attachment.cgi?id=965&action=view)
Bio/Nexus/Nexus.py handle support in write_nexus_data()

With Bio/Nexus/Nexus.py revision 1.16 from CVS, it is now possible to create an empty Nexus object and add sequences to it:

    #Read in an alignment object, e.g. with Bio.AlignIO
    from Bio import AlignIO
    alignment = AlignIO.read(open("example.aln"), "clustal")

    #Make a Nexus object
    from Bio.Nexus import Nexus
    handle = open("test.nex", "w")
    n = Nexus.Nexus()
    n.alphabet = alignment._alphabet
    for record in alignment :
        n.add_sequence(record.id, record.seq.tostring())
    n.write_nexus_data(handle)
    handle.close()

There are two problems with write_nexus_data(), firstly it doesn't accept a StringIO handle (see also Bug 2454).

Secondly, if given a handle it closes it. This would break the above code, or how I typically use StringIO.

This patch addresses these points.

Frank, are you happy for me to commit this change?

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
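The behaviour asked for here (accept either a filename or any handle, and only close what the function itself opened) can be sketched in general terms as follows. This is not the attached patch and not the Nexus.py code; write_text is a hypothetical helper used purely to show the pattern.

```python
import io


def write_text(target, text):
    """Write text to a filename or to an already-open handle.

    Anything with a write() method (a file, a StringIO, ...) is treated as a
    handle and is left open, since the caller owns it. A filename is opened
    and closed here.
    """
    if hasattr(target, "write"):
        target.write(text)
        return
    with open(target, "w") as handle:
        handle.write(text)


buffer = io.StringIO()
write_text(buffer, "#NEXUS\n")
write_text(buffer, "begin data;\n")  # the handle is still open, so this works
```

Testing for a write() method rather than for a specific file type is what lets StringIO objects through.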
From bugzilla-daemon at portal.open-bio.org Thu Jul 3 16:02:30 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 3 Jul 2008 12:02:30 -0400 Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO In-Reply-To: Message-ID: <200807031602.m63G2Unc032671@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2227 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 12:02 EST ------- Created an attachment (id=966) --> (http://bugzilla.open-bio.org/attachment.cgi?id=966&action=view) Patch to Bio/AlignIO/NexusIO.py adding write support This patch requires the Bio.Nexus handle fix (patch in attachment 965, comment 4). My method for constructing an empty DNA, RNA, or Protein Nexus object is perhaps inelegant. This is required in order to setup the alphabet, ambiguous_values and unambiguous_letters properties which otherwise default to DNA. Also note that the Nexus add_sequence() method does not seem to support duplicated taxa names. Perhaps this method could update the unaltered_taxlabels property and use the _unique_label method to cope with duplicate names? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
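For readers unfamiliar with the duplicate-name problem, the general idea of keeping the original labels alongside unique ones can be sketched as below. This is an assumption-laden illustration: unique_labels and the "_2", "_3" suffix scheme are hypothetical and are not claimed to match what Nexus.py's _unique_label method does.

```python
def unique_labels(names):
    """Return labels in the same order, renaming repeats so all are unique.

    The first occurrence keeps its name; later ones get a numeric suffix.
    The input list itself is left untouched, so it can be kept as the
    record of the original (possibly duplicated) names.
    """
    seen = set()
    result = []
    for name in names:
        candidate, count = name, 1
        while candidate in seen:
            count += 1
            candidate = "%s_%d" % (name, count)
        seen.add(candidate)
        result.append(candidate)
    return result


original = ["t1", "t2", "t1", "t1"]
unique = unique_labels(original)
```

Keeping both lists mirrors the split between the unaltered labels and the unique labels that serve as keys to the matrix.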
From bugzilla-daemon at portal.open-bio.org Thu Jul 3 16:08:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 3 Jul 2008 12:08:26 -0400 Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem with identical taxa names In-Reply-To: Message-ID: <200807031608.m63G8QS3000534@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2531 ------- Comment #18 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 12:08 EST ------- I have changed my use of original_taxon_order to just taxlabels (code now in Bio/AlignIO/NexusIO.py rather than Bio/SeqIO/NexusIO.py). I agree, adding a deprecation warning to the original_taxon_order get/set functions would make sense. P.S. Thanks for adding the unaltered_taxlabels property. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Fri Jul 4 08:11:06 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 4 Jul 2008 09:11:06 +0100 Subject: [Biopython-dev] What happened to Biopython 1.46? Message-ID: <320fb6e00807040111h182411d5lea14575f2906e7ba@mail.gmail.com> We were recently talking about doing another release, but as you may have noticed nothing has been announced. Michiel devoted a good chunk of his weekend to preparing Biopython 1.46 and uploaded it to the servers on Sunday 29th. He didn't issue an announcement email at the time due to the problem with the wiki being read only (now fixed). However, on the Monday evening I realised I'd done something really stupid in Bio.Data.CodonTable just before the CVS freeze. Table 15 (Blepharisma Macronuclear) would be used whenever a translation table was requested by name. This change has been reverted, and I've added further translation checks in test_seq.py to avoid any similar issue in future. 
So, while there is a Biopython 1.46, we're not going to advertise it because the translation functionality is subtly wrong. However, it is up on the website, and linked to with an errata statement. Michiel will kindly try and prepare Biopython 1.47 soon... so please hold off any big changes in CVS until then. And I'm hereby publicly promising to treat him to dinner - hopefully we'll be in the same country at the same time this year! Peter From bugzilla-daemon at portal.open-bio.org Fri Jul 4 08:39:35 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 4 Jul 2008 04:39:35 -0400 Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO In-Reply-To: Message-ID: <200807040839.m648dZnX025882@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2227 ------- Comment #6 from fkauff at biologie.uni-kl.de 2008-07-04 04:39 EST ------- (In reply to comment #4) > Created an attachment (id=965) --> (http://bugzilla.open-bio.org/attachment.cgi?id=965&action=view) [details] > Bio/Nexus/Nexus.py handle support in write_nexus_data() > > With Bio/Nexus/Nexus.py revision 1.16 from CVS, it is now possible to create an > empty Nexus object and add sequences to it: > ... > There are two problems with write_nexus_data(), firstly it doesn't accept a > StringIO handle (see also Bug 2454). > > Secondly, if given a handle it closes it. This would break the above code, or > how I typically use StringIO. > > This patch addresses these points. > > Frank, are you happy for me to commit this change? > Very nice. Go for it :-) Cheers, Frank
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jul 4 08:53:10 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 4 Jul 2008 04:53:10 -0400 Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO In-Reply-To: Message-ID: <200807040853.m648rAHL026783@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2227 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #965 is|0 |1 obsolete| | ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-04 04:53 EST ------- (From update of attachment 965) > > This patch addresses these points. > > > > Frank, are you happy for me to commit this change? > > > > Very nice. Go for it :-) > Thanks Frank.
Checking in Nexus.py;
/home/repository/biopython/biopython/Bio/Nexus/Nexus.py,v <-- Nexus.py
new revision: 1.17; previous revision: 1.16
done

Peter

From bugzilla-daemon at portal.open-bio.org Fri Jul 4 08:56:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Jul 2008 04:56:10 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807040856.m648uAAG027012@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2227

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #966 is|0                           |1
           obsolete|                            |

------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-04 04:56 EST -------
(From update of attachment 966)
There is a slight problem with this patch on the alphabet selection (it uses "dna" when it should use "rna"). I will postpone dealing with writing Nexus files in Bio.SeqIO / Bio.AlignIO until after the next Biopython release.
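
The two handle problems raised in comment #4 of this bug (a StringIO handle not being accepted, and a caller's handle being closed) can be illustrated with a small standalone sketch. The helper name write_text and its arguments are hypothetical and are not taken from Nexus.py; the real change is the patch in attachment 965.

```python
import io


def write_text(target, text):
    """Write text to a filename or to an already open handle (hypothetical helper)."""
    if isinstance(target, str):
        # We opened the file ourselves, so we are responsible for closing it.
        with open(target, "w") as handle:
            handle.write(text)
    else:
        # Any object with a write() method is accepted, including StringIO.
        # The caller owns the handle, so it is deliberately left open.
        target.write(text)


buffer = io.StringIO()
write_text(buffer, "#NEXUS\n")
write_text(buffer, "begin data;\n")  # still usable: the handle was not closed
print(buffer.getvalue())
```

Leaving the handle open is what lets the caller keep writing to it, or read the result back from a StringIO, after the call returns.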
From bugzilla-daemon at portal.open-bio.org Fri Jul 4 09:13:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Jul 2008 05:13:25 -0400
Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO
In-Reply-To:
Message-ID: <200807040913.m649DPap027929@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2227

------- Comment #9 from fkauff at biologie.uni-kl.de 2008-07-04 05:13 EST -------
(In reply to comment #5)
> Created an attachment (id=966) --> (http://bugzilla.open-bio.org/attachment.cgi?id=966&action=view) [details]
> Patch to Bio/AlignIO/NexusIO.py adding write support
>
> This patch requires the Bio.Nexus handle fix (patch in attachment 965 [details], comment
> 4).
>
> My method for constructing an empty DNA, RNA, or Protein Nexus object is
> perhaps inelegant. This is required in order to set up the alphabet,
> ambiguous_values and unambiguous_letters properties which otherwise default to
> DNA.
>
> Also note that the Nexus add_sequence() method does not seem to support
> duplicated taxa names. Perhaps this method could update the
> unaltered_taxlabels property and use the _unique_label method to cope with
> duplicate names?
>

Ok, I updated add_sequence and will commit the changes soon.

F
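
The duplicate-name handling suggested in the quoted comment could work roughly as sketched below. This is an illustration only: the function unique_label and its numeric suffix scheme are made up here and may well differ from what _unique_label in Nexus.py actually does.

```python
def unique_label(existing, label):
    """Return label, or a numbered variant if label is already in use (hypothetical)."""
    if label not in existing:
        return label
    counter = 2
    while "%s.%d" % (label, counter) in existing:
        counter += 1
    return "%s.%d" % (label, counter)


unaltered_taxlabels = []  # names exactly as supplied, duplicates included
taxlabels = []            # unique names, used as keys of the matrix
matrix = {}

for name, seq in [("t1", "ACGT"), ("t1", "ACGA"), ("t2", "ACGG")]:
    unaltered_taxlabels.append(name)
    label = unique_label(taxlabels, name)
    taxlabels.append(label)
    matrix[label] = seq

print(taxlabels)            # ['t1', 't1.2', 't2']
print(unaltered_taxlabels)  # ['t1', 't1', 't2']
```

Keeping the unaltered list next to the unique one means the original names stay available even though the matrix needs distinct keys.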
From bugzilla-daemon at portal.open-bio.org Fri Jul 4 09:20:07 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 4 Jul 2008 05:20:07 -0400 Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO In-Reply-To: Message-ID: <200807040920.m649K7MI028352@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2227 ------- Comment #10 from fkauff at biologie.uni-kl.de 2008-07-04 05:20 EST ------- (In reply to comment #9) > > > > Also note that the Nexus add_sequence() method does not seem to support > > duplicated taxa names. Perhaps this method could update the > > unaltered_taxlabels property and use the _unique_label method to cope with > > duplicate names? > > > Ok, I updated add_sequence and will commit the changes soon. > Checking in biopython/Bio/Nexus/Nexus.py; /home/repository/biopython/biopython/Bio/Nexus/Nexus.py,v <-- Nexus.py new revision: 1.18; previous revision: 1.17 Frank -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Fri Jul 4 10:24:06 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 4 Jul 2008 03:24:06 -0700 (PDT) Subject: [Biopython-dev] Bio.IntelliGenetics In-Reply-To: <320fb6e00807020911w2bec03a6w5ec8b50f60a50238@mail.gmail.com> Message-ID: <36286.77119.qm@web62412.mail.re1.yahoo.com> > I'm assuming we'd put the new IntelliGenetics to > SeqRecord parser in Bio/SeqIO/IgIO.py (based on > the format name of "ig" used in EMBOSS). OK. > Would we then also deprecate Bio.IntelliGenetics? Yes. Otherwise, it's replicated functionality. > Do you want to make these changes, or should I? Either way is fine with me. If you want to include this in Bio.SeqIO, go ahead. If you prefer me to do it, please let me know. 
> > Then, let's replace the IntelliGenetics tests by > files that do not contain the double > > semi-colon comments. > > Why not just leave the double colon lines alone? The parser > should be able to cope. In the example files in test/IntelliGenetics, lines with a ';;' clearly have a different interpretation from the sequence-specific comments starting with ';'. I am fine with skipping the ';;' lines, but if we'd include them with the sequence-specific comments we'd be misrepresenting the file. --Michiel. --- On Wed, 7/2/08, Peter wrote: > From: Peter > Subject: Re: [Biopython-dev] Bio.IntelliGenetics > To: mjldehoon at yahoo.com > Date: Wednesday, July 2, 2008, 12:11 PM > > It may be that the link in > Bio/IntelliGenetics/__init__.py actually does not pertain > to > > the IntelliGenetics format. Except for this link > (which as you point out actually talks > > about the MASE format, not the IntelliGenetics > format), I have seen no description > > elsewhere of these file-wide comments preceded by a > double semi-colon in the > > IntelliGenetics format. Even Biopython doesn't > treat these consistently: The tests > > for Bio.IntelliGenetics include comments with the > double semi-colon, but the parser > > doesn't treat them differently from > sequence-specific comments. > > Maybe we should ask BioPerl if they distinguish between the > IntelliGenetics and MASE formats? > > Looking back over the old mailing list, at the time they > did think the > two were the same: > http://lists.open-bio.org/pipermail/biopython-dev/2001-October/000626.html > > > So let's do the following: > > For the IntelliGenetics parser, do not look for double > semi-colon comments. Only check > > if the first character in a line is a semi-colon, and > if so, treat it as a sequence-specific > > comment. This is what Bio.IntelliGenetics currently > does anyway. > > Replace the parser class in Bio.IntelliGenetics by a > generator function, and integrate it with > > Bio.SeqIO. 
> > I'm assuming we'd put the new IntelliGenetics to > SeqRecord parser in > Bio/SeqIO/IgIO.py (based on the format name of > "ig" used in EMBOSS). > Would we then also deprecate Bio.IntelliGenetics? > > Do you want to make these changes, or should I? > > > Then, let's replace the IntelliGenetics tests by > files that do not contain the double > > semi-colon comments. > > Why not just leave the double colon lines alone? The parser > should be > able to cope. > > Peter From biopython at maubp.freeserve.co.uk Fri Jul 4 14:31:55 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 4 Jul 2008 15:31:55 +0100 Subject: [Biopython-dev] Bio.IntelliGenetics In-Reply-To: <36286.77119.qm@web62412.mail.re1.yahoo.com> References: <320fb6e00807020911w2bec03a6w5ec8b50f60a50238@mail.gmail.com> <36286.77119.qm@web62412.mail.re1.yahoo.com> Message-ID: <320fb6e00807040731h787c66e6t10a4edd31dffdbc2@mail.gmail.com> >> Do you want to make these changes, or should I? > > Either way is fine with me. If you want to include this in Bio.SeqIO, go ahead. OK. I've added a simple parser to CVS as Bio/SeqIO/IgIO.py for IntelliGenetics/MASE files using the format name "ig" to match EMBOSS. The existing three sample files are now being used in test_SeqIO.py and one of them also in test_AlignIO.py as well. If anyone wants to scan over the code, I'd be delighted to have feedback. Adding support for writing these files should be easy. Do you think this is worth implementing? Before we deprecate Bio.IntelliGenetics I suggest we ask on the mailing list if anyone is using it. > In the example files in test/IntelliGenetics, lines with a ';;' clearly have a different interpretation > from the sequence-specific comments starting with ';'. I am fine with skipping the ';;' lines, but > if we'd include them with the sequence-specific comments we'd be misrepresenting the file. I am ignoring the ";;" lines at the start of the file. 
Peter

From mjldehoon at yahoo.com Sat Jul 5 08:24:41 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 5 Jul 2008 01:24:41 -0700 (PDT)
Subject: [Biopython-dev] CVS freeze for release 1.47
Message-ID: <223850.14172.qm@web62404.mail.re1.yahoo.com>

Hi everybody,

I'll start on release 1.47 from now, so please don't make any commits to CVS until the release is out. Thanks!

--Michiel.

From mjldehoon at yahoo.com Sun Jul 6 00:00:17 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 5 Jul 2008 17:00:17 -0700 (PDT)
Subject: [Biopython-dev] Biopython release 1.47
Message-ID: <287726.364.qm@web62412.mail.re1.yahoo.com>

We are pleased to announce the release of Biopython 1.47. This release includes a new Bio.AlignIO module, updates to Bio.Blast, parsers for NCBI's Entrez E-Utilities, numerous other code improvements and fixes, and extended and updated documentation. In particular if you use Biopython to access NCBI's E-Utilities, we encourage you to download and install this release to ensure full compliance with NCBI's access rules.

Source distributions and Windows installers are available from the Biopython website at http://biopython.org.

My thanks to all code contributors who made this new release possible.

--Michiel on behalf of the Biopython developers

From sbassi at gmail.com Sun Jul 6 19:53:54 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Sun, 6 Jul 2008 16:53:54 -0300
Subject: [Biopython-dev] NCBIStandalon not compatible with previous versions, is this a bug?
Message-ID:

NCBIStandalone changed in 1.46 due to bug #2508.
So this code, which was working before, no longer works:

result, err = NCBIStandalone.blastall(b_exe, "blastn",
    b_db, f_name, expectation=1e-10, descriptions=1)

The error trace is:

  File "/mnt/hda2/bio/biopython-1.46/build/lib.linux-i686-2.5/Bio/Blast/NCBIStandalone.py", line 1986, in _security_check_parameters
    if ";" in value or "&&" in value :
TypeError: argument of type 'float' is not iterable

So I had to rewrite the code as:

result, err = NCBIStandalone.blastall(b_exe, "blastn",
    b_db, f_name, expectation="1e-10", descriptions="1")

The problem is the function "_security_check_parameters", which assumes that all arguments are strings.

Proposed solutions:

1) Leave it as is (this is not a bug). Some tutorial should be changed (?)
2) Modify line 1986 from:
if ";" in value or "&&" in value :
To this:
if ";" in value or "&&" in str(value) :

From bugzilla-daemon at portal.open-bio.org Mon Jul 7 10:47:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Jul 2008 06:47:48 -0400
Subject: [Biopython-dev] [Bug 2447] EUtils cannot parse PubMed XML for ACS journals
In-Reply-To:
Message-ID: <200807071047.m67Almjb027271@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2447

------- Comment #3 from mdehoon at ims.u-tokyo.ac.jp 2008-07-07 06:47 EST -------
Using Biopython release 1.47, Bio.Entrez can parse the XML for this PMID:

>>> from Bio import Entrez
>>> PMID = "17238260"
>>> handle = Entrez.efetch(db='pubmed', id=PMID, retmode='xml')
>>> record = Entrez.read(handle)
>>>

Noel, can you use Bio.Entrez instead of Bio.EUtils?
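
For the _security_check_parameters problem in Sebastian's message above, converting each value to a string once, before either membership test, avoids the TypeError for float and int arguments. The function below is a standalone sketch with a hypothetical name and keyword-argument interface; it is not the code in Bio/Blast/NCBIStandalone.py.

```python
def check_parameters(**keywords):
    """Reject values containing shell command separators (hypothetical sketch)."""
    for key, value in keywords.items():
        # Floats and ints do not support the "in" operator, so cast first.
        text = str(value)
        if ";" in text or "&&" in text:
            raise ValueError("Rejecting suspicious argument for %s" % key)


# Numeric arguments are accepted without a TypeError.
check_parameters(expectation=1e-10, descriptions=1)

# A value containing a command separator is refused.
try:
    check_parameters(database="nr; echo oops")
except ValueError as err:
    print(err)
```

Casting once into a local variable covers both tests; casting only inside the second test would still fail on the first one.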
From bugzilla-daemon at portal.open-bio.org Mon Jul 7 10:55:10 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 Jul 2008 06:55:10 -0400 Subject: [Biopython-dev] [Bug 2448] Bio.EUtils can't handle accented author names In-Reply-To: Message-ID: <200807071055.m67AtAWu027543@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2448 ------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2008-07-07 06:55 EST ------- Using Bio.Entrez in Biopython release 1.47: >>> from Bio import Entrez >>> handle = Entrez.efetch(db='pubmed', id=pmids, retmode='xml') >>> records = Entrez.read(handle) >>> records[0]['MedlineCitation']['Article']['AuthorList'] [{u'LastName': 'Matamala', u'Initials': 'AR', u'ForeName': 'Adelio R'}, {u'LastName': 'Almonacid', u'Initials': 'DE', u'ForeName': 'Daniel E'}, {u'LastName': 'Figueroa', u'Initials': 'MF', u'ForeName': 'Maximiliano F'}, {u'LastName': u'Mart\xednez-Oyanedel', u'Initials': 'J', u'ForeName': u'Jos\xe9'}, {u'LastName': 'Bunster', u'Initials': 'MC', u'ForeName': 'Marta C'}] Noel, is this sufficient for your needs? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jul 7 11:12:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 Jul 2008 07:12:26 -0400 Subject: [Biopython-dev] [Bug 2448] Bio.EUtils can't handle accented author names In-Reply-To: Message-ID: <200807071112.m67BCQAB028433@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2448 ------- Comment #3 from baoilleach at gmail.com 2008-07-07 07:12 EST ------- Thanks Michiel, but I found a workaround a day later so don't worry about me. I was just letting you know about the bug... 
Noel

From biopython at maubp.freeserve.co.uk Mon Jul 7 13:07:24 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 7 Jul 2008 14:07:24 +0100
Subject: [Biopython-dev] NCBIStandalon not compatible with previous versions, is this a bug?
In-Reply-To:
References:
Message-ID: <320fb6e00807070607m2cee88b1n9b2b2194d96c3c12@mail.gmail.com>

On Sun, Jul 6, 2008 at 8:53 PM, Sebastian Bassi wrote:
> NCBIStandalone changed in 1.46 due to bug #2508.
> So this code that was working before, no longer works:
>
> result, err = NCBIStandalone.blastall(b_exe, "blastn",
> b_db, f_name, expectation=1e-10, descriptions=1)
>
> The error trace is:
>
> File "/mnt/hda2/bio/biopython-1.46/build/lib.linux-i686-2.5/Bio/Blast/NCBIStandalone.py",
> line 1986, in _security_check_parameters
> if ";" in value or "&&" in value :
> TypeError: argument of type 'float' is not iterable
>
> So I had to rewrite the code as:
>
> result, err = NCBIStandalone.blastall(b_exe, "blastn",
> b_db, f_name, expectation="1e-10", descriptions="1")
>
> The problem is the function "_security_check_parameters", that assumes
> that all arguments are strings.
>
> Proposed solutions:
>
> 1) Leave it as is (this is not a bug). Some tutorial should be changed (?)
> 2) Modify line 1986 from:
> if ";" in value or "&&" in value :
> To this:
> if ";" in value or "&&" in str(value) :

I would say it's a bug, and casting to a string on line 1986 looks like the best fix.
I won't be able to do this until tomorrow afternoon at the latest - if you could file a bug that would be helpful in case I forget ;)

Thanks

Peter

From bugzilla-daemon at portal.open-bio.org Mon Jul 7 17:08:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Jul 2008 13:08:40 -0400
Subject: [Biopython-dev] [Bug 2538] New: _security_check_parameters assumes all arguments are strings
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2538

           Summary: _security_check_parameters assumes all arguments are strings
           Product: Biopython
           Version: 1.46
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: sbassi at gmail.com

This code no longer works:

result, err = NCBIStandalone.blastall(b_exe, "blastn", b_db, f_name, expectation=1e-10, descriptions=1)

because the new _security_check_parameters function assumes all blastall parameters are strings, but expectation and descriptions are a float and an int.

Proposed fix:

Modify line 1986 from:
if ";" in value or "&&" in value :
To this:
if ";" in value or "&&" in str(value) :

From sbassi at gmail.com Mon Jul 7 20:30:14 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Mon, 7 Jul 2008 17:30:14 -0300
Subject: [Biopython-dev] Alignment problem. bug?
Message-ID:

I would like to confirm whether this is a bug or not. If I get confirmation, I will file it in bugzilla.
With this code:

from Bio import Clustalw
from Bio.Clustalw import MultipleAlignCL

cline = MultipleAlignCL('foralig.txt')
cline.set_output("alig.aln")
alignment = Clustalw.do_alignment(cline)

I get:

Traceback (most recent call last):
  File "/mnt/hda2/py252/bin/ii.py", line 112, in <module>
    alignment = Clustalw.do_alignment(cline)
  File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Clustalw/__init__.py", line 125, in do_alignment
    return parse_file(out_file, alphabet)
  File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Clustalw/__init__.py", line 47, in parse_file
    generic_alignment = AlignIO.read(handle, "clustal")
  File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/AlignIO/__init__.py", line 299, in read
    first = iterator.next()
  File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/AlignIO/ClustalIO.py", line 169, in next
    raise ValueError("Could not parse line:\n%s" % line)
ValueError: Could not parse line:

I tested with Biopython 1.47 and 1.46 with the input file: http://www.pastecode.com.ar/f44f28b41 (download at http://www.pastecode.com.ar/pastebin.php?dl=f44f28b41)

The clustal program is running, because I can see its output on disk (posted here: http://www.pastecode.com.ar/f275a5475). It seems it fails to parse it.

I also tested in an older version (I guess it is 1.44) and it works OK. So I think the problem was introduced in 1.46.
-- Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6 Bioinformatics news: http://www.bioinformatica.info Tutorial libre de Python: http://tinyurl.com/2az5d5 From bugzilla-daemon at portal.open-bio.org Tue Jul 8 08:41:02 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 8 Jul 2008 04:41:02 -0400 Subject: [Biopython-dev] [Bug 2538] _security_check_parameters assumes all arguments are strings In-Reply-To: Message-ID: <200807080841.m688f2VL020100@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2538 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-08 04:41 EST ------- Included a float in the unit test for _security_check_parameters() added in Bug 2508: Tests/test_NCBIStandalone.py revision: 1.15; Fixed the string assumption in: Bio/Blast/NCBIStandalone.py revision: 1.74; Note that in your suggested fix Sebastian, both the "in" expressions need casting to a string. Thanks for reporting this! Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Tue Jul 8 08:51:31 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 8 Jul 2008 09:51:31 +0100 Subject: [Biopython-dev] Alignment problem. bug? In-Reply-To: References: Message-ID: <320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com> On Mon, Jul 7, 2008 at 9:30 PM, Sebastian Bassi wrote: > I would like to confirm that this is a bug ot not. If I get > confirmation, I would fill it in bugzilla. It does look like a bug to me... 
> With this code: > > from Bio import Clustalw > from Bio.Clustalw import MultipleAlignCL > > cline = MultipleAlignCL('foralig.txt') > cline.set_output("alig.aln") > alignment = Clustalw.do_alignment(cline) > > I get: > > Traceback (most recent call last): > File "/mnt/hda2/py252/bin/ii.py", line 112, in > alignment = Clustalw.do_alignment(cline) > File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Clustalw/__init__.py", > line 125, in do_alignment > return parse_file(out_file, alphabet) > File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Clustalw/__init__.py", > line 47, in parse_file > generic_alignment = AlignIO.read(handle, "clustal") > File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/AlignIO/__init__.py", > line 299, in read > first = iterator.next() > File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/AlignIO/ClustalIO.py", > line 169, in next > raise ValueError("Could not parse line:\n%s" % line) > ValueError: Could not parse line: > > > I tested with Biopython 1.47 and 1.46 with the input file: > http://www.pastecode.com.ar/f44f28b41 (download at > http://www.pastecode.com.ar/pastebin.php?dl=f44f28b41) > The clustal program is running because I see in the disk its output > (posted here: http://www.pastecode.com.ar/f275a5475). It seems it > fails to parse it. > > I also tested in an older version (I guess it is 1.44) and it works > OK. So I think the problem was introduced in 1.46. For Biopython 1.46+ I switched the Bio.Clustalw parser to internally call Bio.AlignIO, so one thing you could try is reverting Bio/Clustalw/__init__.py to the older version (e.g. that shipped with Biopython 1.45). You haven't said which version of the ClustalW tool you are using - maybe 2.0? If so, there could be a subtle change in the output format since 1.83. If you could run the tool by hand and share the output that would be helpful to try and track down this issue. 
I don't seem to have any version of ClustalW installed on my current machine, so it will take me a little longer to reproduce this here. Could you file a bug please, and attach the example input and the output when run by hand at the command line. Thanks, Peter From bugzilla-daemon at portal.open-bio.org Tue Jul 8 08:52:06 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 8 Jul 2008 04:52:06 -0400 Subject: [Biopython-dev] [Bug 2538] _security_check_parameters assumes all arguments are strings In-Reply-To: Message-ID: <200807080852.m688q6Ce020588@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2538 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-08 04:52 EST ------- Forgot to mark this as fixed - sorry for the extra email! -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Tue Jul 8 11:02:37 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 8 Jul 2008 12:02:37 +0100 Subject: [Biopython-dev] Alignment problem. bug? In-Reply-To: <320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com> References: <320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com> Message-ID: <320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com> On Tue, Jul 8, 2008 at 9:51 AM, Peter wrote: > On Mon, Jul 7, 2008 at 9:30 PM, Sebastian Bassi wrote: >> I would like to confirm that this is a bug ot not. If I get >> confirmation, I would fill it in bugzilla. > > It does look like a bug to me... I've reproduced this using Clustalw 2.0.8 (online at the EBI) and with Clustalw 2.0.9 (installed locally). 
It was a problem parsing Clustal files where the first line of the consensus was blank (and would probably affect Clustal W 1.83 too).

I think I have fixed this with Bio/AlignIO/ClustalIO.py revision: 1.12. Could you update this file and re-test please Sebastian? Also, may I add a test alignment file based on your example to CVS please?

Thanks,

Peter

From mjldehoon at yahoo.com Tue Jul 8 12:47:48 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 8 Jul 2008 05:47:48 -0700 (PDT)
Subject: [Biopython-dev] Bio.Sequencing
Message-ID: <570915.67657.qm@web62415.mail.re1.yahoo.com>

Hi everybody,

Is somebody currently actively maintaining Bio.Sequencing? Frank perhaps?
I'd like to make some changes to Bio.Sequencing with regards to bug #2454:

http://bugzilla.open-bio.org/show_bug.cgi?id=2454

Just to make sure that I am not treading on other people's work.

--Michiel

From fkauff at biologie.uni-kl.de Tue Jul 8 13:12:39 2008
From: fkauff at biologie.uni-kl.de (Frank Kauff)
Date: Tue, 08 Jul 2008 15:12:39 +0200
Subject: [Biopython-dev] Bio.Sequencing
In-Reply-To: <570915.67657.qm@web62415.mail.re1.yahoo.com>
References: <570915.67657.qm@web62415.mail.re1.yahoo.com>
Message-ID: <487367C7.2050702@biologie.uni-kl.de>

Hi all,

Michiel de Hoon wrote:
> Hi everybody,
>
> Is somebody currently actively maintaining Bio.Sequencing? Frank perhaps?
>

Not me. Green light from my side.

Frank

> I'd like to make some changes to Bio.Sequencing with regards to bug #2454:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2454
>
> Just to make sure that I am not treading on other people's work.
> > > --Michiel > > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > From biopython at maubp.freeserve.co.uk Tue Jul 8 14:36:43 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 8 Jul 2008 15:36:43 +0100 Subject: [Biopython-dev] Bio.Sequencing In-Reply-To: <570915.67657.qm@web62415.mail.re1.yahoo.com> References: <570915.67657.qm@web62415.mail.re1.yahoo.com> Message-ID: <320fb6e00807080736v26388f2ake12303c5b752c5e9@mail.gmail.com> On Tue, Jul 8, 2008 at 1:47 PM, Michiel de Hoon wrote: > Hi everybody, > > Is somebody currently actively maintaining Bio.Sequencing? Frank perhaps? > I'd like to make some changes to Bio.Sequencing with regards to bug #2454: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2454 > > Just to make sure that I am not treading on other people's work. My only comment is watch out for the fact that Bio.SeqIO is now calling Bio.Sequencing for the "ace" and "phd" formats. On a related note, I'd had some ideas for making the Ace parser more user friendly by further extending the doc strings and defining __str__ or __repr__ methods for some of the "line type classes" which otherwise must be explored by using dir() to discover the properties. I haven't actually done any work on this yet though. Peter From sbassi at gmail.com Tue Jul 8 15:38:29 2008 From: sbassi at gmail.com (Sebastian Bassi) Date: Tue, 8 Jul 2008 12:38:29 -0300 Subject: [Biopython-dev] Alignment problem. bug? In-Reply-To: <320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com> References: <320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com> <320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com> Message-ID: On Tue, Jul 8, 2008 at 8:02 AM, Peter wrote: > I've reproduced this using Clustalw 2.0.8 (online at the EBI) and with > Clustalw 2.0.9 (installed locally). 
It was a problem parsing Clustal > files where the first line of the consensus was blank (and would > probably affect both Clustal W 1.83 too). Yes, I used ClustalW 1.83 > I think I have fixed this with Bio/AlignIO/ClustalIO.py revision: 1.12 > Could you update this file and re-test please Sebastian? Also, may I > add a test alignment file based on your example to CVS please? Ok, I will test it today. You can use my file or any possible derivation of it. Best, SB. -- Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6 Bioinformatics news: http://www.bioinformatica.info Tutorial libre de Python: http://tinyurl.com/2az5d5 From biopython at maubp.freeserve.co.uk Tue Jul 8 15:56:20 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 8 Jul 2008 16:56:20 +0100 Subject: [Biopython-dev] Alignment problem. bug? In-Reply-To: References: <320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com> <320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com> Message-ID: <320fb6e00807080856s55d77962h9ceedd160ca8002b@mail.gmail.com> >> I think I have fixed this with Bio/AlignIO/ClustalIO.py revision: 1.12 >> Could you update this file and re-test please Sebastian? Also, may I >> add a test alignment file based on your example to CVS please? > > Ok, I will test it today. You can use my file or any possible derivation of it. Thanks - I have added a two sequence version of your example as Tests/Clustalw/odd_consensus.aln Peter From sbassi at gmail.com Tue Jul 8 16:52:13 2008 From: sbassi at gmail.com (Sebastian Bassi) Date: Tue, 8 Jul 2008 13:52:13 -0300 Subject: [Biopython-dev] Alignment problem. bug? 
In-Reply-To: <320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com> References: <320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com> <320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com> Message-ID: On Tue, Jul 8, 2008 at 8:02 AM, Peter wrote: > I think I have fixed this with Bio/AlignIO/ClustalIO.py revision: 1.12 Just to confirm that it works now. Thank you! Best, SB. From biopython at maubp.freeserve.co.uk Wed Jul 9 11:11:16 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 9 Jul 2008 12:11:16 +0100 Subject: [Biopython-dev] Bug 2533 - Support for simple "tab" format in Bio.SeqIO In-Reply-To: <320fb6e00807020603s63f8339ag5b8140f1943ceb47@mail.gmail.com> References: <320fb6e00807020603s63f8339ag5b8140f1943ceb47@mail.gmail.com> Message-ID: <320fb6e00807090411o44530c46wc1ffdc8cdc5442fe@mail.gmail.com> Now that Biopython 1.47 is out, are there any comments/objections to my committing this to CVS? Bug 2533 - Support for simple "tab" format in Bio.SeqIO http://bugzilla.open-bio.org/show_bug.cgi?id=2533 Thanks, Peter P.S. Any real world example files would be good for the test suite. From lpritc at scri.ac.uk Wed Jul 9 12:14:04 2008 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Wed, 09 Jul 2008 13:14:04 +0100 Subject: [Biopython-dev] Bug 2533 - Support for simple "tab" format in Bio.SeqIO In-Reply-To: <320fb6e00807090411o44530c46wc1ffdc8cdc5442fe@mail.gmail.com> Message-ID: Only that you might want to consider Axon Text File format as a self-describing tab-separated format which would facilitate storage and recovery of all attributes of a sequence. There's a spec here: http://www.moleculardevices.com/pages/software/gn_genepix_file_formats.html On 09/07/2008 12:11, "Peter" wrote: > Now that Biopython 1.47 is out, are there any comments/objections to > my committing this to CVS? > > Bug 2533 - Support for simple "tab" format in Bio.SeqIO > http://bugzilla.open-bio.org/show_bug.cgi?id=2533 > > Thanks, > > Peter > > P.S. 
Any real world example files would be good for the test suite. -- Dr Leighton Pritchard B.Sc.(Hons) MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
From biopython at maubp.freeserve.co.uk Wed Jul 9 12:30:26 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 9 Jul 2008 13:30:26 +0100 Subject: [Biopython-dev] Bug 2533 - Support for simple "tab" format in Bio.SeqIO In-Reply-To: References: <320fb6e00807090411o44530c46wc1ffdc8cdc5442fe@mail.gmail.com> Message-ID: <320fb6e00807090530j43a3e2c9y48bef4993587881f@mail.gmail.com> On Wed, Jul 9, 2008 at 1:14 PM, Leighton Pritchard wrote: > Only that you might want to consider Axon Text File format as a > self-describing tab-separated format which would facilitate storage and > recovery of all attributes of a sequence. There's a spec here: > > http://www.moleculardevices.com/pages/software/gn_genepix_file_formats.html > Its an interesting and flexible file format, but I don't see any standard column name for "sequence" which would be of particular interest from the point of view of the Bio.SeqIO module. If there is a de-facto convention for storing sequence data in an Axon Text File, then we could adopt this within Bio.SeqIO. Otherwise, I think any Axon Text File parser added to Biopython would have to be of much more general nature (and not part of Bio.SeqIO). Peter From biopython at maubp.freeserve.co.uk Wed Jul 9 13:03:16 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 9 Jul 2008 14:03:16 +0100 Subject: [Biopython-dev] Simple __getitem__ for Alignments Message-ID: <320fb6e00807090603o6b087ceeuce0b87c13627552a@mail.gmail.com> Now that the latest release is out (Biopython 1.47), Bio.AlignIO should start to get used more. I anticipate more people getting frustrated with the current Alignment object, and would like to make another baby-step in improving it. I'd like to add a minimal __getitem__ method, as described in Bug 1944 comment 15, http://bugzilla.open-bio.org/show_bug.cgi?id=1944#c15 > def __getitem__(self, index) : > """Access part of the alignment. 
>
>     You can access a row of the alignment as a SeqRecord using an integer
>     index (think of the alignment as a list of SeqRecord objects here):
>
>         first_record = my_alignment[0]
>         last_record = my_alignment[-1]
>
>     Right now, this is the ONLY indexing operation supported.  The
>     use of two indices and slice notation to extract a sub-alignment,
>     row, column or letter is under discussion for a future update."""
>     if isinstance(index, int) :
>         #e.g. result = align[x]
>         #Return a SeqRecord
>         return self._records[index]
>     else :
>         raise TypeError("Not currently supported, but may be in future.")

From the discussion on Bug 1944, this doesn't seem to be contentious - the debate is about more advanced slicing operations.

I'd also like to add a __len__ method which would return the number of SeqRecord objects (i.e. the number of rows). This would then let the alignment be treated very much like a read-only list of SeqRecord objects. Remember, we can already iterate over the rows in the alignment as SeqRecord objects.

Any comments?

Peter

From bugzilla-daemon at portal.open-bio.org Wed Jul 9 13:21:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 09:21:13 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807091321.m69DLD9g031282@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454

------- Comment #20 from mdehoon at ims.u-tokyo.ac.jp 2008-07-09 09:21 EST -------
(In reply to comment #16)
I have uploaded an alternative parser in Bio.Sequencing.Phd to CVS. Feel free to have a look and comment. If everybody is OK, I'll add a DeprecationWarning to the previous parser.
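[Editorial sketch] The minimal __getitem__ and __len__ proposal in Peter's "Simple __getitem__ for Alignments" message can be tried out on a stand-in class. This is only an illustration under stated assumptions: MiniAlignment is a hypothetical name, not the Bio.Align.Generic.Alignment class, and plain strings stand in for SeqRecord objects.

```python
class MiniAlignment(object):
    """Hypothetical stand-in for an alignment: a read-only list of rows."""

    def __init__(self, records):
        # Assumption: rows are kept in a private list, as in the proposal.
        self._records = list(records)

    def __iter__(self):
        # Iterating over the rows is what the existing class already allows.
        return iter(self._records)

    def __len__(self):
        # Number of rows (records) in the alignment.
        return len(self._records)

    def __getitem__(self, index):
        # Only integer row access is supported; slices and tuples are
        # rejected until the more advanced indexing is agreed upon.
        if isinstance(index, int):
            return self._records[index]
        raise TypeError("Not currently supported, but may be in future.")


alignment = MiniAlignment(["row_one", "row_two", "row_three"])
first_record = alignment[0]
last_record = alignment[-1]
number_of_rows = len(alignment)
```

Raising TypeError for anything other than an integer keeps the door open for slice or two-index access later without changing the behaviour of code written against the integer-only version.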
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 13:37:44 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Jul 2008 09:37:44 -0400 Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects In-Reply-To: Message-ID: <200807091337.m69DbiM5031944@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2454 ------- Comment #21 from fkauff at biologie.uni-kl.de 2008-07-09 09:37 EST ------- Michiel, while you're at it - could you update my email in the source as well? And Cymon's email is now cy at cymon.org. Thanks! Frank (In reply to comment #20) > (In reply to comment #16) > I have uploaded an alternative parser in Bio.Sequencing.Phd to CVS. Feel free > to have a look and comment. If everybody is OK, I'll add a DeprecationWarning > to the previous parser. > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Jul 9 13:38:18 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Jul 2008 09:38:18 -0400 Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects In-Reply-To: Message-ID: <200807091338.m69DcIDu031986@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2454 ------- Comment #22 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-09 09:38 EST ------- In reply to comment 20 about the updates to Bio.Sequencing.PhD I see you've also update Bio.SeqIO.PhdIO in CVS (good). I would suggest you add yourself to the copyright statement for this module, and add some doc string entries to the new read and parse functions. I haven't looked over the details of the new code (other than confirming test_Phd.py and test_SeqIO.py seem happy). 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Jul 9 14:28:36 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Jul 2008 10:28:36 -0400 Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects In-Reply-To: Message-ID: <200807091428.m69ESaGm001621@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2454 ------- Comment #23 from mdehoon at ims.u-tokyo.ac.jp 2008-07-09 10:28 EST ------- (In reply to comment #21) > Michiel, > > while you're at it - could you update my email in the source as well? And > Cymon's email is now I have updated your address, but I'd prefer hold off on Cymon's without his direct permission -- spammers are watching too, you know. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 18:33:42 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Jul 2008 14:33:42 -0400 Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support for '-F F' and make it safe In-Reply-To: Message-ID: <200807091833.m69IXgcV013783@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2508 mmokrejs at ribosome.natur.cuni.cz changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |REOPENED Resolution|FIXED | ------- Comment #10 from mmokrejs at ribosome.natur.cuni.cz 2008-07-09 14:33 EST ------- OK, so my old code not yet converted to biopython-1.47 gives me: _textframe = blast.blast_and_htmlize(_query_sequence, _usermode, upload_temp_path, blast_path, uri, _align_view, _matrix) File "/home/mmokrejs/public_html/IRES2/blast.py", line 548, in blast_and_htmlize _blast_out, _error_info, _blast_file = blastall(blast_path + targetdb, query_sequence, upload_temp_path, mode='sequence', align_view=align_view, matrix=matrix) File "/home/mmokrejs/public_html/IRES2/blast.py", line 506, in blastall _blast_out, _error_info = NCBIStandalone.blastall('/usr/bin/blastall', 'blastn', blast_db, _blast_file, matrix=matrix + ' -F 0', wordsize=_wordsize, gap_open=_gap_open, gap_extend=_gap_extend, strands=_strands, alignments=_alignments, descriptions=_descriptions, expectation=_expectation, align_view=align_view) File "/usr/lib/python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line 1620, in blastall _security_check_parameters(keywds) File "/usr/lib/python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line 1986, in _security_check_parameters if ";" in value or "&&" in value : TypeError: argument of type 'int' is not iterable It turns out I am passing in: {'matrix': 'NUC.4.4 -F 0', 'strands': 3, 'expectation': 100, 'wordsize': 4, 'gap_extend': 1, 'gap_open': 1, 'alignments': 99999, 
'descriptions': 9999} I don't think it makes sense to require users to pass strings instead of numbers to the function. While looking into the _security_check_parameters() I think you should also check for "||" - the logical OR as interpreted by shell and redirections ">" and "<". FIX: -if ";" in value or "&&" in value: +if ";" in str(value) or "&&" in str(value) or "||" in str(value) or ">" in str(value) or "<" in str(value): My apologies that I did not test earlier. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Jul 9 18:38:08 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Jul 2008 14:38:08 -0400 Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support for '-F F' and make it safe In-Reply-To: Message-ID: <200807091838.m69Ic82k014070@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2508 ------- Comment #11 from mmokrejs at ribosome.natur.cuni.cz 2008-07-09 14:38 EST ------- Don't know if you want to leave in the back-door to pass in another argument with its value. If not, prevent spaces as well. Values never contain spaces unless wrapped by single or double-quotes. I find it perfectly legal to tell blastall: -d "/some/db /another/db /yet/another" to search over three databases at once. It seems it does not reflect -d specified 3 times on its command-line. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
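[Editorial sketch] The point raised in comment #10, that the check should accept numbers as well as strings and should also catch pipes and redirection, can be illustrated with a small stand-alone function. This is not the code in Bio/Blast/NCBIStandalone.py; the function name and the exact list of rejected tokens are assumptions made for the example.

```python
# Tokens a shell would interpret; "|" also covers "||".
SUSPICIOUS_TOKENS = (";", "&&", "|", ">", "<")


def check_parameters(param_dict):
    """Raise ValueError if any value contains a shell control token.

    Values are converted with str() first, so integers and floats are
    accepted instead of failing with "argument of type 'int' is not
    iterable".
    """
    for key, value in param_dict.items():
        text = str(value)
        for token in SUSPICIOUS_TOKENS:
            if token in text:
                raise ValueError("Rejecting suspicious argument %s=%r"
                                 % (key, value))


# The keyword arguments quoted in comment #10 pass the check:
check_parameters({'matrix': 'NUC.4.4 -F 0', 'strands': 3,
                  'expectation': 100, 'wordsize': 4, 'gap_extend': 1,
                  'gap_open': 1, 'alignments': 99999,
                  'descriptions': 9999})
```

Spaces are deliberately left alone here, since comment #11 gives a legitimate use for them (several databases in one -d value).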
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 20:12:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Jul 2008 16:12:40 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support for '-F F' and make it safe
In-Reply-To:
Message-ID: <200807092012.m69KCeO2018087@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2508

------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-09 16:12 EST -------
The issue with non-string arguments (e.g. floats or integers) was reported by Sebastian Bassi (Bug 2538) and has since been fixed in CVS - sadly this was after the release of Biopython 1.47.

As you've demonstrated there are valid reasons to want to include spaces. I would rather not add a check which requires lots of special casing.

I'm leaving this bug open to consider extending _security_check_parameters() to prevent the use of pipes and redirection (i.e. "|", "<" and ">") which sounds reasonable. A third opinion wouldn't hurt of course!

From bugzilla-daemon at portal.open-bio.org Thu Jul 10 10:30:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 10 Jul 2008 06:30:28 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200807101030.m6AAUSew025300@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454

------- Comment #24 from fkauff at biologie.uni-kl.de 2008-07-10 06:30 EST -------
> (In reply to comment #21)
> > Michiel,
> >
> > while you're at it - could you update my email in the source as well?
And > > Cymon's email is now > > I have updated your address, but I'd prefer hold off on Cymon's without his > direct permission -- spammers are watching too, you know. > Contacted Cymon, reply below: Hi Frank, ... > > Do you want your email address updated in the ace/phd parser code? Or > removed (just the email, not the name, of course)? Don't know if you follow > biopython-dev lately. I dont actually follow the -dev list but perhaps I should, as I think I'm going to be using and doing far more diverse bioinformatics stuff (now that I'm employed as a bioinformatician :) Anyway, the email can be changed to cymon.cox at gmail.com - best to go through google I think as their spam filters tend to be pretty good. Cheers, C. (In reply to comment #23) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Jul 10 16:24:27 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 Jul 2008 12:24:27 -0400 Subject: [Biopython-dev] [Bug 2533] Support for simple "tab" format in Bio.SeqIO In-Reply-To: Message-ID: <200807101624.m6AGORlL012526@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2533 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-10 12:24 EST ------- Checked in, marking as fixed. Bio/SeqIO/TabIO.py initial revision: 1.1 Bio/SeqIO/__init__.py new revision: 1.33 Tests/output/test_SeqIO new revision: 1.25 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
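[Editorial sketch] For readers unfamiliar with the simple "tab" format discussed under Bug 2533, the idea can be shown in a few lines. This is not the code checked in as Bio/SeqIO/TabIO.py; it assumes one record per line with exactly two tab-separated fields (an identifier and a sequence), and the function name is hypothetical.

```python
def parse_tab_lines(lines):
    """Yield (identifier, sequence) pairs from tab-separated lines.

    Assumption: each non-blank line holds an identifier, one tab
    character, then the sequence.
    """
    for line_number, line in enumerate(lines, 1):
        line = line.rstrip("\r\n")
        if not line.strip():
            # Skip blank lines rather than treating them as errors.
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError("Line %i: expected two tab-separated fields, "
                             "found %i" % (line_number, len(fields)))
        identifier, sequence = fields
        yield identifier.strip(), sequence.strip()


example = ["alpha\tACGTACGT\n", "\n", "beta\tTTGACC\n"]
records = list(parse_tab_lines(example))
```

A file handle can be passed in place of the list, since the function only iterates over lines.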
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 03:20:11 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 Jul 2008 23:20:11 -0400 Subject: [Biopython-dev] [Bug 2542] New: AlignInfo.py fails a test Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2542 Summary: AlignInfo.py fails a test Product: Biopython Version: 1.46 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: sbassi at gmail.com When I run: $ python2.5 /mnt/hda2/bio/biopython-1.47/Bio/Align/AlignInfo.py I get the first 2 test OK but then: Traceback (most recent call last): File "/mnt/hda2/bio/biopython-1.47/Bio/Align/AlignInfo.py", line 723, in print summary.information_content() File "/mnt/hda2/bio/biopython-1.47/Bio/Align/AlignInfo.py", line 508, in information_content raise ValueError, errstr ValueError: Error in alphabet: not Nucleotide or Protein, supply expected frequencies I've also tried without the AlignIO: from Bio import Alphabet from Bio.Align.Generic import Alignment from Bio.Seq import Seq from Bio.Align.AlignInfo import SummaryInfo seq1 = 'MHQAIFIYQIGYPLKSGYIQSIRSPEYDNW' seq2 = 'MH--IFIYQIGYALKSGYIQSIRSPEY-NW' a = Alignment(Alphabet.ProteinAlphabet) a.add_sequence("asp",seq1) a.add_sequence("unk",seq2) summary = SummaryInfo(a) summary.information_content() Traceback (most recent call last): File "/mnt/hda2/py252/bin/align.py", line 16, in summary.information_content() File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Align/AlignInfo.py", line 508, in information_content raise ValueError, errstr ValueError: Error in alphabet: not Nucleotide or Protein, supply expected frequencies -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 08:49:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 04:49:08 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To:
Message-ID: <200807110849.m6B8n8Xg022720@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2542

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 04:49 EST -------
Going over your example code:

>>> from Bio import Alphabet
>>> from Bio.Align.Generic import Alignment
>>> from Bio.Align.AlignInfo import SummaryInfo
>>> seq1 = 'MHQAIFIYQIGYPLKSGYIQSIRSPEYDNW'
>>> seq2 = 'MH--IFIYQIGYALKSGYIQSIRSPEY-NW'
>>> a = Alignment(Alphabet.ProteinAlphabet)

First problem, you gave the Alignment object an Alphabet class, rather than an instance of the class. I guess we should add an explicit check to the Alignment object... You should have used:

>>> a = Alignment(Alphabet.ProteinAlphabet())

Or, if you prefer perhaps:

>>> a = Alignment(Alphabet.generic_protein)

Then when we get to the information_content, there is another issue:

>>> a.add_sequence("asp",seq1)
>>> a.add_sequence("unk",seq2)
>>> summary = SummaryInfo(a)
>>> summary.information_content()
Traceback (most recent call last):
...
AttributeError: ProteinAlphabet instance has no attribute 'gap_char'

The trouble here is that the SummaryInfo class is looking for a declared gap character in the protein alphabet - and none has been declared. Your example sequences appear to use "-" as a gap, but you haven't declared this.
Try this:

from Bio import Alphabet
from Bio.Align.Generic import Alignment
from Bio.Seq import Seq
from Bio.Align.AlignInfo import SummaryInfo
seq1 = 'MHQAIFIYQIGYPLKSGYIQSIRSPEYDNW'
seq2 = 'MH--IFIYQIGYALKSGYIQSIRSPEY-NW'
a = Alignment(Alphabet.Gapped(Alphabet.generic_protein, "-"))
a.add_sequence("asp",seq1)
a.add_sequence("unk",seq2)
summary = SummaryInfo(a)
print summary.information_content()

You mentioned having a similar issue with Bio.AlignIO - could you attach the example file to this bug with some trivial code showing your problem?

Thanks, Peter.

P.S. Please update to Biopython 1.47 rather than using 1.46

From bugzilla-daemon at portal.open-bio.org Fri Jul 11 09:50:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 05:50:49 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To:
Message-ID: <200807110950.m6B9on7t025902@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2542

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 05:50 EST -------
I think I've fixed the "Quick test" failure when running Bio/Align/AlignInfo.py directly. I don't know how I missed that before...

/home/repository/biopython/biopython/Bio/Align/AlignInfo.py,v <-- AlignInfo.py
new revision: 1.15; previous revision: 1.14
done

My opinion from looking at the AlignInfo code, and scanning back over the CVS history, is that it was never used much with generic alphabets (as tend to be returned by Bio.AlignIO). There may be other issues here - for example I've spotted another problem case, doubly extended alphabets like a protein alphabet with declared Gapped and WithStopCodon (which you *might* want in an alignment).
From biopython at maubp.freeserve.co.uk Fri Jul 11 10:33:22 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 11 Jul 2008 11:33:22 +0100
Subject: [Biopython-dev] Checking alphabet argument in alignments
Message-ID: <320fb6e00807110333r1938510bne7e24d1ce7b5c0b@mail.gmail.com>

I'd like to add the following check to the __init__ method of the Bio.Align.Generic.Alignment object (our base alignment class),

> if not (isinstance(alphabet, Alphabet.Alphabet) \
>         or isinstance(alphabet, Alphabet.AlphabetEncoder)):
>     raise ValueError("Invalid alphabet argument")

This will prevent subtle user errors like this:

from Bio import Alphabet
from Bio.Align.Generic import Alignment
a = Alignment(Alphabet.ProteinAlphabet)

which should be:

from Bio import Alphabet
from Bio.Align.Generic import Alignment
a = Alignment(Alphabet.ProteinAlphabet())

The only downside I have thought of is if anyone has created their own alphabet class which does NOT subclass the original Bio.Alphabet.Alphabet class.

This same test could (should?) also be added to the Seq and MutableSeq objects.

What do people think?

Peter

From bugzilla-daemon at portal.open-bio.org Fri Jul 11 10:39:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 06:39:48 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To:
Message-ID: <200807111039.m6BAdm05028072@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2542

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 06:39 EST -------
In comment 2 I wrote:
> I've spotted another problem case, doubly extended alphabets like a
> protein alphabet declared Gapped and WithStopCodon (which you *might*
> want in an alignment).
This alphabet issue is fixed in CVS, as is another corner case of a divide by zero error where an entire column consists of ignored characters. Please re-test with Bio/Align/AlignInfo.py revision 1.16 from CVS. Thanks -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jul 11 16:18:28 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 11 Jul 2008 12:18:28 -0400 Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects In-Reply-To: Message-ID: <200807111618.m6BGISQ3013553@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2454 ------- Comment #25 from mdehoon at ims.u-tokyo.ac.jp 2008-07-11 12:18 EST ------- (In reply to comment #24) OK, I updated Phd.py. The last module to consider is Ace.py; I'll upload a fixed version soon. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jul 11 21:00:10 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 11 Jul 2008 17:00:10 -0400 Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test In-Reply-To: Message-ID: <200807112100.m6BL0Aer026629@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2542 ------- Comment #4 from sbassi at gmail.com 2008-07-11 17:00 EST ------- (In reply to comment #1) > First problem, you gave the Alignment object an Alphabet class, rather than an > instance of the class. I guess we should an explicit check to the Alignment > object... Yes, that is my fault. 
> You mentioned having a similar issue with Bio.AlignIO - could you attached the > example file to this bug with some trivial code showing your problem? Yes, this code with Bio.AlignIO also fails (I tried right now with AlignInfo.py rev. 1.17): from Bio.Align import AlignInfo from Bio.Align.AlignInfo import SummaryInfo from Bio import AlignIO fn = open("secu3.aln") alignment = AlignIO.read(fn, "clustal") summary = SummaryInfo(alignment) print summary.information_content() And I got (and this time I am not supplying any alphabet, at least not explicit): Traceback (most recent call last): File "/mnt/hda2/py252/bin/2542.py", line 12, in print summary.information_content() File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Align/AlignInfo.py", line 499, in information_content raise ValueError, errstr ValueError: Error in alphabet: not Nucleotide or Protein, supply expected frequencies > P.S. Please update to Biopython 1.47 rather than using 1.46 I was using Biopython 1.47, but I reported as 1.46 just because 1.47 it is not available from the drop-down menu in bugzilla form. > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
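[Editorial sketch] The ValueError in comment #4 asks for expected frequencies because an information content score compares the observed letter frequencies in each column with a background distribution. The stand-alone function below illustrates that idea only; it is not the algorithm in Bio/Align/AlignInfo.py, and the function name, the uniform background and the toy alignment are assumptions made for the example.

```python
import math


def information_content(rows, expected, chars_to_ignore=("-",), log_base=2):
    """Sum over columns of sum(p * log(p / e)) for each observed letter.

    rows      - equal-length strings, one per aligned sequence
    expected  - dict mapping each letter to its background frequency
    Columns made up entirely of ignored characters contribute nothing,
    which avoids dividing by zero.
    """
    total = 0.0
    for column in zip(*rows):
        letters = [c for c in column if c not in chars_to_ignore]
        if not letters:
            continue
        count = float(len(letters))
        for letter in set(letters):
            observed = letters.count(letter) / count
            # A letter missing from 'expected' raises KeyError here.
            total += observed * math.log(observed / expected[letter],
                                         log_base)
    return total


uniform_dna = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
score = information_content(["AAGT", "AAGC"], uniform_dna)
```

With a generic alphabet there is no way to pick such a background table automatically, which is why the caller has to supply one.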
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 21:02:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 17:02:24 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To:
Message-ID: <200807112102.m6BL2OvF026827@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2542

------- Comment #5 from sbassi at gmail.com 2008-07-11 17:02 EST -------
Created an attachment (id=971)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=971&action=view)
This file is used by my example where information_content() fails when sequences are retrieved with AlignIO

From bugzilla-daemon at portal.open-bio.org Fri Jul 11 21:16:03 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 17:16:03 -0400
Subject: [Biopython-dev] [Bug 2443] Specifying the alphabet in Bio.SeqIO and Bio.AlignIO
In-Reply-To:
Message-ID: <200807112116.m6BLG3SJ027522@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2443

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Specifying the alphabet in  |Specifying the alphabet in
                   |Bio.SeqIO.parse()           |Bio.SeqIO and Bio.AlignIO

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 17:16 EST -------
I'm broadening the scope of this enhancement bug to cover Bio.SeqIO and Bio.AlignIO (both their read() and parse() functions).

See also alphabet issues raised on Bug 2542.
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 21:19:50 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 11 Jul 2008 17:19:50 -0400 Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test In-Reply-To: Message-ID: <200807112119.m6BLJoTt027660@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2542 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 17:19 EST ------- > Yes, this code with Bio.AlignIO also fails (I tried right now with > AlignInfo.py rev. 1.17): > > from Bio.Align import AlignInfo > from Bio.Align.AlignInfo import SummaryInfo > from Bio import AlignIO > fn = open("secu3.aln") > alignment = AlignIO.read(fn, "clustal") > summary = SummaryInfo(alignment) > print summary.information_content() > > And I got (and this time I am not supplying any alphabet, at least not > explicit): > > Traceback (most recent call last): > ... > ValueError: Error in alphabet: not Nucleotide or Protein, supply expected > frequencies Good. That seems to be working as intended - alignment formats like FASTA or Clustal do not specify the sequence type (unlike for example the Nexus format). Perhaps Bio.AlignIO.read() and parse() should be able to accept an optional alphabet argument? I had already been considering this for Bio.SeqIO so this is a natural extension. See Bug 2443. Unless information_content() can determine the sequence type (protein or nucleotide) from the alignment alphabet, you have to help it by supplying an appropriate e_freq_table argument. 
Perhaps:

from Bio.Alphabet import IUPAC
from Bio.SubsMat import FreqTable
from Bio.Align.AlignInfo import SummaryInfo
from Bio import AlignIO
fn = open("secu3.aln")
alignment = AlignIO.read(fn, "clustal")
summary = SummaryInfo(alignment)
#Have a generic alphabet, without a declared gap char, so must
#provide the frequencies and chars to ignore explicitly:
expected = FreqTable.FreqTable({"A":0.25,"G":0.25,"T":0.25,"C":0.25},
                               FreqTable.FREQ, IUPAC.unambiguous_dna)
print summary.information_content(e_freq_table=expected,
                                  chars_to_ignore=['-'])

This is probably safest. I'm doubtful that information_content() will choose wisely if given mixed case or lower case sequences... if that is the case it should be filed as a new bug.

>
> > P.S. Please update to Biopython 1.47 rather than using 1.46
>
> I was using Biopython 1.47, but I reported as 1.46 just because 1.47
> it is not available from the drop-down menu in bugzilla form.

Thanks for the reminder - I've added that to Bugzilla now :)

I'm marking this bug as fixed now (after the updates to AlignInfo.py)

From peter at maubp.freeserve.co.uk Sat Jul 12 13:45:46 2008
From: peter at maubp.freeserve.co.uk (Peter)
Date: Sat, 12 Jul 2008 14:45:46 +0100
Subject: [Biopython-dev] Deprecating the HTML parser in Bio.Blast.NCBIWWW
Message-ID: <320fb6e00807120645u26321d71q30f72ed5808f700@mail.gmail.com>

For some time now we've been discouraging the use of the HTML and plain text Blast parsers in favour of the XML format. I think it would be a good idea to now officially deprecate the HTML parser in Bio.Blast.NCBIWWW with warning messages when it is used. I don't even know if it still works with the recent big revision to the BLAST webpages, but I suspect not.

However, the plain text parser in Bio.Blast.NCBIStandalone still has its uses.
In particular, right now the PSI-BLAST output in XML format lacks some of the information found in the plain text output (new vs reused entries) so it would be premature to deprecate our plain text PSI parser. See Bug 2502 for details: http://bugzilla.open-bio.org/show_bug.cgi?id=2502#c18 Peter From bugzilla-daemon at portal.open-bio.org Sun Jul 13 16:23:57 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 13 Jul 2008 12:23:57 -0400 Subject: [Biopython-dev] [Bug 2543] New: Bio.Nexus.Trees can't handle named ancestors Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2543 Summary: Bio.Nexus.Trees can't handle named ancestors Product: Biopython Version: 1.46 Platform: PC OS/Version: FreeBSD Status: NEW Severity: normal Priority: P2 Component: Other AssignedTo: biopython-dev at biopython.org ReportedBy: markd at soe.ucsc.edu The following code produces: ValueError: invalid literal for float(): Ancestor1 from Bio.Nexus import Trees # from http://evolution.genetics.washington.edu/phylip/newicktree.html treeStr = "(B:6.0,(A:5.0,C:3.0,E:4.0)Ancestor1:5.0,D:11.0);" tree = Trees.Tree(treeStr) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jul 14 10:17:14 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 14 Jul 2008 06:17:14 -0400 Subject: [Biopython-dev] [Bug 2543] Bio.Nexus.Trees can't handle named ancestors In-Reply-To: Message-ID: <200807141017.m6EAHEhg019686@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2543 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-14 06:17 EST ------- This sounds like a job for Frank (the Bio.Nexus module author). 
Can I ask if you've actually come across trees with named ancestor nodes in
"real life"? That would make this bug more important. If so, the name of the
tool would be interesting, and an example tree file would be great to add to
Biopython as a test case.

If on the other hand the only named ancestor tree you've ever tried is the
example from the Newick documentation, this doesn't seem such a high priority
(but still worth fixing).

Peter

From bugzilla-daemon at portal.open-bio.org Tue Jul 15 20:07:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 15 Jul 2008 16:07:56 -0400
Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string, even subclass string?
In-Reply-To:
Message-ID: <200807152007.m6FK7umn009526@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2351

------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-15 16:07 EST -------
This is a suggested implementation of the split method for our Seq object,
modelled after the python string method which it calls internally. Note that I
have made the separator non-optional on the grounds that the string method's
default of white space isn't (usually) sensible for sequences. I'm happy to
change this if people think it's better to be as close as possible to the
string method.

def split(self, sep, maxsplit=None) :
    """Split method, like that of a python string.

    Return a list of the 'words' in the string (as Seq objects),
    using sep as the delimiter string. If maxsplit is given, at
    most maxsplit splits are done. Unlike the python string method,
    sep must be specified (as there shouldn't be any whitespace
    strings in a sequence).

    e.g. print my_seq.split("-")
    """
    #Compare against None so that maxsplit=0 means "no splits"
    if maxsplit is not None :
        parts = self.data.split(sep, maxsplit)
    else :
        parts = self.data.split(sep)
    return [Seq(chunk, self.alphabet) for chunk in parts]

From bugzilla-daemon at portal.open-bio.org Wed Jul 16 09:39:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 16 Jul 2008 05:39:01 -0400
Subject: [Biopython-dev] [Bug 2544] New: Bio.SeqIO improvements
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2544

           Summary: Bio.SeqIO improvements
           Product: Biopython
           Version: 1.47
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: mmokrejs at ribosome.natur.cuni.cz

$ python
Python 2.5.2 (r252:60911, Jul 2 2008, 22:55:24)
[GCC 4.3.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import SeqIO
>>> handle = open("genbank-synthetic.gb")
>>> print seq_record
ID: EF452680.2
Name: EF452680
Description: Synthetic construct nitric oxide synthase (NOS) gene, partial cds.
/comment=On Feb 4, 2008 this sequence version replaced gi:145391444.
/sequence_version=2
/source=synthetic construct
/taxonomy=['other sequences', 'artificial sequences']
/keywords=['']
/references=[, , , ]
/accessions=['EF452680']
/data_file_division=SYN
/date=11-JUN-2008
/organism=synthetic construct
/gi=166831528
Seq('TAGGCCTCTGCTTGCCGTTTGTTTCGTCAGCGATTTTTATAGTCTCAGCCTCCT...GCC', IUPACAmbiguousDNA())
>>>

I do not see how I could access the value 'DNA' from the LOCUS line:
LOCUS EF452680 260 bp DNA linear SYN 11-JUN-2008

No, I do not want to read seq_record.features[0].qualifiers['mol_type'][0].

Could seq_record.features have a repr() function to give me something useful
instead of this?
>>> print seq_record.features [, , ] >>> I don't see documented anywhere in the biopython docs access the features, pasting something like the following into docs would give a user clue where to look for for values: >>> print seq_record.features[0].qualifiers {'db_xref': ['taxon:32630'], 'mol_type': ['other DNA'], 'organism': ['synthetic construct'], 'chromosome': ['Ib'], 'PCR_primers': ['fwd_seq: aggcctctgcttgccgtttgtttcg, rev_seq: cgccggcggcacacgctcaactaattac']} >>> print seq_record.features[1].qualifiers {'gene': ['NOS']} >>> print seq_record.features[2].qualifiers {'product': ['nitric oxide synthase'], 'codon_start': ['2'], 'EC_number': ['1.14.13.39'], 'transl_table': ['11'], 'note': ['derived from Toxoplasma gondii'], 'db_xref': ['GI:166831529'], 'translation': ['RPLLAVCFVSDFYSLSLLHFASVPFHESDGCVGRSHWLPGKHANYVKPAGARKRPEVGCRSSCLLRSVCCDILSPVRTRGN'], 'gene': ['NOS'], 'protein_id': ['ABP65329.2']} >>> print seq_record.features[3].qualifiers Traceback (most recent call last): File "", line 1, in IndexError: list index out of range >>> I wonder if I could access the above dicts as seq_record.features['source'] or seq_record.features['CDS']. Where is the 'source', 'gene', 'CDS' gone? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
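The by-type lookup asked about above can be sketched without touching the parser. This is only an illustration under stated assumptions: `Feature` is a hypothetical stand-in that models just the `.type` and `.qualifiers` attributes of a SeqFeature, and `features_by_type` is an illustrative helper, not a Biopython function. Each key maps to a list, since a record may hold several features of one type.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a SeqFeature: only .type and .qualifiers are modelled.
Feature = namedtuple("Feature", ["type", "qualifiers"])


def features_by_type(features):
    """Group features into a dict of lists keyed by their .type string."""
    grouped = defaultdict(list)
    for feature in features:
        grouped[feature.type].append(feature)
    return dict(grouped)


# Toy data loosely modelled on the three features shown in the report.
features = [
    Feature("source", {"mol_type": ["other DNA"]}),
    Feature("gene", {"gene": ["NOS"]}),
    Feature("CDS", {"gene": ["NOS"], "codon_start": ["2"]}),
]

by_type = features_by_type(features)
first_cds = by_type["CDS"][0]
print(first_cds.qualifiers["codon_start"])
```

With real data the same helper could presumably be fed `seq_record.features` directly, giving the `['CDS'][index]` style of access.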
From bugzilla-daemon at portal.open-bio.org Wed Jul 16 10:30:21 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 16 Jul 2008 06:30:21 -0400 Subject: [Biopython-dev] [Bug 2544] Bio.GenBank and SeqFeature improvements In-Reply-To: Message-ID: <200807161030.m6GAUL9x017920@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2544 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|Bio.SeqIO improvements |Bio.GenBank and SeqFeature | |improvements ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-16 06:30 EST ------- (In reply to comment #0) > $ python > > Python 2.5.2 (r252:60911, Jul 2 2008, 22:55:24) > [GCC 4.3.1] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> from Bio import SeqIO > >>> handle = open("genbank-synthetic.gb") I'm guessing the missing line here was something like: seq_record = SeqIO.read(handle, "genbank") > >>> print seq_record > ID: EF452680.2 > Name: EF452680 > Description: Synthetic construct nitric oxide synthase (NOS) gene, partial cds. > /comment=On Feb 4, 2008 this sequence version replaced gi:145391444. > /sequence_version=2 > /source=synthetic construct > /taxonomy=['other sequences', 'artificial sequences'] > /keywords=[''] > /references=[, > , instance at 0x834ceac>, ] > /accessions=['EF452680'] > /data_file_division=SYN > /date=11-JUN-2008 > /organism=synthetic construct > /gi=166831528 > Seq('TAGGCCTCTGCTTGCCGTTTGTTTCGTCAGCGATTTTTATAGTCTCAGCCTCCT...GCC', > IUPACAmbiguousDNA()) > >>> > > > I do not see how I could access the value 'DNA' from the LOCUS line: > LOCUS EF452680 260 bp DNA linear SYN 11-JUN-2008 Currently the sequence type (DNA, RNA, Protein) is used internally by the GenBank parser to determine the alphabet. It is not currently recorded in the SeqRecord object's annotation but could be. 
How about something like this?: seq_record.annotations["seq_type"] > No, I do not want to read seq_record.features[0].qualifiers['mol_type'][0]. Assuming that the first feature is the source (typically the case), and assuming it has a specified molecule type, then your suggestion is one work around. But I agree, its not nice. > Could seq_record.features have a repr() function to give me something useful > instead of this? > > >>> print seq_record.features > [, instance at 0x837b9cc>, ] Yes we could add that, but you wouldn't want to do that on a typical genome with thousands of features. Adding a repr method for the Reference object is also something I had wondered about doing. > I don't see documented anywhere in the biopython docs access the features, > pasting something like the following into docs would give a user clue where to > look for for values: > > >>> print seq_record.features[0].qualifiers > {'db_xref': ['taxon:32630'], 'mol_type': ['other DNA'], 'organism': ['synthetic > construct'], 'chromosome': ['Ib'], 'PCR_primers': ['fwd_seq: > aggcctctgcttgccgtttgtttcg, rev_seq: cgccggcggcacacgctcaactaattac']} > >>> print seq_record.features[1].qualifiers > {'gene': ['NOS']} > >>> print seq_record.features[2].qualifiers > {'product': ['nitric oxide synthase'], 'codon_start': ['2'], 'EC_number': > ['1.14.13.39'], 'transl_table': ['11'], 'note': ['derived from Toxoplasma > gondii'], 'db_xref': ['GI:166831529'], 'translation': > ['RPLLAVCFVSDFYSLSLLHFASVPFHESDGCVGRSHWLPGKHANYVKPAGARKRPEVGCRSSCLLRSVCCDILSPVRTRGN'], > 'gene': ['NOS'], 'protein_id': ['ABP65329.2']} There is a minimal bit of text in what is currently Chapter 10 of the tutorial on the SeqFeature object. I agree, this is an area that needs improvement. Perhaps a full example of parsing a simple GenBank file in the SeqIO chapter would help? 
> >>> print seq_record.features[3].qualifiers > Traceback (most recent call last): > File "", line 1, in > IndexError: list index out of range You must have only three features (indexed 0, 1 and 2) which explains the index error. > I wonder if I could access the above dicts as seq_record.features['source'] > or seq_record.features['CDS']. Where is the 'source', 'gene', 'CDS' gone? As the .type attribute, try this: for feature in seq_record.features: print feature.type You can't access the features by type (e.g. seq_record.features['CDS']) because there is generally more than one feature of each type. Peter P.S. Most of your comments are not on Bio.SeqIO itself, but actually about the underlying Bio.GenBank parser or the SeqFeature object. I have therefore changed the title. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Wed Jul 16 10:49:19 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 16 Jul 2008 11:49:19 +0100 Subject: [Biopython-dev] Dump file in Bio/Saf/__init__.py Message-ID: <320fb6e00807160349r105bda08x3cf5e31915896a9b@mail.gmail.com> Michiel, I just noticed your CVS revision to Bio/Saf/__init__.py removing this snippet of code: dumpfile = open( 'dump', 'w' ) dumpfile.write( data ) dumpfile.close() I recall seeing (and removing) a similar lump of diagnostics/debugging code from another of Katharine Lindner's parsers. There is still a similar bit of code in Bio/IntelliGenetics/__init__.py which we could remove, but as the whole module is now deprecated we could just wait for a few releases and then remove it entirely. 
Peter

From bugzilla-daemon at portal.open-bio.org Wed Jul 16 11:40:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 16 Jul 2008 07:40:53 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200807161140.m6GBerMH021048@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381

------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-16 07:40 EST -------
(In reply to comment #8)
> Whether or not to stop translating at the first stop codon could be an
> argument to the translate method. As an alternative, it may be preferable
> to have a split() method that splits the sequences at the stop codons.
> Such a method could be applied to all protein sequences, not only those
> created by translate().

Adding a split() method to the Seq object is a good idea in general (making
the Seq object more like a python string), and using my_protein.split("*") is
a nice example usage of this.

I have posted a possible implementation of the split() method for the Seq
object on Bug 2351 comment 15
http://bugzilla.open-bio.org/show_bug.cgi?id=2351#c15
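The my_protein.split("*") usage mentioned above can be tried out in isolation. A minimal sketch, assuming a hypothetical `MiniSeq` class that stands in for the Seq object (just a string plus an alphabet label); it is not the Bio.Seq implementation, and the sequence is made up.

```python
class MiniSeq(object):
    """Hypothetical stand-in for a Seq object: a string plus an alphabet label."""

    def __init__(self, data, alphabet=None):
        self.data = data
        self.alphabet = alphabet

    def split(self, sep, maxsplit=None):
        """Split on sep, returning MiniSeq pieces that keep the alphabet."""
        if maxsplit is None:
            # No limit requested: split at every occurrence of sep.
            parts = self.data.split(sep)
        else:
            parts = self.data.split(sep, maxsplit)
        return [MiniSeq(chunk, self.alphabet) for chunk in parts]


# A made-up translation with two stop codons marked as "*".
my_protein = MiniSeq("MKT*LLV*G", "protein")
pieces = [part.data for part in my_protein.split("*")]
first_only = [part.data for part in my_protein.split("*", 1)]
print(pieces)
print(first_only)
```

Returning objects of the same class, each carrying the parent's alphabet, is what lets the pieces be used like any other sequence afterwards.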
From biopython at maubp.freeserve.co.uk Wed Jul 16 12:40:03 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 16 Jul 2008 13:40:03 +0100
Subject: [Biopython-dev] Modules to be removed from Biopython
In-Reply-To: <320fb6e00806270950k479eda23ia96d3c2d36557510@mail.gmail.com>
References: <492634.64872.qm@web62414.mail.re1.yahoo.com> <320fb6e00806270950k479eda23ia96d3c2d36557510@mail.gmail.com>
Message-ID: <320fb6e00807160540w325fe995mea400b0014fd7c2e@mail.gmail.com>

>> But, there is also a set of interconnected modules where it's not 100%
>> clear if they can be removed without causing some surprises:
>> Bio.builders
>> Bio.config
>> Bio.dbdefs
>> Bio.formatdefs
>> Bio.dbdefs
>> Bio.expressions
>> Bio.FormatIO
>> Bio.Std
>> Bio.StdHandler
>> It is probably OK to remove these, since these were deprecated we did
>> not get a barrage of complaints from our users. Personally, I think it is
>> important to keep the code base clean, so I am in favor of removing
>> these (and see if anybody complains; in that case, we can always put
>> these modules back in and make a new release). But I can live with
>> keeping these modules for another release round. If anybody thinks
>> that that would be better, please let us know.
>
> Given some of these are very interconnected, I would be inclined to leave
> them in for one more release. However I'm content to see them go. If no
> one else has any qualms, then please carry on.

Now that Biopython 1.47 is out, it's probably time to remove Bio.expressions
(deprecated in 1.44) and explicitly deprecate the rest:

Bio.builders
Bio.config
Bio.dbdefs
Bio.formatdefs
Bio.Std
Bio.StdHandler
(plus Bio.Writer which is part of this "Bioformat" code base?)

The final entry from your list, Bio.FormatIO, has already been removed.
Peter From mjldehoon at yahoo.com Wed Jul 16 14:14:07 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 16 Jul 2008 07:14:07 -0700 (PDT) Subject: [Biopython-dev] Dump file in Bio/Saf/__init__.py In-Reply-To: <320fb6e00807160349r105bda08x3cf5e31915896a9b@mail.gmail.com> Message-ID: <729090.76301.qm@web62408.mail.re1.yahoo.com> I removed a similar piece of code in one more module (I forgot which one). While we're on the subject: the functionality currently in Bio.Saf probably belongs in Bio.AlignIO. --Michiel. --- On Wed, 7/16/08, Peter wrote: > From: Peter > Subject: [Biopython-dev] Dump file in Bio/Saf/__init__.py > To: "BioPython-Dev Mailing List" > Date: Wednesday, July 16, 2008, 6:49 AM > Michiel, > > I just noticed your CVS revision to Bio/Saf/__init__.py > removing this > snippet of code: > > dumpfile = open( 'dump', > 'w' ) > dumpfile.write( data ) > dumpfile.close() > > I recall seeing (and removing) a similar lump of > diagnostics/debugging > code from another of Katharine Lindner's parsers. > > There is still a similar bit of code in > Bio/IntelliGenetics/__init__.py which we could remove, but > as the > whole module is now deprecated we could just wait for a few > releases > and then remove it entirely. 
> > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From biopython at maubp.freeserve.co.uk Wed Jul 16 14:44:28 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 16 Jul 2008 15:44:28 +0100 Subject: [Biopython-dev] Dump file in Bio/Saf/__init__.py In-Reply-To: <729090.76301.qm@web62408.mail.re1.yahoo.com> References: <320fb6e00807160349r105bda08x3cf5e31915896a9b@mail.gmail.com> <729090.76301.qm@web62408.mail.re1.yahoo.com> Message-ID: <320fb6e00807160744y7d809533sb5c9cdc82c907aa9@mail.gmail.com> On Wed, Jul 16, 2008 at 3:14 PM, Michiel de Hoon wrote: > I removed a similar piece of code in one more module (I forgot which one). Bio/MetaTool/__init__.py if anyone wanted to know. The CVS changes RSS feed is handy: http://biopython.org/wiki/Tracking_CVS_commits > While we're on the subject: the functionality currently in Bio.Saf probably belongs in Bio.AlignIO. Yes, it probably does - assuming anyone still uses the file format. I'll take a look at that at some point. Peter wrote: >> There is still a similar bit of code in >> Bio/IntelliGenetics/__init__.py which we could remove, but >> as the whole module is now deprecated we could just wait >> for a few releases and then remove it entirely. I've removed the Bio.IntelliGenetics dumpfile code in CVS. Peter From bugzilla-daemon at portal.open-bio.org Wed Jul 16 15:01:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 16 Jul 2008 11:01:41 -0400 Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects In-Reply-To: Message-ID: <200807161501.m6GF1fuG028930@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2454 ------- Comment #26 from mdehoon at ims.u-tokyo.ac.jp 2008-07-16 11:01 EST ------- I've uploaded a fixed parser in Bio.Sequencing.Ace to CVS; feel free to have a look and comment. 
From biopython at maubp.freeserve.co.uk Wed Jul 16 15:32:03 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 16 Jul 2008 16:32:03 +0100
Subject: [Biopython-dev] Dump file in Bio/Saf/__init__.py
In-Reply-To: <320fb6e00807160744y7d809533sb5c9cdc82c907aa9@mail.gmail.com>
References: <320fb6e00807160349r105bda08x3cf5e31915896a9b@mail.gmail.com> <729090.76301.qm@web62408.mail.re1.yahoo.com> <320fb6e00807160744y7d809533sb5c9cdc82c907aa9@mail.gmail.com>
Message-ID: <320fb6e00807160832w4eef825ek3ed4cfde1cc92cd2@mail.gmail.com>

>> While we're on the subject: the functionality currently in Bio.Saf probably belongs in Bio.AlignIO.
>
> Yes, it probably does - assuming anyone still uses the file format.
> I'll take a look at that at some point.

I've been looking at the PredictProtein site's SAF (Simple Alignment Format)
specification, which as far as I know is the only definition (spelling errors
and all). It's a free format somewhat like PHYLIP, and for "nice" input files
parsing shouldn't be too difficult. However, some of the corner cases they
give are frankly evil, and I wonder if Bio.Saf is actually compliant. See
http://www.predictprotein.org/Dexa/optin_safDes.html

I'd like to propose deprecating Bio.Saf on the main mailing list. If there
are people wanting to use this SAF format, we can then worry about
implementing a non-Martel parser for this file format in Bio.AlignIO instead -
and explicitly test it can cope with all the examples given.

Peter

P.S. I updated Bio.Saf to use the new URL for the PredictProtein site.
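For reference, a deprecation like the one proposed above is typically announced through Python's standard `warnings` module. A minimal sketch, assuming a hypothetical `parse_saf` function; this is not the actual Bio.Saf interface, and the toy "parsing" only strips line endings.

```python
import warnings


def parse_saf(handle):
    """Hypothetical deprecated entry point: warn first, then do the work."""
    warnings.warn(
        "parse_saf is deprecated and may be removed in a future release.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return [line.rstrip("\n") for line in handle]


# Capture the warning so the behaviour can be inspected rather than printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lines = parse_saf(["seq1 ACGT\n", "seq2 ACGA\n"])

print(caught[0].category.__name__)
```

Existing callers keep working for a few releases while being told to move on, which matches the wait-then-remove approach discussed in this thread.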
From bugzilla-daemon at portal.open-bio.org Wed Jul 16 16:08:38 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 16 Jul 2008 12:08:38 -0400 Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects In-Reply-To: Message-ID: <200807161608.m6GG8c0s031867@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2454 ------- Comment #27 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-16 12:08 EST ------- Regarding the Bio.Sequencing.Ace changes (comment 26), some of it looks a bit repetitive. Might a sub-function help here? Also, I was wondering if you managed to fix Bug 2446 as a nice bonus. Regarding the Bio.Sequencing.Phd changes, Michiel has now deprecated Frank & Cymon's original scanner/consumer parser. I didn't think it make sense to leave the original header as it was (with their old version number etc and the suggestion to contacting them directly with bugs). They are of course still listed in the copyright header. New Bio.Sequencing.Phd docstring header text in CVS: """ Parser for PHD files output by PHRED and used by PHRAP and CONSED. This module can be used used directly which will return Record objects which should contain all the original data in the file. Alternatively, using Bio.SeqIO with the "phd" format will call this module internally. This will give SeqRecord objects for each contig sequence. """ Previous text: """ Parser for PHD files output by PHRED and used by PHRAP and CONSED. Works fine with PHRED 0.020425.c Version 1.1, 03/09/2004 written by Cymon J. Cox (cymon.cox at gmail.com ) and Frank Kauff (fkauff 'AT' biologie.uni-kl.de). Comments, bugs, problems, suggestions to one of us are welcome! """ Frank & Cymon - I should have asked first, but is this revised wording OK with you? 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Jul 16 20:35:13 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 16 Jul 2008 16:35:13 -0400 Subject: [Biopython-dev] [Bug 2544] Bio.GenBank and SeqFeature improvements In-Reply-To: Message-ID: <200807162035.m6GKZDOn012941@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2544 ------- Comment #2 from mmokrejs at ribosome.natur.cuni.cz 2008-07-16 16:35 EST ------- (In reply to comment #1) > > I'm guessing the missing line here was something like: > seq_record = SeqIO.read(handle, "genbank") Yes, I forgot to paste one line, sorry. > > I do not see how I could access the value 'DNA' from the LOCUS line: > > LOCUS EF452680 260 bp DNA linear SYN 11-JUN-2008 > > Currently the sequence type (DNA, RNA, Protein) is used internally by the > GenBank parser to determine the alphabet. It is not currently recorded in the > SeqRecord object's annotation but could be. How about something like this?: > > seq_record.annotations["seq_type"] I am not much familiar with the official naming of the fields in LOCUS line by Genbank but hope you are. Yes, it would be fine for me. I hope all other values from LOCUS line can be accessed similarly as well. > > Could seq_record.features have a repr() function to give me something useful > > instead of this? > > > > >>> print seq_record.features > > [, > instance at 0x837b9cc>, ] > > Yes we could add that, but you wouldn't want to do that on a typical genome > with thousands of features. Adding a repr method for the Reference object is > also something I had wondered about doing. I think it could be there even for large records. It not up to the programmer to use repr() or not, and while testing/learning it would be really useful. 
Or at least internally the routine could check for number of features and in case there would be thousands it could print some first and then stop with a clear message how to force for full listing. > > I don't see documented anywhere in the biopython docs access the features, > > pasting something like the following into docs would give a user clue where to > > look for for values: > > > > >>> print seq_record.features[0].qualifiers > > {'db_xref': ['taxon:32630'], 'mol_type': ['other DNA'], 'organism': ['synthetic > > construct'], 'chromosome': ['Ib'], 'PCR_primers': ['fwd_seq: > > aggcctctgcttgccgtttgtttcg, rev_seq: cgccggcggcacacgctcaactaattac']} > > >>> print seq_record.features[1].qualifiers > > {'gene': ['NOS']} > > >>> print seq_record.features[2].qualifiers > > {'product': ['nitric oxide synthase'], 'codon_start': ['2'], 'EC_number': > > ['1.14.13.39'], 'transl_table': ['11'], 'note': ['derived from Toxoplasma > > gondii'], 'db_xref': ['GI:166831529'], 'translation': > > ['RPLLAVCFVSDFYSLSLLHFASVPFHESDGCVGRSHWLPGKHANYVKPAGARKRPEVGCRSSCLLRSVCCDILSPVRTRGN'], > > 'gene': ['NOS'], 'protein_id': ['ABP65329.2']} > > There is a minimal bit of text in what is currently Chapter 10 of the tutorial > on the SeqFeature object. I agree, this is an area that needs improvement. Yes I read that before but it is too short, even after reading 2.4.2, 4.2.1, 9.2 and http://biopython.org/wiki/SeqIO. > > Perhaps a full example of parsing a simple GenBank file in the SeqIO chapter > would help? Definitely, you should pick some exceptional record having different fields, I think the one I have shown is quite OK. > > > >>> print seq_record.features[3].qualifiers > > Traceback (most recent call last): > > File "", line 1, in > > IndexError: list index out of range > > You must have only three features (indexed 0, 1 and 2) which explains the > index error. I knew, it was intentional. 
;-) > > > I wonder if I could access the above dicts as seq_record.features['source'] > > or seq_record.features['CDS']. Where is the 'source', 'gene', 'CDS' gone? > > As the .type attribute, try this: > > for feature in seq_record.features: > print feature.type >>> for feature in seq_record.features: ... print feature.type ... source gene CDS >>> > > You can't access the features by type (e.g. seq_record.features['CDS']) > because there is generally more than one feature of each type. Yes, but how about seq_record.features['CDS'][index]? Could that be provided? > P.S. Most of your comments are not on Bio.SeqIO itself, but actually about the > underlying Bio.GenBank parser or the SeqFeature object. I have therefore > changed the title. Thanks! -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From tiagoantao at gmail.com Thu Jul 17 01:11:43 2008 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Thu, 17 Jul 2008 02:11:43 +0100 Subject: [Biopython-dev] Biopython presentation at BOSC2008 Message-ID: <6d941f120807161811n44eafa5ata705fa3189443681@mail.gmail.com> Hi all, This year I will be delivering the Biopython presentation at BOSC 2008. The current draft is attached to this email (ppt format - yuck - but the easieast to edit). Comments, suggestions, changes are most welcome. Just one point, the presenation is this Saturday, so if you have any comments, please send them soon. There is one slide still to be completed and a few presentation/looks issues still to be edged out. Many thanks, Tiago -- "Data always beats theories. 'Look at data three times and then come to a conclusion,' versus 'coming to a conclusion and searching for some data.' The former will win every time." ?Matthew Simmons, http://www.tiago.org -------------- next part -------------- A non-text attachment was scrubbed... 
Name: bosc2008.ppt Type: application/vnd.ms-powerpoint Size: 482816 bytes Desc: not available URL: From biopython at maubp.freeserve.co.uk Thu Jul 17 13:07:53 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 17 Jul 2008 14:07:53 +0100 Subject: [Biopython-dev] Biopython presentation at BOSC2008 In-Reply-To: <6d941f120807161811n44eafa5ata705fa3189443681@mail.gmail.com> References: <6d941f120807161811n44eafa5ata705fa3189443681@mail.gmail.com> Message-ID: <320fb6e00807170607s32af2744j479eb2b2e545f454@mail.gmail.com> > Comments, suggestions, changes are most welcome. Just one point, the > presentation is this Saturday, so if you have any comments, please send > them soon. I've sent Tiago some specific comments directly (little things). One issue which might deserve wider discussion is the project's short term goals. I would suggest putting: * Moving from CVS to Subversion * Make Sequence objects more OO and string-like * More file formats in Bio.SeqIO and Bio.AlignIO And also perhaps the Numeric to numpy move, Bug 2251 http://bugzilla.open-bio.org/show_bug.cgi?id=2251 I subscribe to the numpy mailing list and they seem to have been making big strides in the documentation. Also it looks like they plan to make Travis Oliphant's "Guide to NumPy" book free after "SciPy 2008" - which probably means the August 2008 SciPy conference at Caltech rather than EuroSciPy 2008 in July in Germany. Peter From tiagoantao at gmail.com Thu Jul 17 21:45:56 2008 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Thu, 17 Jul 2008 22:45:56 +0100 Subject: [Biopython-dev] Biopython presentation at BOSC2008 In-Reply-To: <6d941f120807161811n44eafa5ata705fa3189443681@mail.gmail.com> References: <6d941f120807161811n44eafa5ata705fa3189443681@mail.gmail.com> Message-ID: <6d941f120807171445t32178835n6f5dd77f11f3f004@mail.gmail.com> Hi all, I would like to thank all that sent comments. I used the vast majority of comments sent, so feedback was really useful. 
Tiago

On Thu, Jul 17, 2008 at 2:11 AM, Tiago Antão wrote:
> Hi all,
>
> This year I will be delivering the Biopython presentation at BOSC
> 2008. The current draft is attached to this email (ppt format - yuck -
> but the easieast to edit).
> Comments, suggestions, changes are most welcome. Just one point, the
> presenation is this Saturday, so if you have any comments, please send
> them soon.
>
> There is one slide still to be completed and a few presentation/looks
> issues still to be edged out.
>
> Many thanks,
> Tiago
>
> --
> "Data always beats theories. 'Look at data three times and then come
> to a conclusion,' versus 'coming to a conclusion and searching for
> some data.' The former will win every time." - Matthew Simmons,
> http://www.tiago.org
>

--
"Data always beats theories. 'Look at data three times and then come
to a conclusion,' versus 'coming to a conclusion and searching for
some data.' The former will win every time." - Matthew Simmons,
http://www.tiago.org

From bugzilla-daemon at portal.open-bio.org Sat Jul 19 00:07:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Jul 2008 20:07:02 -0400
Subject: [Biopython-dev] [Bug 1999] new frame translation method
In-Reply-To:
Message-ID: <200807190007.m6J0721C023043@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1999

mmokrejs at ribosome.natur.cuni.cz changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mmokrejs at ribosome.natur.cuni
                   |                            |.cz
From bugzilla-daemon at portal.open-bio.org Sat Jul 19 00:09:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 18 Jul 2008 20:09:26 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200807190009.m6J09Qm2023188@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 mmokrejs at ribosome.natur.cuni.cz changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |mmokrejs at ribosome.natur.cuni | |.cz -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Jul 19 00:30:36 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 18 Jul 2008 20:30:36 -0400 Subject: [Biopython-dev] [Bug 2448] Bio.EUtils can't handle accented author names In-Reply-To: Message-ID: <200807190030.m6J0Ua27024398@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2448 mmokrejs at ribosome.natur.cuni.cz changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |mmokrejs at ribosome.natur.cuni | |.cz ------- Comment #4 from mmokrejs at ribosome.natur.cuni.cz 2008-07-18 20:30 EST ------- (In reply to comment #2) > {u'LastName': u'Mart\xednez-Oyanedel', u'Initials': 'J', u'ForeName': > u'Jos\xe9'}, If I remember right this is the string-ified representation of utf8 data when you call str() or repr() on them. One could then in upper code try to convert it back but one has to invent the magic code. In my programs I avoid unicode but stick to utf8 and pass it back to the user. But as I say, you may never use print(), str(), repr() because they are not utf8/unicode safe. 
That should be one of the things to be fixed in python-3. So in summary when I do raise an exception these values will get always printed in the above escaped form, but it is the only exception. I believe as long as you return the values the current code is ok. But, haven't tested. grep-ing related stuff from my programs use e.g.: self._connection = connect(unix_socket=unix_socket, db=dbname, user=username, passwd=password, init_command='SET AUTOCOMMIT=0', charset='utf8', use_unicode=False) if self._connection.character_set_name() != 'utf8': # test whether we really have utf8 connection raise RuntimeError, "Connection to mysql not in utf8 mode: %s" % self._connection.character_set_name() value = unicode(value).encode('utf8') http://evanjones.ca/python-utf8.html http://www.idealliance.org/proceedings/xtech05/papers/02-08-01/ http://www.amk.ca/python/howto/unicode http://diveintopython.org/xml_processing/unicode.html http://www.jorendorff.com/articles/unicode/python.html from elementtree.ElementTree import parse, Element, SubElement, ElementTree # use 'utf8' and not 'utf-8' for Element.write() !!! # We must supply unicode values to ElementTree and not just utf8 encoded strings. _value_node.text = _value.decode('utf8') -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
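For readers on Python 3 (the fragments above are Python 2), the difference Martin describes between the stored text and its escaped printout can be sketched as follows. This is only an illustration, not code from Bio.EUtils; the `author` dictionary is a made-up container that reuses the name from the quoted output.

```python
# Python 3 sketch: str holds Unicode text; UTF-8 is just one byte encoding of it.
author = {
    "LastName": "Mart\xednez-Oyanedel",
    "Initials": "J",
    "ForeName": "Jos\xe9",
}

# ascii() gives the escaped form seen in the bug report; the data is unchanged.
escaped = ascii(author["ForeName"])

# Encode to UTF-8 bytes at the boundary (file, socket, database)...
raw = author["ForeName"].encode("utf8")

# ...and decode back to text when reading it in again.
restored = raw.decode("utf8")

print(escaped)
print(raw)
print(restored)
```

The escaped form is a display artefact of the representation; the round trip through UTF-8 returns the original text.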
From bugzilla-daemon at portal.open-bio.org Sat Jul 19 00:37:36 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 18 Jul 2008 20:37:36 -0400 Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid codons as stops In-Reply-To: Message-ID: <200807190037.m6J0baGc024748@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2530 mmokrejs at ribosome.natur.cuni.cz changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |mmokrejs at ribosome.natur.cuni | |.cz ------- Comment #6 from mmokrejs at ribosome.natur.cuni.cz 2008-07-18 20:37 EST ------- I was just about to report this bug. I use biopython to translate EST sequences. They are full of sequencing errors although one knows the CDS region, still it is often interrupted by N's or by literal STOP codons. The current implementation in biopython-1.47 broke it for me. I haven't tested the attached patches but would propose to make this strict check optional. Currently it seems there is no way to pass down the code some variable not to barf in such cases. Will attach my current hack. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sat Jul 19 00:38:48 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 18 Jul 2008 20:38:48 -0400 Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq objects In-Reply-To: Message-ID: <200807190038.m6J0cmLK024884@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2532 ------- Comment #3 from mmokrejs at ribosome.natur.cuni.cz 2008-07-18 20:38 EST ------- Created an attachment (id=972) --> (http://bugzilla.open-bio.org/attachment.cgi?id=972&action=view) Hack not to break on Ns for unknown bases in ESTs -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Jul 19 12:47:34 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 19 Jul 2008 08:47:34 -0400 Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid codons as stops In-Reply-To: Message-ID: <200807191247.m6JClYEO004649@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2530 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #955 is|0 |1 obsolete| | ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-19 08:47 EST ------- (From update of attachment 955) I've checked this code in, with the most of the assertions moved into a new unit test. This patch is now obsolete. 
Checking in Bio/Data/CodonTable.py; /home/repository/biopython/biopython/Bio/Data/CodonTable.py,v <-- CodonTable.py new revision: 1.9; previous revision: 1.8 done RCS file: /home/repository/biopython/biopython/Tests/test_CodonTable.py,v done Checking in Tests/test_CodonTable.py; /home/repository/biopython/biopython/Tests/test_CodonTable.py,v <-- test_CodonTable.py initial revision: 1.1 done RCS file: /home/repository/biopython/biopython/Tests/output/test_CodonTable,v done Checking in Tests/output/test_CodonTable; /home/repository/biopython/biopython/Tests/output/test_CodonTable,v <-- test_CodonTable initial revision: 1.1 done -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Jul 19 12:52:02 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 19 Jul 2008 08:52:02 -0400 Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid codons as stops In-Reply-To: Message-ID: <200807191252.m6JCq26c004896@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2530 ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-19 08:52 EST ------- (In reply to comment #6) > I was just about to report this bug. I use biopython to translate EST > sequences. They are full of sequencing errors although one knows the CDS > region, still it is often interrupted by N's or by literal STOP codons. The > current implementation in biopython-1.47 broke it for me. I haven't tested the > attached patches but would propose to make this strict check optional. > Currently it seems there is no way to pass down the code some variable not to > barf in such cases. Will attach my current hack. Do you have an example which "worked" in an older version of Biopython, but is "broken" in Biopython 1.47? 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Jul 19 16:40:58 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 19 Jul 2008 12:40:58 -0400 Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid codons as stops In-Reply-To: Message-ID: <200807191640.m6JGew57014127@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2530 ------- Comment #9 from mmokrejs at ribosome.natur.cuni.cz 2008-07-19 12:40 EST ------- >gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence AAGAAAACGAGAAGGACGGGGTTATATAGTAAGGTACAAACAGGGCANNNNNNCCATTACACGACCAACT TCTTCGCCTTGCCCTTTTTCTCAGAGTCCTTGTGCGACAGGAACTCGACCTCGGTCGCAAGAGGCCCAGC AAGTCGCGCTCCCTCGGGGTACCCAAGCACACTCATCTTGAAATGCTTCCCAACTCCCTCAATCCTTTCC CGCAGCCCCGCATCCTCCTCGGTCGGTGCAAGTCGCGTCCATATCGACAATCGATAAAACTGCGGCCGCG TCGACACGATCACACCTGTAATCAGCGACGCAGACCCACTTCCACCCGTCAGCGTCGGCGATGGGTCAAA TGTTTCCCCGATCGCAGCCAGCATCGTATACAGCCACATCTTGTCTACGTTGGGTCGGTTTTTATCTTTG GGCAGTTGGATACTCCATTTTCCTCCAAGCTTGTTCGCCTCGTCCTCCCATGCGGGAATAATTCCCTCCT TGAAAAGGTAATAATTTGCCTTCTGGGGCAGTTGAGATGGCGGTATGATGTTGTTATATAACCCCCAAAA CTCCNNNNNGCTATCAAAGNNNNNGACCCGCNNNNNGTCCNCCANNNACCCTTNNNCCNNNNNANNNCCG GNNNNNNNNNNNNTGNGGGTCNNNNNNNNNGCTNNNNNNNNNNTNNNNNG resulted as of Aug 5 2007 in a six-frame translation >gi|45741280|gb|CK993509.1|CK993509 gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence frame:+1 KKTRRTGLYSKVQTG***HYTTNFFALPFFSESLCDRNSTSVARGPASRAPSGYPSTLILKCFPTPSILSRSPASSSVGASRVHIDNR*NCGRVDTITPVISDADPLPPVSVGDGSNVSPIAASIVYSHILSTLGRFLSLGSWILHFPPSLFASSSHAGIIPS LKR**FAFWGS*DGGMMLLYNPQNS**LSK**TR**S***P*****P******V***A***** >gi|45741280|gb|CK993509.1|CK993509 
gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence frame:+2 RKREGRGYIVRYKQG***ITRPTSSPCPFSQSPCATGTRPRSQEAQQVALPRGTQAHSS*NASQLPQSFPAAPHPPRSVQVASISTIDKTAAASTRSHL*SATQTHFHPSASAMGQMFPRSQPASYTATSCLRWVGFYLWAVGYSIFLQACSPRPPMRE*FPP *KGNNLPSGAVEMAV*CCYITPKT***YQ***P****P*TL*****R*****G********** >gi|45741280|gb|CK993509.1|CK993509 gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence frame:+3 ENEKDGVI**GTNRA**PLHDQLLRLALFLRVLVRQELDLGRKRPSKSRSLGVPKHTHLEMLPNSLNPFPQPRILLGRCKSRPYRQSIKLRPRRHDHTCNQRRRPTSTRQRRRWVKCFPDRSQHRIQPHLVYVGSVFIFGQLDTPFSSKLVRLVLPCGNNSLLEKVIICLLGQLRWRYDVVI*PPKL**AIK**DP**V***P************G********** >gi|45741280|gb|CK993509.1|CK993509 gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence frame:-1 **********W************P***L**AQ**KLS**LKTPNILL*YGGRVDGVFRLIMEKFLP**GRTLLLRLFEPPFTS*VDGFLFLAGLHLFYTDICYDRR*PLCKLGSGCDCPPSPRRSD*CPH*HSCAGVKIANSYTCAERGWLLLRPDALS*LPQPFVKFYSHEPMGLPRAERPGERWLQLKDSVFLRLFFPFRFFNQHIT**TGQTWNDILGQEEQK >gi|45741280|gb|CK993509.1|CK993509 gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence frame:-2 **********G*****G*****FP*T****P***NY***SKPPIYCCSMAVELTGSSV**WKSSSLNKGVPSCSACSNLLLPHRLTGFYFWLGCICSTPTYATTDASPFVNWVAAATAHLHPDAATNVHTSTAAPASK*LTAIPALNVAGSSYAPTPFPNSLNPS*SSTHTNPWGSLALNDPENAGSSSRTACS*DSFSRSASSTSTL***RDKHGMIYWGRKSKR >gi|45741280|gb|CK993509.1|CK993509 gi|45741280|gb|CK993509.1|CK993509 024C12R1.2A ESTHcyl Hebeloma cylindrosporum cDNA clone ESTHcyl024C12, mRNA sequence frame:-3 *****S***L******A*****S***P**RP**ETI**PQNPQYIVVVWR*S*RGLPFNNGKVPPLIRAYPPAPLVRTSFYLIG*RVSIFGWVASVLHRHMLRPTLAPL*TG*RLRLPTFTQTQRLMSTLAQLRRRQNS*QLYLR*TWLAPPTPRRPFLTPSTLRKVLLTRTHGAPSR*TTRRTLAPAQGQRVPETLFPVPLLQPAHY***GTNME*YIGAGRAKE -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email 
------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Jul 19 16:44:36 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 19 Jul 2008 12:44:36 -0400 Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid codons as stops In-Reply-To: Message-ID: <200807191644.m6JGiahx014350@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2530 ------- Comment #10 from mmokrejs at ribosome.natur.cuni.cz 2008-07-19 12:44 EST ------- BTW, formatdb silently ignores asterisks so you have to replace them with X yourself otherwise alignment outputs from blast do not reflect reality. Don't know if I would prefer biopython to give me 'X' instead of '*', maybe for codons with 'N', 'R' would prefer X while for true STOP codons would prefer '*'. In PIR database is nice that proteins really ending at a STOP codon have a trailing '*'. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Jul 19 20:24:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 19 Jul 2008 16:24:41 -0400 Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid codons as stops In-Reply-To: Message-ID: <200807192024.m6JKOffD023599@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2530 ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-19 16:24 EST ------- How did you do the six translations in comment 9? Using Bio.Seq.translate() would have failed with a TranslationError on any "NNN" codon or similar. By common agreement "*" is used for a stop symbol. 
While "X" generally means any amino acid, I have sometimes seen it used to mean any amino acid OR a stop codon (in the NCBI translations in certain GenBank files). Personally I think it would be nice if there was an agreed character for an amino acid OR stop codon (e.g. "!"). However, as far as I know no such convention exists, so we shouldn't invent one as the default in Biopython. P.S. The nicest way to handle translate("NNN") isn't what I filed this bug about. It's the fact that translate("{@}") or anything else like that returns "*" and not an error. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Jul 19 21:40:35 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 19 Jul 2008 17:40:35 -0400 Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid codons as stops In-Reply-To: Message-ID: <200807192140.m6JLeZQR025907@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2530 ------- Comment #12 from mmokrejs at ribosome.natur.cuni.cz 2008-07-19 17:40 EST ------- Created an attachment (id=973) --> (http://bugzilla.open-bio.org/attachment.cgi?id=973&action=view) translate_ESTs.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Jul 20 14:46:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 20 Jul 2008 10:46:23 -0400 Subject: [Biopython-dev] [Bug 2547] New: Translation of ambiguous codons like NNN and TAN Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2547
Summary: Translation of ambiguous codons like NNN and TAN
Product: Biopython
Version: 1.47
Platform: PC
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
It is often useful to translate ambiguous nucleotide sequences (e.g. EST sequences), and these may contain codons which could code for an amino acid OR a stop codon (e.g. NNN, TNN or TAN). See for example Bug 2530 comment 6 and comment 9. Currently Bio.Seq.translate() will not translate such sequences and raises an exception. The following example shows correct translation of ambiguous codons which only encode valid amino acid(s) OR valid stop codons (but not both):
from Bio.Seq import translate
assert translate("TAA") == "*"
assert translate("TAG") == "*"
assert translate("TAT") == "Y"
assert translate("TAC") == "Y"
#Recall ambiguous nucleotide Y means T or C (pYrimidine)
#so TAY = TAT or TAC which both code for Y (Tyr, Tyrosine)
assert translate("TAY") == "Y"
#Recall ambiguous nucleotide R means G or A (puRine)
#so TAR = TAG or TAA which both code for a stop codon
assert translate("TAR") == "*"
However, in Biopython 1.47 the following all raise an exception:
translate("TAN")
translate("TAM")
translate("TAK")
translate("TRR")
translate("TNN")
translate("NNN")
TAN, TAM, TAK, ... can code for Y or stop. More generally, "TRR" and "TNN" can code multiple amino acids or a stop codon, and "NNN" can code for any amino acid or a stop codon. 
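To make the ambiguity described above concrete, here is a minimal standalone sketch that expands each ambiguous codon into its unambiguous codons and collects what they encode. It does not use or reproduce Biopython's code: `possible_translations` is a hypothetical helper, and the table is the standard genetic code written out by hand.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
AMBIGUOUS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def possible_translations(codon):
    """Return the set of residues (or '*' for stop) a codon could encode."""
    options = [AMBIGUOUS[base] for base in codon.upper()]
    return {CODON_TABLE["".join(c)] for c in product(*options)}


for codon in ("TAY", "TAR", "TAN", "NNN"):
    print(codon, "".join(sorted(possible_translations(codon))))
```

A codon is only a problem when the returned set contains "*" together with at least one amino acid, which is the case for TAN, TAM, TAK, TRR, TNN and NNN.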
According to IUPAC, the single letter protein code X is an "unknown or 'other' amino acid" (ignoring its historic and obsolete usage for selenocysteine, now U). http://www.chem.qmul.ac.uk/iupac/AminoAcid/A2021.html This document does NOT cover the idea of stop codons, and I am not aware of any additional symbol to mean "any amino acid OR a stop codon" which would be ideal for this situation. For comparison, the EMBOSS transeq tool will use X when given a codon which could be either an amino acid OR a stop codon:
$ transeq -filter asis:NNNTANTARTAGTAYTAC
XX**YY
Therefore one solution would be to follow EMBOSS and return X for codons which could be an amino acid OR a stop codon. See also Bug 2530 on the related issue that Bio.Seq.translate() currently translates invalid codons as "*" (presumably an accidental side effect of the implementation). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Jul 20 14:50:22 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 20 Jul 2008 10:50:22 -0400 Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid codons as stops In-Reply-To: Message-ID: <200807201450.m6KEoMVZ017607@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2530 ------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-20 10:50 EST ------- Martin, I've filed Bug 2547 ("Translation of ambiguous codons like NNN and TAN") on the separate issue of wanting to translate ambiguous codons as found in EST sequences. This bug (Bug 2530) is only for the mis-translation of invalid codons as stop characters. If there is agreement that changing the behaviour of Bio.Seq.translate() as described in Bug 2547 is desirable, then we end up fixing both issues at the same time. 
Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Sun Jul 20 15:03:48 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 20 Jul 2008 16:03:48 +0100 Subject: [Biopython-dev] [Bug 2530] Bio.Seq.translate() treats invalid codons as stops In-Reply-To: <200807192140.m6JLeZQR025907@portal.open-bio.org> References: <200807192140.m6JLeZQR025907@portal.open-bio.org> Message-ID: <320fb6e00807200803v57820ab8v2502d6e5671933cc@mail.gmail.com> > ------- Comment #12 from mmokrejs ------- > Created an attachment (id=973) > --> (http://bugzilla.open-bio.org/attachment.cgi?id=973&action=view) > translate_ESTs.py Martin, I had some general comments on your code which you might find helpful. Most of your variable names start with an underscore - this is very unusual. There is a convention in Python that a single leading underscore is used for private properties or methods of an object. You used the following code to reverse a string by turning it into a list and back again:
_reversed = list(_record.sequence)
_reversed.reverse()
_reversed = ''.join(_reversed)
For simply reversing a string, I would suggest using a stride of minus one instead,
reversed_string = old_string[::-1]
You then go on to take the reverse complement (without worrying about ambiguous characters which could be present, e.g. R -> Y):
_reversed = list(_record.sequence)
_reversed.reverse()
_reversed = ''.join(_reversed)
_reversed = _reversed.translate(string.maketrans('AaTtGgCcUu', 'TtAaCcGgAa'), '')
I would suggest using the Bio.Seq.reverse_complement() function here instead. Finally are you aware of the string formatting operator (%) in Python? 
The following code:
_outprothandle.write(''.join(('>', _record.gi, ' ', _record.definition, ' frame:-3', '\n', translate(_reversed[2:]).replace('*','X'), '\n')))
might typically be written as:
_outprothandle.write('>%s %s frame:-3\n%s\n' % (_record.gi, _record.definition, translate(_reversed[2:]).replace('*','X')))
See http://docs.python.org/lib/typesseq-strings.html for more details (and how to use named insertion points). Peter From bugzilla-daemon at portal.open-bio.org Sun Jul 20 16:08:22 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 20 Jul 2008 12:08:22 -0400 Subject: [Biopython-dev] [Bug 2548] New: Updating IUPACData and ExtendedIUPACProtein for U and O Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2548
Summary: Updating IUPACData and ExtendedIUPACProtein for U and O
Product: Biopython
Version: 1.47
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
The IUPAC data in Biopython has not been updated to officially use X for any amino acid and U for selenocysteine (Sec). Nor do we support O for pyrrolysine (Pyl). I haven't found an official statement from the IUPAC-IUBMB Joint Commission on Biochemical Nomenclature via Google, but several major resources confirm this: http://www.ebi.ac.uk/RESID/faq.html http://www.uniprot.org/news/2008/02/26/release http://doc.bioperl.org/bioperl-live/Bio/Tools/IUPAC.html Patch to follow. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Jul 20 16:26:10 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 20 Jul 2008 12:26:10 -0400 Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and ExtendedIUPACProtein for U and O In-Reply-To: Message-ID: <200807201626.m6KGQAQZ021741@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2548 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-20 12:26 EST ------- See also: http://www.chem.qmul.ac.uk/iupac/AminoAcid/A2021.html Taking the above as the current IUPAC standard, there is no direct mention of the use of J in NMR as designation for signals assigned either to leucine (L) or to isoleucine (I) which cannot be distinguished from each other. I am therefore NOT intending to add J to Biopython's IUPAC extended protein alphabet. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Jul 20 16:54:51 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 20 Jul 2008 12:54:51 -0400 Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and ExtendedIUPACProtein for U and O In-Reply-To: Message-ID: <200807201654.m6KGsp7L022759@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2548 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-20 12:54 EST ------- Created an attachment (id=974) --> (http://bugzilla.open-bio.org/attachment.cgi?id=974&action=view) Adds U and O, clearly defines X, but does not add J Does anyone have any definitive sources on the MW of these "new" amino acids? Also I'd like to confirm if IUPAC have officially accepted "J" or not. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Jul 20 18:30:17 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 20 Jul 2008 14:30:17 -0400 Subject: [Biopython-dev] [Bug 2547] Translation of ambiguous codons like NNN and TAN In-Reply-To: Message-ID: <200807201830.m6KIUHMb028714@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2547 mmokrejs at ribosome.natur.cuni.cz changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |mmokrejs at ribosome.natur.cuni | |.cz ------- Comment #1 from mmokrejs at ribosome.natur.cuni.cz 2008-07-20 14:30 EST ------- Regarding the selenocysteine issue, expect "inconsistencies" between data files released from NCBI. I haven't checked now but in 2002 I had the following communication with NCBI staff: GenBank format requires official IUPAC amino acid code that doesn't include Selenocysteine and therefore it uses 'X'. FASTA format uses the NCBI extended amino acid code that does include Selenocysteine 'U'. > >gi_2983532 formate dehydrogenase alpha subunit [Aquifex aeolicus] > MNYMDISRRGFLKLSVGSVGAGILGGLGFDLTPAYARVRDLKITKAKVTKSICPYCSVSCGILAYSLSDG > AMNVKERIIHVEGNPDDPINRGTLCPKGATLRDFVNAPDRLTKPLYRPAGSTEWKEISWDEAIEKFARWV > KDTRDRTFIHKDKAGRVVNRCDSIVWAVGSPLGNEEGWLMVKIGIALGLSARETQATIUHAPTVASLAPT > ------------------------^ [cut] > > It seems there's a buggy version in > ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Aquifex_aeolicus/AE000657.faa > although the .gbk flatfile says "X" in case of "U". -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Jul 20 21:16:48 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 20 Jul 2008 17:16:48 -0400 Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and ExtendedIUPACProtein for U and O In-Reply-To: Message-ID: <200807202116.m6KLGmdb005982@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2548 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #974 is|0 |1 obsolete| | ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-20 17:16 EST ------- Created an attachment (id=975) --> (http://bugzilla.open-bio.org/attachment.cgi?id=975&action=view) Tested version of previous patch This revision includes a workaround for missing molecular weights in the _make_ambiguous_ranges() function, and has been tested with the full test suite on Linux. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jul 21 10:55:13 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 21 Jul 2008 06:55:13 -0400 Subject: [Biopython-dev] [Bug 2547] Translation of ambiguous codons like NNN and TAN In-Reply-To: Message-ID: <200807211055.m6LAtDHp009314@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2547 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 06:55 EST ------- (In reply to comment #1) > Regarding the selenocysteine issue, expect "inconsistencies" between data files > released from NCBI. I haven't checked now but in 2002 I had the following > communication with NCBI staff ... I think you meant to post this on Bug 2548. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jul 21 11:04:14 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 21 Jul 2008 07:04:14 -0400 Subject: [Biopython-dev] [Bug 2547] Translation of ambiguous codons like NNN and TAN In-Reply-To: Message-ID: <200807211104.m6LB4E0w009769@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2547 ------- Comment #3 from mmokrejs at ribosome.natur.cuni.cz 2008-07-21 07:04 EST ------- Yes, sorry. :( -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jul 21 11:10:02 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 21 Jul 2008 07:10:02 -0400 Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and ExtendedIUPACProtein for U and O In-Reply-To: Message-ID: <200807211110.m6LBA2H8010005@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2548 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 07:10 EST ------- I've gone over the GenBank release notes on this issue... Quoting ftp://ftp.ncbi.nih.gov/genbank/release.notes/gb131.release.notes (Dated August 15 2002, similar text appears in earlier files too as a warning of intended changes) ============================================================== 1.3.3 Selenocysteine representation At the May 1999 DDBJ/EMBL/GenBank collaborative meeting, it was learned that IUPAC plans to adopt the letter 'U' for selenocysteine. 
With this August 2002 release, selenocysteine residues are now presented via residue abbreviation 'U', in both /translation and /transl_except qualifiers. ============================================================== By now they SHOULD have fixed any sequences which were using X for selenocysteine to use U instead. Quoting ftp://ftp.ncbi.nih.gov/genbank/release.notes/gb156.release.notes (Dated October 15 2006, similar text appears in earlier files too as a warning of intended changes) ============================================================== 1.3.4 New protein residue abbreviation for Pyrrolysine Sequence databases use single-letter amino acid abbreviations to record the primary structure (sequence) of amino acids in a polypeptide. The table of abbreviations includes only those amino acids that are encoded in the genetic code and directly inserted by a tRNA during the process of protein translation. Post-translational modifications are not represented in the sequence data itself, but may be described by features annotated on the sequence. The discovery of the 22nd naturally encoded amino acid, pyrrolysine, and the recent submission of sequence records that should contain this residue, require the adoption of a new amino acid abbreviation. Because several letters are assigned to represent different experimental ambiguities, the only letter still available for use is O (uppercase letter o). Scientists working in the field have independently suggested use of this letter, and it has a reasonable mnemonic, pyrrOlysine. The IUPAC-IUBMB Joint Commission on Biochemical Nomenclature has agreed that Pyl/O will be recommended for this amino acid. The consequences for flatfile users are that O can now appear in CDS /translation qualifiers, and that Pyl (the three-letter abbreviation) can appear in CDS /transl_except qualifiers and in the /product and /anticodon qualifiers of tRNA features. These changes are legal as of this October 2006 GenBank Release. 
Sample ASN.1, FASTA, GenBank flatfile, and INSDSeq XML files for CP000099, which has a protein with a pyrrolysine residue, are available for testing purposes at the NCBI FTP site: ftp://ftp.ncbi.nih.gov/genbank/Pyrrolysine_Samples Files: CP000099.pse (print-form ASN.1 Seq-entry) CP000099.gbff (GenBank flatfile) CP000099.aa_fsa (protein FASTA) CP000099.isx (INSDSeq XML) ============================================================== And later on in the same file, ============================================================== 1.3.5 Protein residue J for leucine/isoleucine ambiguities The residue abbreviation J is reserved for mass spectrometry experiments that cannot distinguish leucine from isoleucine. Although this abbreviation has been part of the IUPAC recommendations for some time, it has not previously appeared in protein sequences in the GenBank database. As of October 2006, abbreviation J is legal in CDS /translation qualifiers, and Xle (the three-letter abbreviation) will be allowed in CDS /transl_except qualifiers and in the /product and /anticodon qualifiers of tRNA features. J will also be mapped to unknown (X) for the purpose of BLAST and other sequence similarity search tools. ============================================================== So, according to GenBank, "The residue abbreviation J is reserved for mass spectrometry experiments that cannot distinguish leucine from isoleucine ... this abbreviation has been part of the IUPAC recommendations for some time". I would prefer a direct citation, but that seems good enough evidence to me to include J in the Biopython IUPAC extended protein alphabet. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Jul 21 11:18:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 21 Jul 2008 07:18:12 -0400 Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and ExtendedIUPACProtein for U and O In-Reply-To: Message-ID: <200807211118.m6LBICM8010531@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2548 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 07:18 EST ------- Regarding Martin's example (erroneously added to Bugzilla as Bug 2547 comment 1), the protein GI:2983532 Martin wrote "GenBank format requires official IUPAC amino acid code that doesn't include Selenocystein and therefore it uses 'X'." That is out of date - IUPAC and GenBank both accept U for selenocysteine now (see my notes in comment 4 of this bug). Looking at these files: ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Aquifex_aeolicus/AE000657.faa ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Aquifex_aeolicus/AE000657.gbk (feature translation) They both give the same amino acid sequence for GI:2983532, which includes "U" but not "X" as I had expected. 
>gi|2983532|gb|AAC07107.1| formate dehydrogenase alpha subunit [Aquifex aeolicus VF5] MNYMDISRRGFLKLSVGSVGAGILGGLGFDLTPAYARVRDLKITKAKVTKSICPYCSVSCGILAYSLSDG AMNVKERIIHVEGNPDDPINRGTLCPKGATLRDFVNAPDRLTKPLYRPAGSTEWKEISWDEAIEKFARWV KDTRDRTFIHKDKAGRVVNRCDSIVWAVGSPLGNEEGWLMVKIGIALGLSARETQATIUHAPTVASLAPT FGRGAMTNNWVDISNSDLVFVMGGNPAENHPCGFKWAIKAREKRGAKIICIDPRFNRTAAVADIFVQIRP GTDIAFLGGLINYVLQNEKYQKEYVRLHTTGPFIVREDFGFKDGLFTGYDPKTRSYDTTTWDYEFDPATG YPKMDPEMKHPRCVLNILKEHYSRYTPEVVSQICGCSKEDFLRVAEEVAKCGAPNKFMTILYALGWTHHS YGTQLIRTACMLQLLLGNIGCPGGGINALRGHSNVQGMTDLAGQNKNLPTYIKPPKPEEQTLAQHLKNRT PRKLHPTSLNYWANYPKFFISFLKCMWGDAATPENDFAYDYLYKPEGGYNSWDKFIDDMYKGKIEGVVTA ALNFLNNTPNAKKTVRALKNLKWMVVMDPFMIETAQFWKAEGLDPKEVKTEILVLPTAVFLEKEGSFTNS ARWVKWKYKATDPPGDAKDEFWIFGRFFMKLKEFYEKEGGAFPEPILNLVWPYKNPYYPTAEEILTEING YYTRDVDGHKKGERVRLFTDLRDDGSTACGGWLYCGVFPPEGNLAKRTDLSDPLGLGTYPNYAWNWPANR RVLYNRASCDEKGRPWDPERPLLRWDPERDMWVGDIPDYPATAPPEKGIGAFIMLPEGKGRLFAAKSYVT FKDGPLPEHYEPYESPVTNILHPNVPHNPVAKVYKSDLDLLGTPDKFPHVATTYRLTEHYHFWTKHLYGP SLLAPVMFIEIPEELAKEKGIQNGDLVRVSTARASIEAIALVTKRIKPLKVAGKTVYTIGIPIHWGFEGL VKGAITNFITPNVWDPNSRTPEFKGFLANIEKVKT It is quite possible that during the transition from X to U for selenocysteine there were inconsistencies in GenBank - but I hope/expect the NCBI have fixed them all by now. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Jul 21 11:49:21 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 21 Jul 2008 07:49:21 -0400 Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and ExtendedIUPACProtein for U and O In-Reply-To: Message-ID: <200807211149.m6LBnLli012323@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2548 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #975 is|0 |1 obsolete| | ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 07:49 EST ------- Created an attachment (id=976) --> (http://bugzilla.open-bio.org/attachment.cgi?id=976&action=view) Adds J, U and O, and clearly defines X as an unknown amino acid Based on the GenBank release notes indirect confirmation that J is now an IUPAC recommendation, I have updated my patch to include J as well. Note that this requires a trivial update to test_seq.py (included in this patch). Still ideally needs the MW filled in for U and O. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
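As a rough illustration of the letters discussed in this bug, here is a minimal sketch that reports where the special residue letters occur in a protein string. It does not use Biopython. The table `SPECIAL_RESIDUES` and the function `special_residues` are names made up for this example, and the three-letter codes Sec and Xaa are the usual IUPAC ones rather than something quoted from the release notes above.

```python
# Illustrative only: this is not the Biopython implementation.
# Single-letter codes discussed in Bug 2548, with three-letter codes.
SPECIAL_RESIDUES = {
    "U": "Sec",  # selenocysteine
    "O": "Pyl",  # pyrrolysine
    "J": "Xle",  # leucine or isoleucine (cannot be distinguished)
    "X": "Xaa",  # unknown amino acid
}


def special_residues(protein):
    """Map each special letter found in protein to its 0-based positions."""
    found = {}
    for index, letter in enumerate(protein.upper()):
        if letter in SPECIAL_RESIDUES:
            found.setdefault(letter, []).append(index)
    return found


# A short fragment around the selenocysteine in GI:2983532.
print(special_residues("QATIUHAPTV"))
```

A check like this could be used to spot records that still use X where U would now be expected, although deciding that would need the feature annotation as well as the sequence.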
From bugzilla-daemon at portal.open-bio.org Mon Jul 21 15:25:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 21 Jul 2008 11:25:59 -0400
Subject: [Biopython-dev] [Bug 2547] Translation of ambiguous codons like NNN and TAN
In-Reply-To:
Message-ID: <200807211525.m6LFPxgs022821@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2547

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 11:25 EST -------
I've managed to cobble together my first ever Perl program from scratch, and
established that BioPerl does the same as EMBOSS - they use an "X" when the
codon could be either an amino acid OR a stop codon.

My quick BioPerl script,
================================================
use Bio::Seq;
$nuc_str = 'NNNTANTARTAGTAYTAC';

print "BioPerl translation of:\n";
$seq_obj = Bio::Seq->new(-seq => $nuc_str);
print $seq_obj->seq();
print "\n\n";

print "Sequence object's translation method:\n";
print $seq_obj->translate()->seq();
print "\n\n";

use Bio::Perl;
print "translate_as_string:\n";
print translate_as_string($nuc_str);
print "\n";
================================================

And the output:
================================================
BioPerl translation of:
NNNTANTARTAGTAYTAC

Sequence object's translation method:
XX**YY

translate_as_string:
XX**YY
================================================

There does seem to be a consensus building here!
From bugzilla-daemon at portal.open-bio.org Mon Jul 21 15:38:03 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 21 Jul 2008 11:38:03 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200807211538.m6LFc327023466@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 11:38 EST ------- For comparison, the following is copied from the BioPerl documentation about their sequence object's translate method. It would be nice to follow some of the same naming conventions for any optional arguments. http://www.bioperl.org/Core/Latest/bptutorial.html#iii_3_1_manipulating_sequence_data_with_seq_methods If we want to translate full coding regions (CDS) the way major nucleotide databanks EMBL, GenBank and DDBJ do it, the translate() method has to perform more checks. Specifically, translate() needs to confirm that the sequence has appropriate start and terminator codons at the very beginning and the very end of the sequence and that there are no terminator codons present within the sequence in frame 0. In addition, if the genetic code being used has an atypical (non-ATG) start codon, the translate() method needs to convert the initial amino acid to methionine. These checks and conversions are triggered by setting ``complete'' to 1: $prot_obj = $my_seq_object->translate(-complete => 1); -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Jul 21 15:41:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 21 Jul 2008 11:41:47 -0400
Subject: [Biopython-dev] [Bug 2547] Translation of ambiguous codons like NNN and TAN
In-Reply-To:
Message-ID: <200807211541.m6LFflk5023670@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2547

------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-21 11:41 EST -------
For reference, the older Bio.Translate approach suffers from the same
limitation (which is not surprising if you consider that both use the same
CodonTable objects internally):

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> from Bio import Translate
>>> standard_translator = Translate.ambiguous_dna_by_id[1]

The clear cut cases are fine,

>>> standard_translator.translate(Seq("TAR", IUPAC.ambiguous_dna))
Seq('*', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> standard_translator.translate(Seq("TAY", IUPAC.ambiguous_dna))
Seq('Y', HasStopCodon(ExtendedIUPACProtein(), '*'))

When the codon could be an amino acid or a stop, we raise an exception:

>>> standard_translator.translate(Seq("NNN", IUPAC.ambiguous_dna))
Traceback (most recent call last):
...
Bio.Data.CodonTable.TranslationError: NNN

>>> standard_translator.translate(Seq("TAN", IUPAC.ambiguous_dna))
Traceback (most recent call last):
...
Bio.Data.CodonTable.TranslationError: TAN
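The behaviour that EMBOSS and BioPerl are reported to share in this bug (use "X" when a codon could be either an amino acid or a stop) can be sketched in plain Python. This is only an illustration under stated assumptions, not the Biopython code: it hard-codes the standard genetic code (NCBI table 1), the names `translate_codon` and `translate` are hypothetical, and any mixture of outcomes becomes "X", so it never returns the ambiguous protein letters B or Z.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
AMBIGUOUS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon):
    """Translate one codon, returning 'X' if the outcome is not unique."""
    expansions = product(*(AMBIGUOUS[base] for base in codon.upper()))
    residues = {CODON_TABLE["".join(bases)] for bases in expansions}
    if len(residues) == 1:
        return residues.pop()  # one amino acid, or '*' if every option is a stop
    return "X"  # mixture, including amino acid OR stop


def translate(sequence):
    """Translate in frame 0, ignoring any trailing partial codon."""
    usable = len(sequence) - len(sequence) % 3
    return "".join(
        translate_codon(sequence[i:i + 3]) for i in range(0, usable, 3)
    )


print(translate("NNNTANTARTAGTAYTAC"))  # prints XX**YY
```

With the test sequence from comment 4 this gives the same string as the BioPerl output quoted there, which is the behaviour being proposed in place of the TranslationError.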
From bugzilla-daemon at portal.open-bio.org Tue Jul 22 11:32:10 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 22 Jul 2008 07:32:10 -0400 Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects In-Reply-To: Message-ID: <200807221132.m6MBWAAF016950@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2454 ------- Comment #28 from mdehoon at ims.u-tokyo.ac.jp 2008-07-22 07:32 EST ------- (In reply to comment #27) > Regarding the Bio.Sequencing.Ace changes (comment 26), some of it looks a bit > repetitive. Might a sub-function help here? I thought about that, but each time the repetitive code is slightly different, and I wonder if the end result will be clearer than what we have now. > Also, I was wondering if you managed to fix Bug 2446 as a nice bonus. I am planning to do so. I am checking with the polyphred people if the COMMENT blocks are really intended and are here to stay (note that the polyphred version that writes these COMMENT blocks is a beta version). Will update the code once I hear back from them. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Tue Jul 22 11:38:13 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 22 Jul 2008 04:38:13 -0700 (PDT) Subject: [Biopython-dev] Bio.KDTree Message-ID: <108429.69921.qm@web62404.mail.re1.yahoo.com> Hi everybody, Bio.KDTree is currently the only piece of C++ code in Biopython. C++ doesn't compile cleanly on all platforms (for example it is missing in the Biopython installer for Python 2.3 on Windows); some platforms don't even have a C++ compiler. For this reason, setup.py asks the user each time if Bio.KDTree should be compiled. Does anybody (Thomas?) mind if I convert this code to plain C? That would be a nice weekend project. 
Then we can get rid of the question in setup.py, and Bio.KDTree can be made available on all platforms. --Michiel. From biopython at maubp.freeserve.co.uk Tue Jul 22 16:13:34 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 22 Jul 2008 17:13:34 +0100 Subject: [Biopython-dev] Modules to be removed from Biopython In-Reply-To: <320fb6e00807160540w325fe995mea400b0014fd7c2e@mail.gmail.com> References: <492634.64872.qm@web62414.mail.re1.yahoo.com> <320fb6e00806270950k479eda23ia96d3c2d36557510@mail.gmail.com> <320fb6e00807160540w325fe995mea400b0014fd7c2e@mail.gmail.com> Message-ID: <320fb6e00807220913g64613854j7a1deb5b4357f726@mail.gmail.com> On June 27, Michiel wrote: > ..., there is also a set of interconnected modules where it's not 100% > clear if they can be removed without causing some surprises: > Bio.builders > Bio.config > Bio.dbdefs > Bio.formatdefs > Bio.dbdefs > Bio.expressions > Bio.FormatIO [already deprecated and removed] > Bio.Std > Bio.StdHandler > It is probably OK to remove these, since these were deprecated we did > not get a barrage of complaints from our users. Personally, I think it is > important to keep the code base clean, so I am in favor of removing > these (and see if anybody complains; in that case, we can always put > these modules back in and make a new release). But I can live with > keeping these modules for another release round. If anybody thinks > that that would be better, please let us know. Bio.expressions was already deprecated, and seems to be a dependency of the following modules, which I have now explicitly deprecated in CVS: Bio.expressions (deprecated in Biopython 1.44) Bio.config Bio.dbdefs Bio.formatdefs Bio.dbdefs It probably would be fine to remove these five modules now (Bio.expressions, Bio.config, Bio.dbdefs, Bio.formatdefs and Bio.dbdefs), since the indirect warning from Bio.expressions should have alerted anyone who was using them. Or we can ship one more release with them included? 
Moving on, Bio.Std and Bio.StdHandler appear to be used by: - Bio.expressions (deprecated in Biopython 1.44) - Bio.config (now deprecated in CVS) - Bio.builders (used by Mindy) - Bio.Mindy (used by Bio.config which is now deprecated) As far as I can tell, other historic usage of Mindy (e.g. in Bio.Fasta and Bio.GenBank) has already been deprecated and removed. I think it would therefore also be safe to deprecate these four together (Bio.expressions, Bio.config, Bio.builders and Bio.Mindy), or start by deprecating Bio.Mindy on its own. Peter From bugzilla-daemon at portal.open-bio.org Tue Jul 22 16:29:27 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 22 Jul 2008 12:29:27 -0400 Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO In-Reply-To: Message-ID: <200807221629.m6MGTRuo002799@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2227 ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-22 12:29 EST ------- Frank, Would you mind if I removed this print statement from the add_sequence() method?: print "WARNING: Sequence name %s is already present. Sequence was added as %s." % (name,unique_name) I'd like to be able to call this method in code from Bio.SeqIO / Bio.AlignIO to write alignments, without getting warnings printed out. Thanks Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Tue Jul 22 16:33:53 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 22 Jul 2008 17:33:53 +0100
Subject: [Biopython-dev] Bio.KDTree
In-Reply-To: <108429.69921.qm@web62404.mail.re1.yahoo.com>
References: <108429.69921.qm@web62404.mail.re1.yahoo.com>
Message-ID: <320fb6e00807220933v1e6125a7lcb91b963a5dd5195@mail.gmail.com>

On Tue, Jul 22, 2008 at 12:38 PM, Michiel de Hoon wrote:
> Hi everybody,
>
> Bio.KDTree is currently the only piece of C++ code in Biopython. C++ doesn't
> compile cleanly on all platforms (for example it is missing in the Biopython
> installer for Python 2.3 on Windows); some platforms don't even have a C++
> compiler. For this reason, setup.py asks the user each time if Bio.KDTree
> should be compiled. Does anybody (Thomas?) mind if I convert this code to
> plain C? That would be a nice weekend project. Then we can get rid of the
> question in setup.py, and Bio.KDTree can be made available on all platforms.

If you want to spend your weekend doing this, it does sound like a
worthwhile incremental improvement to Biopython - and it should simplify
the build process, which is great.

Peter

P.S. Have you noticed Bug 2489 "KDTree NN search without specifying radius"?
http://bugzilla.open-bio.org/show_bug.cgi?id=2489

From bugzilla-daemon at portal.open-bio.org Tue Jul 22 23:50:31 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 22 Jul 2008 19:50:31 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200807222350.m6MNoVXd024298@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381

------- Comment #15 from mmokrejs at ribosome.natur.cuni.cz 2008-07-22 19:50 EST -------
(In reply to comment #5)
> Another bonus for people who think OO, is doing dir(my_seq) would
> list these useful methods.
Right now the user has to know to go > looking in the Bio.Seq module for a function. I do this quite often and this is a weak point in current biopython. Good catch! Regarding the back_translate, I don't use it but people ask for it often so don't remove it. Otherwise I won't know where else to get this functionality. ;-) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Jul 23 00:05:09 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 22 Jul 2008 20:05:09 -0400 Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO In-Reply-To: Message-ID: <200807230005.m6N059QE025415@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2227 ------- Comment #12 from fkauff at biologie.uni-kl.de 2008-07-22 20:05 EST ------- Peter, No problem. Cheers, Frank (In reply to comment #11) > Frank, > > Would you mind if I removed this print statement from the add_sequence() > method?: > > print "WARNING: Sequence name %s is already present. Sequence was added as %s." > % (name,unique_name) > > I'd like to be able to call this method in code from Bio.SeqIO / Bio.AlignIO to > write alignments, without getting warnings printed out. > > Thanks > > Peter > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Jul 23 11:49:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 23 Jul 2008 07:49:33 -0400 Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO In-Reply-To: Message-ID: <200807231149.m6NBnX4P014410@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2227 ------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-23 07:49 EST ------- (In reply to comment #12) > Peter, > > No problem. > > Cheers, > Frank Great. I've removed that print statement (and tweaked a few doc strings) in CVS. Checking in Nexus.py; /home/repository/biopython/biopython/Bio/Nexus/Nexus.py,v <-- Nexus.py new revision: 1.19; previous revision: 1.18 done I'm just working on some alphabet stuff before adding support to write "nexus" format files with Bio.SeqIO and Bio.AlignIO -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Jul 23 12:33:10 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 23 Jul 2008 08:33:10 -0400 Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO In-Reply-To: Message-ID: <200807231233.m6NCXAk6018007@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2227 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED ------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-23 08:33 EST ------- Fixed in CVS - you can now write Nexus files using Bio.SeqIO or Bio.AlignIO, provided the alphabet is declared as DNA, RNA or protein. You cannot use generic alphabets or just nucleotide alphabets. 
Multiple files have been changed, so a complete CVS update is the best way to
test this before the next release of Biopython.

From bugzilla-daemon at portal.open-bio.org Wed Jul 23 14:12:38 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 23 Jul 2008 10:12:38 -0400
Subject: [Biopython-dev] [Bug 2543] Bio.Nexus.Trees can't handle named ancestors
In-Reply-To:
Message-ID: <200807231412.m6NECc33027073@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2543

fkauff at biologie.uni-kl.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

------- Comment #2 from fkauff at biologie.uni-kl.de 2008-07-23 10:12 EST -------
I recently got some code that is supposed to be able to deal with labeled nodes
(probably from the author of this bug - can't check now, as I'm traveling and
don't have access to the files). I haven't looked at or tested the code yet,
but will do so soon when I'm back.

Frank

(In reply to comment #1)
> This sounds like a job for Frank (the Bio.Nexus module author).
>
> Can I ask if you've actually come across trees with named ancestor nodes in
> "real life"? That would make this bug more important. If so, the name of the
> tool would be interesting, an example tree file would be great to add to
> Biopython as a test case.
>
> If on the other hand the only named ancestor tree you've ever tried is the
> example from the Newick documentation, this doesn't seem such a high priority
> (but still worth fixing).
>
> Peter
>
From biopython at maubp.freeserve.co.uk Thu Jul 24 11:41:41 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 24 Jul 2008 12:41:41 +0100 Subject: [Biopython-dev] Removing the deprecated Bio.WWW modules Message-ID: <320fb6e00807240441g5b21993dl7c84aebac0e2a988@mail.gmail.com> Hi all, We (Michiel) deprecated the Bio.WWW.* modules in Biopython 1.45, after relocating most of the functionality: Bio.WWW.ExPASy -> Bio.ExPASy Bio.WWW.InterPro -> Bio.InterPro Bio.WWW.NCBI -> Bio.Entrez Bio.WWW.SCOP -> Bio.SCOP Now that the deprecation warnings have been in place for a couple of releases, I'd like to remove the four Bio.WWW.* modules, and leave just Bio/WWW/__init__.py with a deprecation warning telling people where to look for the relocated code. Any comments or objections? Peter From mjldehoon at yahoo.com Fri Jul 25 00:19:33 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Thu, 24 Jul 2008 17:19:33 -0700 (PDT) Subject: [Biopython-dev] Removing the deprecated Bio.WWW modules In-Reply-To: <320fb6e00807240441g5b21993dl7c84aebac0e2a988@mail.gmail.com> Message-ID: <502434.4415.qm@web62406.mail.re1.yahoo.com> Note that Bio.WWW.__init__.py contains some code that is used in other modules. Most (but not all) of these modules are deprecated themselves. For the non-deprecated modules, it's probably easiest to just copy the code from Bio.WWW.__init__.py over to avoid having to import Bio.WWW. --Michiel. 
--- On Thu, 7/24/08, Peter wrote: > From: Peter > Subject: [Biopython-dev] Removing the deprecated Bio.WWW modules > To: "BioPython-Dev Mailing List" > Date: Thursday, July 24, 2008, 7:41 AM > Hi all, > > We (Michiel) deprecated the Bio.WWW.* modules in Biopython > 1.45, after > relocating most of the functionality: > > Bio.WWW.ExPASy -> Bio.ExPASy > Bio.WWW.InterPro -> Bio.InterPro > Bio.WWW.NCBI -> Bio.Entrez > Bio.WWW.SCOP -> Bio.SCOP > > Now that the deprecation warnings have been in place for a > couple of > releases, I'd like to remove the four Bio.WWW.* > modules, and leave > just Bio/WWW/__init__.py with a deprecation warning telling > people > where to look for the relocated code. > > Any comments or objections? > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From biopython at maubp.freeserve.co.uk Fri Jul 25 10:31:49 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 25 Jul 2008 11:31:49 +0100 Subject: [Biopython-dev] Updating the installation instructions Message-ID: <320fb6e00807250331k47ec64dcoe246933f0d02682b@mail.gmail.com> As Nick Matzke has pointed out, http://biopython.org/DIST/docs/install/Installation.html and http://biopython.org/DIST/docs/install/Installation.pdf are somewhat out of date. I've updated the source LaTeX file in CVS to cover python 2.5 being the latest stable python, mxTextTools is now optional (but 2.0 is preferred over 3.0), and removed the bits about the "Classic" Mac (pre OS X). http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/install/Installation.tex?cvsroot=biopython The reportlab instructions probably need updating too - although we should double check if everything is happy with ReportLab 2 as part of this. If anyone wants to skim over the revised version and look for anything I've missed or other improvements that would be great. 
Peter From biopython at maubp.freeserve.co.uk Fri Jul 25 11:21:31 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 25 Jul 2008 12:21:31 +0100 Subject: [Biopython-dev] Removing the deprecated Bio.WWW modules In-Reply-To: <502434.4415.qm@web62406.mail.re1.yahoo.com> References: <320fb6e00807240441g5b21993dl7c84aebac0e2a988@mail.gmail.com> <502434.4415.qm@web62406.mail.re1.yahoo.com> Message-ID: <320fb6e00807250421w15b1d8a9qe9d5d178c233ec7b@mail.gmail.com> On Fri, Jul 25, 2008 at 1:19 AM, Michiel de Hoon wrote: > Note that Bio.WWW.__init__.py contains some code that is used in other modules. > Most (but not all) of these modules are deprecated themselves. For the > non-deprecated modules, it's probably easiest to just copy the code from > Bio.WWW.__init__.py over to avoid having to import Bio.WWW. Good catch - I didn't do my recursive grep correctly. The file Bio/WWW/__init__.py just contains a RequestLimiter class, and this is currently used in: Bio/Blast/NCBIWWW.py (used in qblast, simple to recode as in Bio.Entrez) Bio/config/_support.py (completely deprecated) Bio/Prosite/__init__.py (in the deprecated ExPASyDictionary class) Bio/SwissProt/SProt.py (in the deprecated ExPASyDictionary class) Note I have just updated Bio.Prosite and Bio.SwissProt to use Bio.ExPASy rather than Bio.WWW.ExPASy which means we can delete the deprecated Bio/WWW/ExPASy.py, InterPro.py, NCBI.py and SCOP.py now. 
Peter From bugzilla-daemon at portal.open-bio.org Sat Jul 26 22:05:24 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 26 Jul 2008 18:05:24 -0400 Subject: [Biopython-dev] [Bug 2548] Updating IUPACData and ExtendedIUPACProtein for U and O In-Reply-To: Message-ID: <200807262205.m6QM5Ow9021435@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2548 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-26 18:05 EST ------- Checking in Bio/Alphabet/IUPAC.py; /home/repository/biopython/biopython/Bio/Alphabet/IUPAC.py,v <-- IUPAC.py new revision: 1.3; previous revision: 1.2 done Checking in Bio/Data/IUPACData.py; /home/repository/biopython/biopython/Bio/Data/IUPACData.py,v <-- IUPACData.py new revision: 1.5; previous revision: 1.4 done Checking in Tests/test_seq.py; /home/repository/biopython/biopython/Tests/test_seq.py,v <-- test_seq.py new revision: 1.15; previous revision: 1.14 done Marking as fixed, although still ideally needs the MW filled in for U and O. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Jul 27 15:30:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 27 Jul 2008 11:30:37 -0400 Subject: [Biopython-dev] [Bug 2550] New: Alphabet problems when adding sequences Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2550 Summary: Alphabet problems when adding sequences Product: Biopython Version: 1.47 Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk #Create three sequences as Seq objects, >>> from Bio import Alphabet >>> from Bio.Alphabet import IUPAC >>> from Bio.Seq import Seq >>> a = Seq("ACTG", Alphabet.generic_dna) >>> b = Seq("AC-TG", Alphabet.Gapped(Alphabet.generic_dna, "-")) >>> c = Seq("AC-TG", Alphabet.Gapped(IUPAC.unambiguous_dna, "-")) >>> a Seq('ACTG', DNAAlphabet()) >>> b Seq('AC-TG', Gapped(DNAAlphabet(), '-')) >>> c Seq('AC-TG', Gapped(IUPACUnambiguousDNA(), '-')) #Now try adding them together... >>> b+c Seq('AC-TGAC-TG', Gapped(DNAAlphabet(), '-')) >>> a+b Traceback (most recent call last): File "", line 1, in ? File "/home/maubp/repository/biopython/build/lib.linux-x86_64-2.4/Bio/Seq.py", line 77, in __add__ elif other.alphabet.contains(self.alphabet): File "/home/maubp/lib/python2.4/site-packages/Bio/Alphabet/__init__.py", line 95, in contains return other.gap_char == self.gap_char and \ AttributeError: DNAAlphabet instance has no attribute 'gap_char' I would expect to get: Seq('ACTGAC-TG', Gapped(DNAAlphabet(), '-')) Similar example, but using proteins >>> p = Seq("ACDEFG", Alphabet.generic_protein) >>> q = Seq("ACDEFG", IUPAC.protein) >>> r = Seq("ACDEFG*", Alphabet.HasStopCodon(IUPAC.protein, "*")) >>> p Seq('ACDEFG', ProteinAlphabet()) >>> q Seq('ACDEFG', IUPACProtein()) >>> r Seq('ACDEFG*', HasStopCodon(IUPACProtein(), '*')) #Now try adding these together... 
>>> p+q Seq('ACDEFGACDEFG', ProteinAlphabet()) >>> p+r Traceback (most recent call last): File "", line 1, in ? File "/home/maubp/repository/biopython/build/lib.linux-x86_64-2.4/Bio/Seq.py", line 77, in __add__ elif other.alphabet.contains(self.alphabet): File "/home/maubp/lib/python2.4/site-packages/Bio/Alphabet/__init__.py", line 110, in contains return other.stop_symbol == self.stop_symbol and \ AttributeError: ProteinAlphabet instance has no attribute 'stop_symbol' Here is an example of a more reasonable failure, >>> c = Seq("AC-TG", Alphabet.Gapped(IUPAC.unambiguous_dna, "-")) >>> d = Seq('AC.TG', Alphabet.Gapped(IUPAC.unambiguous_dna, '.')) >>> c Seq('AC-TG', Gapped(IUPACUnambiguousDNA(), '-')) >>> d Seq('AC.TG', Gapped(IUPACUnambiguousDNA(), '.')) >>> c+d Traceback (most recent call last): File "", line 1, in ? File "/home/maubp/repository/biopython/build/lib.linux-x86_64-2.4/Bio/Seq.py", line 80, in __add__ raise TypeError, ("incompatable alphabets", str(self.alphabet), TypeError: ('incompatable alphabets', "Gapped(IUPACUnambiguousDNA(), '-')", "Gapped(IUPACUnambiguousDNA(), '.')") I am OK with this failing with a TypeError. However, one might argue that reverting to a generic DNA alphabet with no declared alphabet was desirable: Seq("AC-TGAC.TG", DNAAlphabet())) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Jul 27 15:59:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 27 Jul 2008 11:59:50 -0400
Subject: [Biopython-dev] [Bug 2550] Alphabet problems when adding sequences
In-Reply-To:
Message-ID: <200807271559.m6RFxoej018165@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2550

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-27 11:59 EST -------
Trying to fix this by changing only the contains method of the Alphabet and
AlphabetEncoder classes is nasty, and wouldn't cover situations like this:

p = Seq("PKL-PAK", Gapped(generic_protein,"-"))
q = Seq("ADKS*", HasStopCodon(generic_protein,"*"))

where you might expect something like:

p+q == Seq("PKL-PAKADKS*", HasStopCodon(Gapped(generic_protein,"-"),"*"))

Taken literally, neither of these two alphabets contains the other - but there
is a fairly obvious consensus alphabet!

I think the best solution would require changes to the Seq object's add method
to pick a consensus alphabet in the non-simple cases where one alphabet is
clearly a sub-set of the other.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
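The consensus idea in comment 1 might be sketched roughly as follows. This is
only an illustration, not the code in Bio.Alphabet: the Alpha tuple and the
consensus() function are hypothetical stand-ins, and the sketch assumes every
alphabet can be reduced to a base type plus an optional gap character and an
optional stop symbol. It also simply rejects differing base types, rather than
falling back to a more generic alphabet.

```python
from collections import namedtuple

# Hypothetical stand-in for an alphabet object: a base type name plus an
# optional gap character and stop symbol (None means "not declared").
Alpha = namedtuple("Alpha", ["base", "gap", "stop"])


def consensus(alphabets):
    """Return one alphabet that covers all the given ones, or raise ValueError."""
    if not alphabets:
        raise ValueError("Need at least one alphabet")
    bases = set(a.base for a in alphabets)
    gaps = set(a.gap for a in alphabets if a.gap is not None)
    stops = set(a.stop for a in alphabets if a.stop is not None)
    if len(bases) > 1:
        # Simplification: no fallback to a more generic base alphabet here.
        raise ValueError("Incompatible base alphabets")
    if len(gaps) > 1:
        raise ValueError("More than one gap character present")
    if len(stops) > 1:
        raise ValueError("More than one stop symbol present")
    gap = gaps.pop() if gaps else None
    stop = stops.pop() if stops else None
    return Alpha(bases.pop(), gap, stop)


# The gapped protein plus stop-codon protein case from comment 1:
p_alpha = Alpha("protein", "-", None)
q_alpha = Alpha("protein", None, "*")
merged = consensus([p_alpha, q_alpha])
```

Here merged carries both the gap character and the stop symbol, which is the
"fairly obvious consensus alphabet" described above.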
From bugzilla-daemon at portal.open-bio.org Sun Jul 27 18:54:01 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 27 Jul 2008 14:54:01 -0400 Subject: [Biopython-dev] [Bug 2550] Alphabet problems when adding sequences In-Reply-To: Message-ID: <200807271854.m6RIs1wZ025718@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2550 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-27 14:54 EST ------- Created an attachment (id=977) --> (http://bugzilla.open-bio.org/attachment.cgi?id=977&action=view) Patch to Bio/Seq.py and Bio/Alphabet/__init__.py This uses some (private) alphabet functions in Bio/Alphabet/__init__.py (where I have already put a few bits extracted from or used by Bio.Align and Bio.AlignIO), and makes the old Alphabet .contains method effectively obsolete. Test case update to follow. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Jul 27 18:56:47 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 27 Jul 2008 14:56:47 -0400 Subject: [Biopython-dev] [Bug 2550] Alphabet problems when adding sequences In-Reply-To: Message-ID: <200807271856.m6RIulpl025828@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2550 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-27 14:56 EST ------- Created an attachment (id=978) --> (http://bugzilla.open-bio.org/attachment.cgi?id=978&action=view) Patches for test_seq.py and test_GACrossover.py Adds a new block of tests to test_seq.py to explicitly check a number of different alphabet combinations. 
Also tweaks test_GACrossover.py to define its test alphabet as a subclass of a suitable generic class in Bio.Alphabet, as otherwise it is not recognised as a valid alphabet. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Jul 27 19:06:22 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 27 Jul 2008 15:06:22 -0400 Subject: [Biopython-dev] [Bug 2550] Alphabet problems when adding sequences In-Reply-To: Message-ID: <200807271906.m6RJ6MBk026364@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2550 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-27 15:06 EST ------- With the patch, repeating the example in my comment 0, >>> from Bio import Alphabet >>> from Bio.Alphabet import IUPAC >>> from Bio.Seq import Seq >>> a = Seq("ACTG", Alphabet.generic_dna) >>> b = Seq("AC-TG", Alphabet.Gapped(Alphabet.generic_dna, "-")) >>> c = Seq("AC-TG", Alphabet.Gapped(IUPAC.unambiguous_dna, "-")) >>> a Seq('ACTG', DNAAlphabet()) >>> b Seq('AC-TG', Gapped(DNAAlphabet(), '-')) >>> c Seq('AC-TG', Gapped(IUPACUnambiguousDNA(), '-')) >>> b+c Seq('AC-TGAC-TG', Gapped(DNAAlphabet(), '-')) >>> a+b Seq('ACTGAC-TG', Gapped(DNAAlphabet(), '-')) >>> a+c Seq('ACTGAC-TG', Gapped(DNAAlphabet(), '-')) i.e. All the above additions work now. >>> p = Seq("ACDEFG", Alphabet.generic_protein) >>> q = Seq("ACDEFG", IUPAC.protein) >>> r = Seq("ACDEFG*", Alphabet.HasStopCodon(IUPAC.protein, "*")) >>> p Seq('ACDEFG', ProteinAlphabet()) >>> q Seq('ACDEFG', IUPACProtein()) >>> r Seq('ACDEFG*', HasStopCodon(IUPACProtein(), '*')) >>> p+q Seq('ACDEFGACDEFG', ProteinAlphabet()) >>> p+r Seq('ACDEFGACDEFG*', HasStopCodon(ProteinAlphabet(), '*')) These work too. 
>>> c = Seq("AC-TG", Alphabet.Gapped(IUPAC.unambiguous_dna, "-")) >>> d = Seq('AC.TG', Alphabet.Gapped(IUPAC.unambiguous_dna, '.')) >>> c Seq('AC-TG', Gapped(IUPACUnambiguousDNA(), '-')) >>> d Seq('AC.TG', Gapped(IUPACUnambiguousDNA(), '.')) >>> c+d Traceback (most recent call last): File "", line 1, in ? File "Bio/Seq.py", line 78, in __add__ a = Alphabet._consensus_alphabet([self.alphabet, other.alphabet]) File "/home/maubp/repository/biopython/Bio/Alphabet/__init__.py", line 199, in _consensus_alphabet raise ValueError("More than one gap character present") ValueError: More than one gap character present The error message has changed (and is more explicit), but I think this is a real failure case. Then based on the example in my comment 1, >>> p = Seq("PKL-PAK", Alphabet.Gapped(Alphabet.generic_protein,"-")) >>> q = Seq("ADKS*", Alphabet.HasStopCodon(Alphabet.generic_protein,"*")) >>> p+q Seq('PKL-PAKADKS*', HasStopCodon(Gapped(ProteinAlphabet(), '-'), '*')) This works now too. One final example of a valid failure: >>> q = Seq("ADKS*", Alphabet.HasStopCodon(Alphabet.generic_protein,"*")) >>> r = Seq("SRFG@", Alphabet.HasStopCodon(Alphabet.generic_protein,"@")) >>> q+r Traceback (most recent call last): File "", line 1, in ? File "Bio/Seq.py", line 78, in __add__ a = Alphabet._consensus_alphabet([self.alphabet, other.alphabet]) File "/home/maubp/repository/biopython/Bio/Alphabet/__init__.py", line 208, in _consensus_alphabet raise ValueError("More than one stop symbol present") ValueError: More than one stop symbol present I'd be grateful if anyone could test this, or comment on the code. While adding private functions to Bio.Alphabet is a reasonable short term solution (and means we can change arguments and names without breaking people's scripts!), some of this functionality might be best exposed publically. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jul 28 09:26:03 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 28 Jul 2008 05:26:03 -0400 Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more In-Reply-To: Message-ID: <200807280926.m6S9Q3Cn032456@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1944 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #943 is|0 |1 obsolete| | ------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-28 05:26 EST ------- (From update of attachment 943) Checked in as part of Bio/Align/Generic.py revision 1.10 Adding __len__ would also be sensible, and perhaps __nonzero__ (which could check the number of rows AND columns). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jul 28 09:37:27 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 28 Jul 2008 05:37:27 -0400 Subject: [Biopython-dev] [Bug 2551] New: Adding advanced __getitem__ to generic alignment, e.g. align[1:2, 5:-5] Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2551 Summary: Adding advanced __getitem__ to generic alignment, e.g. 
align[1:2,5:-5]
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
BugsThisDependsOn: 2507

I'm filing this as a separate sub-issue from Bug 1944.

The idea is to enhance the minimal __getitem__ method now in CVS to allow
accessing of rows (sequences), columns, or sub-alignments.

A possible __getitem__ doc string:

Depending on the indices, you can get a SeqRecord object (representing a
single row), a Seq object (for a single column), a string (for a single
character) or another alignment (representing some part or all of the
alignment).

align[r,c] gives a single character as a string
align[r] gives a row as a SeqRecord
align[r,:] gives a row as a SeqRecord
align[:,c] gives a column as a Seq (using the alignment's alphabet)

align[:] and align[:,:] give a copy of the alignment

Anything else gives a sub alignment, e.g.
align[0:2] or align[0:2,:] uses only row 0 and 1
align[:,1:3] uses only columns 1 and 2
align[0:2,1:3] uses only rows 0 & 1 and only cols 1 & 2

Doing this nicely will build on adding annotation aware slicing support to the
SeqRecord, which is Bug 2507.

There is some __getitem__ code on Bug 1944 Attachment 732 and Bug 1944
Attachment 770.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
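The index dispatch proposed in the doc string could look something like the
sketch below. It is a toy under stated assumptions, not the attachment code
from Bug 1944: MiniAlignment is a hypothetical class whose rows are plain
strings, so a "row" comes back as a str instead of a SeqRecord and a "column"
as a str instead of a Seq. Two simplifications go beyond the doc string:
align[r, a:b] returns a sliced row, and align[a:b, c] returns the column for
just those rows.

```python
class MiniAlignment(object):
    """Hypothetical stand-in for an alignment; rows are plain strings."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __getitem__(self, index):
        if isinstance(index, int):
            # align[r] -> a single row
            return self.rows[index]
        if isinstance(index, slice):
            # align[0:2] or align[:] -> a new (sub-)alignment
            return MiniAlignment(self.rows[index])
        if not (isinstance(index, tuple) and len(index) == 2):
            raise TypeError("Expected one or two indices")
        row_index, col_index = index
        if isinstance(row_index, int):
            # align[r,c] -> one character; align[r,:] -> the whole row
            return self.rows[row_index][col_index]
        if isinstance(col_index, int):
            # align[:,c] -> one column, read top to bottom
            return "".join([row[col_index] for row in self.rows[row_index]])
        # align[0:2,1:3] -> sub-alignment restricted in both directions
        return MiniAlignment([row[col_index] for row in self.rows[row_index]])


align = MiniAlignment(["ACGT", "CCGT", "AGGT"])
single_char = align[0, 1]
column = align[:, 1]
sub = align[0:2, 1:3]
```

Returning a new object for every slice keeps the original alignment untouched,
which matches the "give a copy of the alignment" wording for align[:].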
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 09:37:29 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 28 Jul 2008 05:37:29 -0400 Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for element access and slicing In-Reply-To: Message-ID: <200807280937.m6S9bTY8000615@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2507 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- OtherBugsDependingO| |2551 nThis| | -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jul 28 09:48:56 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 28 Jul 2008 05:48:56 -0400 Subject: [Biopython-dev] [Bug 2552] New: Adding alignments Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2552 Summary: Adding alignments Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk This is related to the very broad alignment bug 1944. Given two alignments, it can make sense to talk about adding them together. However we can either add by row, or by column. e.g. 
Consider this alignment, a

DNAAlphabet() alignment with 3 rows and 14 columns
ACGATCAGCTAGCT Alpha
CCGATCAGCTAGCT Beta
ACGATGAGCTAGCT Gamma

Doing a+a by column would give:

DNAAlphabet() alignment with 3 rows and 28 columns
ACGATCAGCTAGCTACGATCAGCTAGCT Alpha
CCGATCAGCTAGCTCCGATCAGCTAGCT Beta
ACGATGAGCTAGCTACGATGAGCTAGCT Gamma

This sort of operation is often done to combine alignments from multiple genes
(after first sorting the rows to ensure the species names are in the same
order). To implement this would ideally require the ability to add SeqRecord
objects together, doing something sensible with the annotation and in
particular the identifiers.

Doing a+a by row would give:

DNAAlphabet() alignment with 6 rows and 14 columns
ACGATCAGCTAGCT Alpha
CCGATCAGCTAGCT Beta
ACGATGAGCTAGCT Gamma
ACGATCAGCTAGCT Alpha
CCGATCAGCTAGCT Beta
ACGATGAGCTAGCT Gamma

This particular example, a+a, is perhaps unrealistic due to the repeated
identifiers, but I imagine there are some real use cases for this operation.

More generally, suppose we have two alignments a and b. Treating each
alignment as a list of SeqRecord objects, you might expect:

a.extend(b) -> addition by row
a+b -> addition by row

However, I would suggest for alignment objects:

a.extend(b) -> addition by row, requires that all sequences be the same length
               (same number of columns)
a+b -> addition by column, requires same number of sequences (rows)

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
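The two kinds of addition suggested in Bug 2552 can be sketched on plain
Python data. This is an illustration only: an alignment is modelled as a list
of (identifier, sequence) tuples, and add_by_column() and extend_by_row() are
hypothetical helper names, not Biopython functions. Insisting that identifiers
match row by row is just one possible policy for the open question of what to
do with identifiers.

```python
def add_by_column(left, right):
    """Join two alignments side by side; both need the same number of rows."""
    if len(left) != len(right):
        raise ValueError("Alignments must have the same number of rows")
    combined = []
    for (left_id, left_seq), (right_id, right_seq) in zip(left, right):
        # Assumed policy: rows must already be sorted into the same order.
        if left_id != right_id:
            raise ValueError("Row identifiers differ: %s vs %s"
                             % (left_id, right_id))
        combined.append((left_id, left_seq + right_seq))
    return combined


def extend_by_row(top, bottom):
    """Stack two alignments; every sequence must have the same length."""
    lengths = set(len(seq) for (_, seq) in top + bottom)
    if len(lengths) > 1:
        raise ValueError("All sequences must be the same length")
    return top + bottom


a = [("Alpha", "ACGATCAGCTAGCT"),
     ("Beta", "CCGATCAGCTAGCT"),
     ("Gamma", "ACGATGAGCTAGCT")]
by_column = add_by_column(a, a)   # 3 rows, 28 columns
by_row = extend_by_row(a, a)      # 6 rows, 14 columns
```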
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 09:53:34 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 28 Jul 2008 05:53:34 -0400 Subject: [Biopython-dev] [Bug 2553] New: Adding SeqRecord objects to an alignment (append or extend) Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2553 Summary: Adding SeqRecord objects to an alignment (append or extend) Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk Currently a Bio.Align.Generic.Alignment object stores the rows as SeqRecord objects, but only exposes a public API for adding row sequences as strings. As suggested on Bug 1944, it would make sense to treat the Alignment as a list of SeqRecord objects and therefore support the list methods .append() and .extend() for the addition of more rows as SeqRecord objects. I would make the .append() method enforce the expectation that all rows are the same length, and that the new sequence's alphabet is compatible with the declared alignment alphabet. See also Bug 2552 - Adding alignments -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 09:57:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 28 Jul 2008 05:57:52 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To:
Message-ID: <200807280957.m6S9vqJd001617@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1944

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|2507                        |
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #17 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-28 05:57 EST -------
I've filed bugs on what I think are the remaining issues raised here (Bug
1944), and am now closing this issue (as it's getting very long and hard to
follow):

Bug 2551 - The __getitem__ method (accessing part of the alignment as a
character string, row, column or sub-alignment).

Bug 2552 - Adding alignments

Bug 2553 - Adding SeqRecord objects to an alignment (append or extend)

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 09:57:54 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 28 Jul 2008 05:57:54 -0400 Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for element access and slicing In-Reply-To: Message-ID: <200807280957.m6S9vspm001632@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2507 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- OtherBugsDependingO|1944 | nThis| | -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jul 28 10:13:38 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 28 Jul 2008 06:13:38 -0400 Subject: [Biopython-dev] [Bug 2554] New: Creating an Alignment from a list of SeqRecord objects Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2554 Summary: Creating an Alignment from a list of SeqRecord objects Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk BugsThisDependsOn: 2553 It would be nice to be able to supply a list (or iterator) of SeqRecord objects when creating an alignment object. This would also make the Bio.SeqIO.to_alignment() function obsolete. Currently, the __init__ method takes just an alphabet: def __init__(self, alphabet): """Initialize a new Alignment object. Arguments: o alphabet - The alphabet to use for the sequence objects that are created. This alphabet must be a gapped type. """ #... 
My plan is to accept a list of SeqRecord objects (possibly empty) and an
optional alphabet. If the alphabet is omitted, a consensus can be determined
from the SeqRecord alphabets. This can be made backwards compatible:

def __init__(self, records, alphabet=None):
    """Initialize a new Alignment object.

    Arguments:
    records - A list (or iterator) of SeqRecord objects, whose sequences
              are all the same length. This can be an empty list.
    alphabet - The alphabet for the whole alignment, typically a gapped
               alphabet, which should be a superset of the individual
               record alphabets. If omitted, a consensus alphabet is used.
    """
    if not (isinstance(records, Alphabet.Alphabet) \
    or isinstance(records, Alphabet.AlphabetEncoder)):
        if alphabet is None :
            #Backwards compatible mode!
            alphabet = records
            records = []
        else :
            raise ValueError("Invalid records argument")
    #...

I would expect the implementation to depend on Bug 2553 - Adding SeqRecord
objects to an alignment (append or extend).

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Mon Jul 28 10:13:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 28 Jul 2008 06:13:41 -0400
Subject: [Biopython-dev] [Bug 2553] Adding SeqRecord objects to an alignment (append or extend)
In-Reply-To:
Message-ID: <200807281013.m6SADf6o002429@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2553

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
OtherBugsDependingO|                            |2554
              nThis|                            |

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Jul 28 10:49:45 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 28 Jul 2008 06:49:45 -0400 Subject: [Biopython-dev] [Bug 2554] Creating an Alignment from a list of SeqRecord objects In-Reply-To: Message-ID: <200807281049.m6SAnjbE003984@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2554 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-28 06:49 EST ------- There is an unwanted "not" in the code snippet in comment 0. Here is a preliminary implementation of the revised __init__ method plus append and extend (Bug 2553): def __init__(self, records, alphabet=None): """Initialize a new Alignment object. Arguments: records - A list (or iterator) of SeqRecord objects, whose sequences are all the same length. This can be an empty list. alphabet - The alphabet for the whole alignment, typically a gapped alphabet, which should be a super-set of the individual record alphabets. If omitted, a consensus alphabet is used. NOTE - Earlier versions of Biopython only accepted a single argument, an alphabet. This is still supported via a backwards compatible "hack" so as not to disrupt existing scripts and users. """ if isinstance(records, Alphabet.Alphabet) \ or isinstance(records, Alphabet.AlphabetEncoder): if alphabet is None : #Backwards compatible mode! 
alphabet = records records = [] else : raise ValueError("Invalid records argument") if alphabet is not None : if not (isinstance(alphabet, Alphabet.Alphabet) \ or isinstance(alphabet, Alphabet.AlphabetEncoder)): raise ValueError("Invalid alphabet argument") self._alphabet = alphabet else : #Default while we add sequences, will take a consensus later self._alphabet = Alphabet.single_letter_alphabet self._records = [] self.extend(records) if alphabet is None : #No alphabet was given, take a consensus alphabet #TODO - Use a generator expression once we drop python 2.3: self.alphabet = Alphabet._consensus_alphabet([rec.seq.alphabet for \ rec in self._records]) self._records = [] def extend(self, records) : """Add more SeqRecord objects to the alignment as rows. They must all have the same length as the original alignment, and have alphabets compatible with the alignment's alphabet.""" for rec in records : self.append(rec) def append(self, record) : """Add one more SeqRecord object to the alignment as a new row. This must have the same length as the original alignment (unless this is the first record), and have an alphabet compatible with the alignment's alphabet.""" if not isinstance(record, SeqRecord) : raise TypeError("New sequence is not a SeqRecord object") if self._records and len(record) <> self.get_alignment_length() : raise ValueError("New sequence is not of length %i" \ % self.get_alignment_length()) #Using not self._alphabet.contains(record.seq.alphabet) needs fixing #for AlphabetEncoders (e.g. gapped versus ungapped). if not Alphabet._are_alphabets_compatible(self._alphabet, \ record.seq.alphabet) : raise ValueError("New sequence's alphabet is incompatible") self._records.append(record) The unit tests look fine with this addition. Of course, new tests to verify this functionality explicitly should then be added (and I could take advantage of this in Bio.AlignIO too). 
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Mon Jul 28 10:54:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 28 Jul 2008 06:54:12 -0400
Subject: [Biopython-dev] [Bug 2554] Creating an Alignment from a list of SeqRecord objects
In-Reply-To:
Message-ID: <200807281054.m6SAsClZ004173@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2554

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-28 06:54 EST -------
Regarding the code in comment 1, the private function
_are_alphabets_compatible() isn't in CVS; it's something I was playing with on
Bug 2550 - Alphabet problems when adding sequences.

However, I hope that this conveys my overall intention for the Alignment
object.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Wed Jul 30 02:22:59 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 29 Jul 2008 22:22:59 -0400 Subject: [Biopython-dev] [Bug 2557] New: AlignIO::write fails when delegating to SeqIO::write Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2557 Summary: AlignIO::write fails when delegating to SeqIO::write Product: Biopython Version: 1.47 Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: rsuri at cs.utexas.edu In line 185 of "biopython/Bio/AlignIO/__init__.py" in the current CVS version, there's a call to SeqIO::write with only 2 arguments instead of the required 3 ["SeqIO.write(alignment.get_all_seqs(), format)"] should be ["SeqIO.write(alignment.get_all_seqs(), handle, format)"] (i.e. pass the handle object). I know this happens when trying to output to FASTA format, and it appears to do so more generally whenever the SeqIO module can be used for output. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Jul 30 02:36:07 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 29 Jul 2008 22:36:07 -0400 Subject: [Biopython-dev] [Bug 2558] New: AlignIO nexus parsing chokes on superfluous comma Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2558 Summary: AlignIO nexus parsing chokes on superfluous comma Product: Biopython Version: 1.47 Platform: All URL: http://www.cs.utexas.edu/~rsuri/M3579.NX OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: rsuri at cs.utexas.edu The URL above points to a nexus file (also available from TreeBase with Matrix accession #M3579) that causes BioPython to raise an error when reading it with the AlignIO module. In the "Trees" section of the input file, the final taxon ("Lecanorales") has a trailing comma that causes BioPython to fail (search for the line beginning with "59"). I've verified that manually deleting the offending comma is a valid workaround. I don't know what the nexus format specification says, but this is poor form for BioPython, in my opinion. It seems reasonable enough to allow for some slack like this when reading formatted files. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
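For illustration, the kind of slack asked for in Bug 2558 might look like the
sketch below. It does not reflect how Bio.Nexus actually parses a TRANSLATE
command: parse_translate() is a hypothetical helper, and it assumes the
options arrive as one string of comma-separated "number 'name'" pairs with no
commas inside the quoted names. Only a single trailing comma is forgiven; an
empty entry anywhere else is still reported as a format error.

```python
def parse_translate(options):
    """Parse "1 'A', 2 'B'," into {"1": "A", "2": "B"}."""
    text = options.strip()
    if text.endswith(";"):
        text = text[:-1].rstrip()
    if text.endswith(","):
        # Tolerate one superfluous comma after the final entry.
        text = text[:-1].rstrip()
    table = {}
    for entry in text.split(","):
        parts = entry.split(None, 1)
        if len(parts) != 2:
            raise ValueError("Format error in entry %r" % entry)
        number, name = parts
        table[number] = name.strip().strip("'")
    return table


translate = parse_translate(
    "1 'Rolfidium_coccocarpioides', 58 'Micarea_adnata', 59 'Lecanorales',")
```

Whether a parser should accept such input at all is a separate question; the
sketch only shows that forgiving the final comma need not loosen the checks on
the remaining entries.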
From bugzilla-daemon at portal.open-bio.org Wed Jul 30 08:55:42 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 30 Jul 2008 04:55:42 -0400 Subject: [Biopython-dev] [Bug 2557] AlignIO::write fails when delegating to SeqIO::write In-Reply-To: Message-ID: <200807300855.m6U8tgLU019854@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2557 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-30 04:55 EST ------- That's embarrassing for me! Bug confirmed and fixed in CVS. I've used a very slightly simpler fix, taking advantage of the fact that you can iterate for the SeqRecords within an alignment: SeqIO.write(alignment, handle, format) I've also updated the Bio.AlignIO unit test to cover writing a couple of the formats supported via Bio.SeqIO ("fasta" and "tab"), although it might make sense to try all of them... Checking in Bio/AlignIO/__init__.py; /home/repository/biopython/biopython/Bio/AlignIO/__init__.py,v <-- __init__.py new revision: 1.10; previous revision: 1.9 done Checking in Tests/test_AlignIO.py; /home/repository/biopython/biopython/Tests/test_AlignIO.py,v <-- test_AlignIO.py new revision: 1.12; previous revision: 1.11 done Checking in Tests/output/test_AlignIO; /home/repository/biopython/biopython/Tests/output/test_AlignIO,v <-- test_AlignIO new revision: 1.10; previous revision: 1.9 done Thank you for reporting this oversight, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Jul 30 09:23:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 30 Jul 2008 05:23:59 -0400
Subject: [Biopython-dev] [Bug 2558] Bio.Nexus chokes on TRANSLATE block with superfluous comma
In-Reply-To:
Message-ID: <200807300923.m6U9Nx8l021492@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2558

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|AlignIO nexus parsing chokes|Bio.Nexus chokes on
                   |on superfluous comma        |TRANSLATE block with
                   |                            |superfluous comma

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-30 05:23 EST -------
This is an issue in the Bio.Nexus module, so it's a job for Frank.

Do you know if this affects all the NEXUS files from www.treebase.org? I've
tried downloading several trees, but their FTP site is just timing out for me.
According to http://www.treebase.org/treebase/submit.html they request that
trees be uploaded in the NEXUS file format, so it's possible that just a
minority of their trees have this trailing comma.

Note that this may be an invalid file (a TRANSLATE block with trailing comma),
but as you say it looks relatively straightforward to cope with. However, I
have had a quick look at the Bio.Nexus code, and I don't entirely understand
what Frank's parser is doing here - so it's not going to be a quick fix from
me.

Quick bit of Python to show the stack trace: >>> from Bio.Nexus import Nexus >>> n = Nexus(open("M3579.NX")) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'module' object is not callable >>> n = Nexus.Nexus(open("M3579.NX")) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/Users/XXX/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Nexus/Nexus.py", line 552, in __init__ self.read(input) File "/Users/XXX/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Nexus/Nexus.py", line 614, in read self._parse_nexus_block(title, contents) File "/Users/XXX/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Nexus/Nexus.py", line 655, in _parse_nexus_block getattr(self,'_'+line.command)(line.options) File "/Users/XXX/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Nexus/Nexus.py", line 922, in _translate raise NexusError,'Format error in line %s.' % options Bio.Nexus.Nexus.NexusError: Format error in line 1 'Rolfidium_coccocarpioides', 2 'Mycoblastus_sanguinarius', 3 'Protoblastenia_rupestris', 4 'Myxobilimbia_sabuletorum', 5 'Byssoloma_leucoblepharum', 6 'Stereocaulon_tomentosum', 7 'Scoliciosporum_umbrinum', 8 'Haematomma_ochroleucum', 9 'Glyphopeltis_ligustica', 10 'Catinaria_atropurpurea', 11 'Miriquidica_garovaglii', 12 'Sphaerophorus_globosus', 13 'Lecidea_atrosanguinea', 14 'Cladonia_peziziformis', 15 'Stereocaulon_pileatum', 16 'Frutidella_caesioatra', 17 'Fellhanera_bouteillei', 18 'Tonina_cinereovirens', 19 'Helocarpon_crassipes', 20 'Micarea_alabastrites', 21 'Squamarina_lentigera', 22 'Lecanora_intumescens', 23 'Bellemerea_diamarta', 24 'Lopadium_disciforme', 25 'Herteliana_taylorii', 26 'Lepraria_lobificans', 27 'Psilolechia_leprosa', 28 'Protomicarea_limosa', 29 'Calopadia_foliicola', 30 'Fellhanera_subtilis', 31 'Pyrrhospora_quernea', 32 'Lecidella_meiococca', 33 'Hypogymnia_physodes', 34 'Ramalina_fastigiata', 35 'Halecania_alpivaga', 36 'Platismatia_glauca', 37 'Lepraria_bergensis', 38 
'Micarea_micrococca', 39 'Lecania_atrynoides', 40 'Crocynia_gossypina', 41 'Psilolechia_lucida', 42 'Lecanora_allophana', 43 'Cladonia_digitata', 44 'Schadonia_fecunda', 45 'Psorula_rufonigra', 46 'Adelolecia_pilati', 47 'Lecidea_turgidula', 48 'Micarea_sylvicola', 49 'Lecidea_fuscoatra', 50 'Psora_rubiformis', 51 'Micarea_erratica', 52 'Megalaria_grossa', 53 'Lecidea_silacea', 54 'Micarea_intrusa', 55 'Psora_decipiens', 56 'Tephromela_atra', 57 'Bacidia_rosella', 58 'Micarea_adnata', 59 'Lecanorales',.
From bugzilla-daemon at portal.open-bio.org Wed Jul 30 12:57:00 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 30 Jul 2008 08:57:00 -0400 Subject: [Biopython-dev] [Bug 2558] Bio.Nexus chokes on TRANSLATE block with superfluous comma In-Reply-To: Message-ID: <200807301257.m6UCv0co031445@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2558 fkauff at biologie.uni-kl.de changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED ------- Comment #2 from fkauff at biologie.uni-kl.de 2008-07-30 08:57 EST ------- I'm all for a little bit of slack in parsers, but this looks, in my opinion, like a straightforward syntax error in the nexus file. I work with nexus files daily, and have never encountered such a trailing comma. What really confuses me is that there are 58 taxa in the data set, and no. 59, Lecanorales, is an additional entry with no data and no occurrence in the tree. I don't think this is proper nexus format. Frank (In reply to comment #1)
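For readers weighing the two positions in this thread, here is a minimal sketch of what "coping with" a trailing comma could look like. It is not the Bio.Nexus implementation: parse_translate and its allow_trailing_comma flag are hypothetical, and the sketch assumes taxon labels contain no commas or embedded quotes. Leniency is opt-in, so a strict caller still gets a format error for input like the one reported here.

```python
def parse_translate(options, allow_trailing_comma=True):
    """Parse TRANSLATE options such as "1 'Taxon_a', 2 'Taxon_b'" into a dict.

    Hypothetical helper, not part of Bio.Nexus. Simplification: labels are
    assumed to contain no commas, so splitting on ',' is safe.
    """
    entries = [entry.strip() for entry in options.split(",")]
    # A trailing comma leaves one empty entry at the end; drop it if allowed.
    if allow_trailing_comma and len(entries) > 1 and entries[-1] == "":
        entries.pop()
    table = {}
    for entry in entries:
        parts = entry.split(None, 1)  # token, then the rest as the label
        if len(parts) != 2:
            raise ValueError("Format error in TRANSLATE entry: %r" % entry)
        token, label = parts
        table[token] = label.strip("'")
    return table


def is_rejected_when_strict(options):
    """Return True if strict parsing refuses the given options string."""
    try:
        parse_translate(options, allow_trailing_comma=False)
    except ValueError:
        return True
    return False
```

Note that this only addresses the comma. The extra entry 59 with no data and no occurrence in the tree, which Frank points out, is a separate question that such a helper does not answer.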
From bugzilla-daemon at portal.open-bio.org Thu Jul 31 15:58:08 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 31 Jul 2008 11:58:08 -0400 Subject: [Biopython-dev] [Bug 2560] New: Adding BLAST support to Bio.AlignIO Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2560 Summary: Adding BLAST support to Bio.AlignIO Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk I think it can sometimes be useful to regard a BLAST output file as a series of pairwise alignments - and therefore it makes sense to add it to Bio.AlignIO as another input file format. http://biopython.org/wiki/AlignIO Note that the AlignIO API will not allow any "clumping" of the pairwise alignments (or HSPs in Blast terminology) according to the query or the target sequence - you just get them all one after the other. I will attach a rough Bio/AlignIO/BlastIO.py file which attempts to mimic the naming conventions in the fasta-m10 parser. This currently uses Bio.Blast to do the actual parsing, and then just uses the Blast results to build alignment objects with two sequences each. I suggest using the format names "blast" and "blastxml" for the plain text and XML output formats following BioPerl (although I would prefer "blast-xml" to "blastxml"), see http://www.bioperl.org/wiki/HOWTO:SearchIO#Design
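The "no clumping" behaviour described above can be sketched without Biopython. The nested dictionaries below are a hypothetical stand-in for parsed BLAST results (they are not the Bio.Blast record classes), and iter_pairwise is an invented name. The sketch only shows the flattening: every HSP becomes one two-row alignment, emitted one after the other regardless of query or target.

```python
def iter_pairwise(blast_records):
    """Yield one two-row alignment per HSP, in file order.

    Each alignment is a list of two (identifier, aligned_sequence) tuples:
    the query row first, then the target row. The input layout (dicts with
    "query_id", "hits", "target_id", "hsps", "query", "sbjct") is assumed
    purely for illustration.
    """
    for record in blast_records:
        for hit in record["hits"]:
            for hsp in hit["hsps"]:
                yield [
                    (record["query_id"], hsp["query"]),
                    (hit["target_id"], hsp["sbjct"]),
                ]


# Made-up example data: one query with two hits, the first having two HSPs.
EXAMPLE_RECORDS = [
    {
        "query_id": "query1",
        "hits": [
            {
                "target_id": "targetA",
                "hsps": [
                    {"query": "ACGT-A", "sbjct": "ACGTTA"},
                    {"query": "GGC", "sbjct": "GGC"},
                ],
            },
            {
                "target_id": "targetB",
                "hsps": [{"query": "TTAA", "sbjct": "TT-A"}],
            },
        ],
    }
]
```

A caller that does want the HSPs grouped by query or target would have to regroup the stream itself, for example by the identifier in the first or second row.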
From bugzilla-daemon at portal.open-bio.org Thu Jul 31 16:00:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 31 Jul 2008 12:00:23 -0400 Subject: [Biopython-dev] [Bug 2560] Adding BLAST support to Bio.AlignIO In-Reply-To: Message-ID: <200807311600.m6VG0NAq021299@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2560 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-31 12:00 EST ------- Created an attachment (id=980) --> (http://bugzilla.open-bio.org/attachment.cgi?id=980&action=view) New file Bio/AlignIO/BlastIO.py The included "self test" just parses all the unit tests (excluding the PSI-Blast and HTML files).