From bugzilla-daemon at portal.open-bio.org Tue Jul 1 04:36:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 1 Jul 2008 04:36:33 -0400 Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem with identical taxa names In-Reply-To: Message-ID: <200807010836.m618aXO8014712@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2531 fkauff at biologie.uni-kl.de changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED ------- Comment #13 from fkauff at biologie.uni-kl.de 2008-07-01 04:36 EST ------- I just uploaded a new Nexus.py to CVS. First, the taxlabels command in a taxa block is now ignored. For a standard NEXUS file, taxon labels are in the matrix, and a taxa block is irrelevant. The only exceptions are transposed matrices, which are not supported by Nexus.py anyway. Without the added confusion of a separate taxlabels command, it is now fairly easy to deal with duplicate names: both self.taxlabels and self.matrix now carry the same set of unique taxon names. All example files seem to work fine for me. Unless I hear otherwise, I'll close this bug. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
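Comment #13 does not say how Nexus.py renames duplicate taxa. Purely as an illustration, assuming a numeric-suffix scheme (the ".2", ".3" suffixes and the helper name make_unique_labels are hypothetical, not taken from Nexus.py), duplicate labels could be made unique while keeping a mapping back to the names as they appeared in the file:

```python
def make_unique_labels(labels):
    """Return (unique_labels, original_of) for a list of taxon labels.

    Hypothetical scheme: the first occurrence of a name is kept, later
    repeats get ".2", ".3", ... appended, skipping any suffixed name that
    already exists in the input. original_of maps each unique label back
    to the label as it appeared in the file.
    """
    used = set(labels)  # never generate a name that is already present
    last_suffix = {}
    unique, original_of = [], {}
    for label in labels:
        if label not in last_suffix:
            last_suffix[label] = 1
            new_label = label
        else:
            n = last_suffix[label]
            while True:
                n += 1
                new_label = "%s.%i" % (label, n)
                if new_label not in used:
                    break
            last_suffix[label] = n
            used.add(new_label)
        unique.append(new_label)
        original_of[new_label] = label
    return unique, original_of
```

Keeping original_of next to the renamed labels is one way the original non-unique names could be preserved, which is what Comment #14 asks about.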
From bugzilla-daemon at portal.open-bio.org Tue Jul 1 05:01:29 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 1 Jul 2008 05:01:29 -0400 Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem with identical taxa names In-Reply-To: Message-ID: <200807010901.m6191TxO015999@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2531 ------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-01 05:01 EST ------- Does this mean that there will be no way to see the original non-unique names from within Bio.Nexus? I agree they are a pain, but it would be nice to preserve them. I haven't read the NEXUS specs (restricted article), but do they say anything about repeated identifiers? From bugzilla-daemon at portal.open-bio.org Tue Jul 1 05:13:02 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 1 Jul 2008 05:13:02 -0400 Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem with identical taxa names In-Reply-To: Message-ID: <200807010913.m619D2vK016454@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2531 ------- Comment #15 from fkauff at biologie.uni-kl.de 2008-07-01 05:13 EST ------- Yes, the original non-unique names are currently not preserved. It would be fairly easy to keep them, if desired. The NEXUS specs (Maddison et al.) state that non-unique names "should be avoided if this might cause ambiguity", which imho they always do. But I have found that names sometimes become identical due to truncation etc., so I needed a way to deal with it instead of just throwing an error. 
Frank From bugzilla-daemon at portal.open-bio.org Tue Jul 1 09:16:57 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 1 Jul 2008 09:16:57 -0400 Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq objects In-Reply-To: Message-ID: <200807011316.m61DGvGS029051@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2532 ------- Comment #2 from fkauff at biologie.uni-kl.de 2008-07-01 09:16 EST ------- I opt for (b): an easy one-time addition to Bio.Alphabet, easy to use for everyone (instead of everyone creating their own uppercase-lowercase variants of those terribly complicated Biopython alphabet classes), and easy to change for all other modules if lowercase-uppercase is what they want (or need). Nexus.py and Phd.py certainly need to allow lowercase characters, as lowercase input is very common. Frank 
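The idea of one shared mixed-case variant of an IUPAC alphabet can be sketched with plain strings. This does not use the real Bio.Alphabet classes; the helper names are hypothetical, and "GATC" stands in for the IUPAC unambiguous DNA letters:

```python
# Assumed letter set, standing in for the IUPAC unambiguous DNA alphabet.
UNAMBIGUOUS_DNA = "GATC"


def mixed_case_letters(letters):
    """Return a letter set accepting both cases (hypothetical helper)."""
    return letters.upper() + letters.lower()


def fits_alphabet(sequence, letters):
    """Return True if every character of sequence appears in letters."""
    allowed = set(letters)
    return all(ch in allowed for ch in sequence)


# Built once and shared, rather than each module making its own variant.
MIXED_CASE_DNA = mixed_case_letters(UNAMBIGUOUS_DNA)
```

With this, fits_alphabet("ACGTacgt", MIXED_CASE_DNA) is true while the uppercase-only letter set rejects the same sequence, which is the situation Nexus.py and Phd.py run into.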
From bugzilla-daemon at portal.open-bio.org Tue Jul 1 11:56:03 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 1 Jul 2008 11:56:03 -0400 Subject: [Biopython-dev] [Bug 2533] New: Support for simple "tab" format in Bio.SeqIO Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2533 Summary: Support for simple "tab" format in Bio.SeqIO Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk Requested on the mailing list by Giovanni Marco Dall'Olio: http://lists.open-bio.org/pipermail/biopython/2008-June/004312.html See BioPerl: http://www.bioperl.org/wiki/Tab_sequence_format Suggested implementation to follow. From bugzilla-daemon at portal.open-bio.org Tue Jul 1 11:57:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 1 Jul 2008 11:57:26 -0400 Subject: [Biopython-dev] [Bug 2533] Support for simple "tab" format in Bio.SeqIO In-Reply-To: Message-ID: <200807011557.m61FvQN5006042@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2533 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-01 11:57 EST ------- Created an attachment (id=962) --> (http://bugzilla.open-bio.org/attachment.cgi?id=962&action=view) New file Bio/SeqIO/TabIO.py Treats the first field as the record's .id (and .name). Treats the second field as the record's sequence. 
When writing, uses only record.id and record.seq. From bugzilla-daemon at portal.open-bio.org Tue Jul 1 12:00:59 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 1 Jul 2008 12:00:59 -0400 Subject: [Biopython-dev] [Bug 2533] Support for simple "tab" format in Bio.SeqIO In-Reply-To: Message-ID: <200807011600.m61G0xUp006217@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2533 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-01 12:00 EST ------- Created an attachment (id=963) --> (http://bugzilla.open-bio.org/attachment.cgi?id=963&action=view) Patch to add the "tab" format to Bio.SeqIO and update the unit test output. The plumbing to make Bio.SeqIO (and Bio.AlignIO) aware of the new format. Adds the reader/writer mapping to Bio/SeqIO/__init__.py (trivial) and gives the updated output from test_SeqIO.py (trivial to regenerate with "python run_tests.py -g test_SeqIO.py"). 
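The tab format described in Comment #1 holds one record per line: an identifier, a tab, then the sequence. A rough sketch of the same idea in plain Python follows. It is not the attached TabIO.py: the function names and the Record tuple (a stand-in for SeqRecord) are hypothetical, and rejecting lines that do not have exactly two fields is an assumption.

```python
from collections import namedtuple

# Hypothetical stand-in for a SeqRecord, holding only what this format uses.
Record = namedtuple("Record", ["id", "name", "seq"])


def tab_iterator(lines):
    """Yield one Record per non-blank 'identifier<TAB>sequence' line."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            # Assumption: anything other than exactly two fields is an error.
            raise ValueError("Expected two tab-separated fields: %r" % line)
        identifier, seq = fields
        # The first field is used for both the id and the name.
        yield Record(id=identifier, name=identifier, seq=seq)


def tab_lines(records):
    """Yield one 'id<TAB>sequence' line per record; other fields are ignored."""
    for record in records:
        yield "%s\t%s\n" % (record.id, record.seq)
```

Writing to a file would then be handle.writelines(tab_lines(records)). Because each record stays on a single line, the output remains easy to process with grep or gawk.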
From biopython at maubp.freeserve.co.uk Wed Jul 2 06:33:35 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 2 Jul 2008 11:33:35 +0100 Subject: [Biopython-dev] Catching more error conditions in Bio.Entrez Message-ID: <320fb6e00807020333n7902e452gac56e12f5d64a3ab@mail.gmail.com> Hello Michiel et al., I've already added a few if statements to the end of Bio.Entrez._open() to catch a few errors I'd observed, and I've just found another example:
>>> from Bio import Entrez
>>> Entrez.efetch("nucleotide", id="fiction", rettype="fasta").read()
'\n'
>>> Entrez.efetch("nucleotide", id="fiction").read()
'\n'
This seems to happen for any invalid identifier. Are you happy for me to check for this as an error too? Are there any valid reasons to get back an empty dataset like this? Also, I was wondering if we should raise a ValueError rather than IOError if we are fairly sure the problem is with the arguments rather than the network or the server being unavailable. Peter From sdavis2 at mail.nih.gov Wed Jul 2 07:18:43 2008 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Wed, 2 Jul 2008 07:18:43 -0400 Subject: [Biopython-dev] Catching more error conditions in Bio.Entrez In-Reply-To: <320fb6e00807020333n7902e452gac56e12f5d64a3ab@mail.gmail.com> References: <320fb6e00807020333n7902e452gac56e12f5d64a3ab@mail.gmail.com> Message-ID: <264855a00807020418qc858370r4083f0db9db3197a@mail.gmail.com> On Wed, Jul 2, 2008 at 6:33 AM, Peter wrote: > Hello Michiel et al., > > I've already added a few if statements to the end of > Bio.Entrez._open() to catch a few errors I'd observed, and I've just > found another example: > >>>> from Bio import Entrez >>>> Entrez.efetch("nucleotide", id="fiction", rettype="fasta").read() > '\n' >>>> Entrez.efetch("nucleotide", id="fiction").read() > '\n' > > This seems to happen for any invalid identifier. Are you happy for me > to check for this as an error too? Are there any valid reasons to get > back an empty dataset like this? 
If the ability to use history is added, then an empty dataset could be a valid return after an empty search. For id-based-searches, I'm not sure I would raise an error for an empty set being returned anyway. Just my $0.02. Sean > Also, I was wondering if we should raise a ValueError rather than > IOError if we are fairly sure the problem is with the arguments rather > than the network or the server being unavailable. > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From biopython at maubp.freeserve.co.uk Wed Jul 2 07:34:32 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 2 Jul 2008 12:34:32 +0100 Subject: [Biopython-dev] Catching more error conditions in Bio.Entrez In-Reply-To: <264855a00807020418qc858370r4083f0db9db3197a@mail.gmail.com> References: <320fb6e00807020333n7902e452gac56e12f5d64a3ab@mail.gmail.com> <264855a00807020418qc858370r4083f0db9db3197a@mail.gmail.com> Message-ID: <320fb6e00807020434p474cect7a7b0d51148d7760@mail.gmail.com> >> This seems to happen for any invalid identifier. Are you happy for me >> to check for this as an error too? Are there any valid reasons to get >> back an empty dataset like this? > > If the ability to use history is added, then an empty dataset could be > a valid return after an empty search. ... Bio.Entrez has always supported the history, it's just up to the user to take advantage of it. 
I've included an example in the tutorial to explain how to do this, cut and pasted below:
from Bio import Entrez
search_handle = Entrez.esearch(db="nucleotide", term="Opuntia and rpl16",
                               usehistory="y", email="history.user@example.com")
search_results = Entrez.read(search_handle)
search_handle.close()
gi_list = search_results["IdList"]
count = int(search_results["Count"])
# Holds only while Count does not exceed esearch's retmax (20 by default)
assert count == len(gi_list)
session_cookie = search_results["WebEnv"]
query_key = search_results["QueryKey"]

# Now use the history session cookie and query key to download the results in batches
batch_size = 3
out_handle = open("orchid_rpl16.fasta", "w")
for start in range(0, count, batch_size):
    end = min(count, start + batch_size)
    print("Going to download record %i to %i" % (start + 1, end))
    fetch_handle = Entrez.efetch(db="nucleotide", rettype="fasta",
                                 retstart=start, retmax=batch_size,
                                 webenv=session_cookie, query_key=query_key,
                                 email="history.user@example.com")
    data = fetch_handle.read()
    fetch_handle.close()
    out_handle.write(data)
out_handle.close()
Feedback on the tutorial or the example is of course welcome. > For id-based-searches, I'm not sure I would raise an error for an empty > set being returned anyway. > > Just my $0.02. I was wondering about this kind of thing... maybe some more testing of these kinds of examples would be in order. Peter From biopython at maubp.freeserve.co.uk Wed Jul 2 09:03:36 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 2 Jul 2008 14:03:36 +0100 Subject: [Biopython-dev] Bug 2533 - Support for simple "tab" format in Bio.SeqIO Message-ID: <320fb6e00807020603s63f8339ag5b8140f1943ceb47@mail.gmail.com> Hi all, Do any of you have any comments or feedback on this suggested new "simple tab separated" format for Bio.SeqIO? To match BioPerl I plan on calling it the "tab" format - see below. Any real world example files would be good for the test suite. 
One nice thing is that it adds another output format, something Bio.SeqIO is a bit short of, with only fasta and some alignment formats (now handled via Bio.AlignIO, i.e. pfam/stockholm, clustal and phylip). Peter ---------- Forwarded message ---------- From: Peter Date: Tue, Jul 1, 2008 at 5:06 PM Subject: Re: [BioPython] Sequence from Fasta To: dalloliogm at gmail.com Cc: biopython at biopython.org Giovanni wrote: > yes, I think it will be useful to implement. > I know of people who have written a customized fasta2tab script and > use it quite frequently, so it would be good to support such a task. > As you said before this format is commonly used in combination with > grep/gawk scripts. I've gone for the simple option for parsing the first field: it's used as the record identifier (.id) and name only (nothing clever). Here is my suggested code, which you are welcome to download and try out. Bug 2533 - Support for simple "tab" format in Bio.SeqIO http://bugzilla.open-bio.org/show_bug.cgi?id=2533 If you want to try this yourself you'll need to download the new file TabIO.py into the Bio/SeqIO folder and update Bio/SeqIO/__init__.py to tell it about the new format (two new lines, see patch). Peter From biopython at maubp.freeserve.co.uk Wed Jul 2 09:21:29 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 2 Jul 2008 14:21:29 +0100 Subject: [Biopython-dev] Questions about the NEXUS format Message-ID: <320fb6e00807020621v6370c556g966f01a857f5c4e3@mail.gmail.com> Hello again Frank, As Biopython's NEXUS expert, I've got a couple of hopefully trivial questions about the format, which relate to how best to handle it in the Bio.SeqIO and Bio.AlignIO modules. http://biopython.org/wiki/SeqIO http://biopython.org/wiki/AlignIO My short questions are: Q1: Can a file contain more than one NEXUS record (i.e. concatenation, with more than one #NEXUS line)? Q2: Can a NEXUS record/file contain more than one alignment (matrix block)? 
If the answer to either of those is a "yes", then any example files you could contribute would be very helpful. I have a more complicated question too, which would help me to resolve Bug 2227: http://bugzilla.open-bio.org/show_bug.cgi?id=2227 Q3: Given a generic Alignment object (e.g. from one of the other parsers), can I construct a corresponding Nexus object where the aligned sequences are used for the matrix? If so, how? Thank you, Peter From mjldehoon at yahoo.com Wed Jul 2 09:30:06 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 2 Jul 2008 06:30:06 -0700 (PDT) Subject: [Biopython-dev] Bio.IntelliGenetics Message-ID: <29487.55988.qm@web62410.mail.re1.yahoo.com> Bio.IntelliGenetics contains a parser for sequence data in the IntelliGenetics format. In this format, each sequence has a name and comments, and in addition there can also be an overall comment to the file. Currently the parser in Bio.IntelliGenetics stores this information in Bio.IntelliGenetics.Record.Record objects (one record per sequence; the overall comment is inadvertently added to the first sequence in the file). I think it makes more sense to use a SeqRecord for that, and to deprecate Bio.IntelliGenetics.Record.Record. In that case, Bio.SeqIO looks like a more suitable place for this parser. The user would see something like this:
>>> from Bio import SeqIO
>>> handle = open("mydatafile.txt")
>>> records = SeqIO.parse(handle, "ig")
>>> records.comment
"This is the overall comment"
>>> for record in records:
...     # ... record is a SeqRecord.
Because of the overall comment, SeqIO.parse cannot simply return a generator function. It must return a full-fledged class, but one with an iterator. Any objections, anybody? 
--Michiel From biopython at maubp.freeserve.co.uk Wed Jul 2 09:48:31 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 2 Jul 2008 14:48:31 +0100 Subject: [Biopython-dev] Bio.IntelliGenetics In-Reply-To: <29487.55988.qm@web62410.mail.re1.yahoo.com> References: <29487.55988.qm@web62410.mail.re1.yahoo.com> Message-ID: <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com> On Wed, Jul 2, 2008 at 2:30 PM, Michiel de Hoon wrote: > Bio.IntelliGenetics contains a parser for sequence data in the IntelliGenetics format. Just to be upfront, I'm not familiar with this format, but I've had a look at the examples. > In this format, each sequence has a name and comments, and in addition there can > also be an overall comment to the file. OK. This is also the case in other file formats; for example, GenBank files can have a free-format text header at the start, but we ignore this. How would you separate the file header comment from the first record comment? Some files include what looks like a file header but the lines all seem to start with "; ". Maybe look for "; LOCUS..."? Given the whole comment seems to be free format I don't think this is very nice. On the other hand, some of the sample inputs include a number of lines starting ";; Modified by ..." which would be easy to separate (one semi colon versus two semi colons). These are clearly file-level header lines, rather than being part of the first record. > Currently the parser in Bio.IntelliGenetics stores this information in > Bio.IntelliGenetics.Record.Record objects (one record per sequence; the > overall comment is inadvertently added to the first sequence in the file). I > think it makes more sense to use a SeqRecord for that, and to deprecate > Bio.IntelliGenetics.Record.Record. If all the data extracted by the Bio.IntelliGenetics parser could be dealt with using a SeqRecord-based parser added to Bio.SeqIO, then yes, deprecating Bio.IntelliGenetics sounds fine. 
> In that case, Bio.SeqIO looks like a more suitable place for this parser. > The user would see something like this: >>>> from Bio import SeqIO >>>> handle = open("mydatafile.txt") >>>> records = SeqIO.parse(handle, "ig") >>>> records.comment > "This is the overall comment" >>>> for record in records: > # ... record is a SeqRecord. I see you are using "ig" as the format name, matching EMBOSS. Good :) http://emboss.sourceforge.net/docs/themes/seqformats/ig > Because of the overall comment, SeqIO.parse cannot simply return a > generator function. It must return a full-fledged class, but one with an iterator. Not necessarily. We can still use a simple generator function and either throw away the header comment, or include it with the first record (or even with every record). If you did create an iterator class, would you make the header available as a property of the iterator? Given the apparently fuzzy boundary between the file header and the first record header, I would just opt to treat it all as a comment for the first record, and use a simple generator function. Peter From fkauff at biologie.uni-kl.de Wed Jul 2 10:01:01 2008 From: fkauff at biologie.uni-kl.de (Frank Kauff) Date: Wed, 02 Jul 2008 16:01:01 +0200 Subject: [Biopython-dev] Questions about the NEXUS format In-Reply-To: <320fb6e00807020621v6370c556g966f01a857f5c4e3@mail.gmail.com> References: <320fb6e00807020621v6370c556g966f01a857f5c4e3@mail.gmail.com> Message-ID: <486B8A1D.8090806@biologie.uni-kl.de> Hi Peter, Peter wrote: > Hello again Frank, > > As Biopython's NEXUS expert, I've got a couple of hopefully trivial > questions about the format, which relate to how best to handle it in the > Bio.SeqIO and Bio.AlignIO modules. http://biopython.org/wiki/SeqIO > http://biopython.org/wiki/AlignIO > > My short questions are: > > Q1: Can a file contain more than one NEXUS record (i.e. concatenation, > with more than one #NEXUS line)? > As far as I know: no. 
#NEXUS just indicates the file being a NEXUS file; the concept of "records" is not part of a nexus file. > Q2: Can a NEXUS record/file contain more than one alignment (matrix block)? > > I just had a quick look at the old Maddison et al. introductory paper on NEXUS, and it says that "although the nexus standard does not impose constraints on the number of blocks, particular programs will". I don't know of any program that would read more than one data block and keep both of them. > If the answer to either of those is a "yes", then any example files > you could contribute would be very helpful. > > I have a more complicated question too, which would help me to resolve Bug 2227: > http://bugzilla.open-bio.org/show_bug.cgi?id=2227 > > Q3: Given a generic Alignment object (e.g. from one of the other > parsers), can I construct a corresponding Nexus object where the > aligned sequences are used for the matrix? If so, how? > Hmm - not really. Nexus.py does not support "empty" nexus class objects that could be filled with data (just tried). But it would actually be a nice thing to have. I'll put this on my to do list. 
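Until Nexus.py can build an object from scratch, one possible workaround for Q3 is to write a minimal NEXUS DATA block directly from the aligned sequences, which could then be fed to the existing parser. The sketch assumes plain (name, sequence) pairs rather than Biopython's Alignment class, and the helper name nexus_data_block is hypothetical. It only quotes names containing spaces, which simplifies the NEXUS token rules.

```python
def nexus_data_block(names_and_seqs, datatype="dna", missing="?", gap="-"):
    """Return a minimal NEXUS file with one DATA block (hypothetical helper)."""
    if not names_and_seqs:
        raise ValueError("No sequences given")
    lengths = set(len(seq) for _, seq in names_and_seqs)
    if len(lengths) != 1:
        raise ValueError("Aligned sequences must all have the same length")
    names = [name for name, _ in names_and_seqs]
    if len(set(names)) != len(names):
        raise ValueError("Taxon names must be unique")
    lines = [
        "#NEXUS",
        "begin data;",
        "dimensions ntax=%i nchar=%i;" % (len(names_and_seqs), lengths.pop()),
        "format datatype=%s missing=%s gap=%s;" % (datatype, missing, gap),
        "matrix",
    ]
    for name, seq in names_and_seqs:
        # Quote names containing blanks; embedded quotes are doubled.
        label = "'%s'" % name.replace("'", "''") if " " in name else name
        lines.append("%s %s" % (label, seq))
    lines.append(";")
    lines.append("end;")
    return "\n".join(lines) + "\n"
```

Refusing duplicate names here sidesteps the problem from Bug 2531 rather than renaming taxa silently.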
Cheers, Frank > Thank you, > > Peter > > From biopython at maubp.freeserve.co.uk Wed Jul 2 10:01:13 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 2 Jul 2008 15:01:13 +0100 Subject: [Biopython-dev] Bio.IntelliGenetics In-Reply-To: <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com> References: <29487.55988.qm@web62410.mail.re1.yahoo.com> <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com> Message-ID: <320fb6e00807020701k2f5bf546j2d5ef3514a24e31a@mail.gmail.com> Hello again, Interestingly, the IntelliGenetics format looks the same as the MASE alignment file format: http://www.bioperl.org/wiki/Mase_multiple_alignment_format On the other hand, the EMBOSS example is clearly not a multiple sequence alignment: http://emboss.sourceforge.net/docs/themes/seqformats/ig Adding the parser to Bio.SeqIO would let us read in alignments too via Bio.AlignIO (which will offload the parsing to Bio.SeqIO and then try and convert the SeqRecords into an Alignment). Peter From biopython at maubp.freeserve.co.uk Wed Jul 2 10:06:40 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 2 Jul 2008 15:06:40 +0100 Subject: [Biopython-dev] Bio.IntelliGenetics In-Reply-To: <320fb6e00807020705qea5976j9a5e2cd0896f391d@mail.gmail.com> References: <29487.55988.qm@web62410.mail.re1.yahoo.com> <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com> <320fb6e00807020705qea5976j9a5e2cd0896f391d@mail.gmail.com> Message-ID: <320fb6e00807020706l28309346m57e7bd884a0b7b9b@mail.gmail.com> Forgot to send this to the list; another point about IntelliGenetics vs MASE: ---------- Forwarded message ---------- From: Peter Date: Wed, Jul 2, 2008 at 3:05 PM Subject: Re: [Biopython-dev] Bio.IntelliGenetics To: mjldehoon at yahoo.com > How would you separate the file header comment from the first record > comment? Some files include what looks like a file header but the > lines all seem to start with "; ". Maybe look for "; LOCUS..."? 
> Given the whole comment seems to be free format I don't think this is > very nice. > > On the other hand, some of the sample inputs include a number of > lines starting ";; Modified by ..." which would be easy to separate > (one semi colon versus two semi colons). These are clearly file-level > header lines, rather than being part of the first record. I found an old link I had added on the wiki page for SeqIO development, http://pbil.univ-lyon1.fr/help/formats.html This clearly describes the MASE format as having (optional) header lines starting with two semi colons. But are MASE and IntelliGenetics the same thing? Peter From biopython at maubp.freeserve.co.uk Wed Jul 2 10:12:48 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 2 Jul 2008 15:12:48 +0100 Subject: [Biopython-dev] Questions about the NEXUS format In-Reply-To: <486B8A1D.8090806@biologie.uni-kl.de> References: <320fb6e00807020621v6370c556g966f01a857f5c4e3@mail.gmail.com> <486B8A1D.8090806@biologie.uni-kl.de> Message-ID: <320fb6e00807020712y54874e02k6854b92e1711358d@mail.gmail.com> >> My short questions are: >> >> Q1: Can a file contain more than one NEXUS record (i.e. concatenation, >> with more than one #NEXUS line)? > > As far as I know: no. #NEXUS just indicates the file being a NEXUS file, the > concept of "records" is not part of a nexus file OK, thank you. >> Q2: Can a NEXUS record/file contain more than one alignment (matrix >> block)? > > I just had a quick look at the old Maddison et al. introductory paper of > Nexus, and it says that "although the nexus standard does not impose > constraints on the number of blocks, particular programs will". I don't know > of any program that would read more than one data block and keep both of > them. So that is a "yes in theory", but it doesn't sound worth worrying about. 
from one of the other >> parsers), can I construct a corresponding Nexus object where the >> aligned sequences are used for the matrix? If so, how? > > Hmmm - not really. Nexus.py does not support "empty" nexus class objects > that could be filled with data (just tried) . But it would actually be a > nice thing to have. I'll put this on my to do list. Thanks, Peter From mjldehoon at yahoo.com Wed Jul 2 10:15:16 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 2 Jul 2008 07:15:16 -0700 (PDT) Subject: [Biopython-dev] Bio.IntelliGenetics In-Reply-To: <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com> Message-ID: <529945.38158.qm@web62404.mail.re1.yahoo.com> > On the other hand, some of the sample inputs includes a number of > lines starting ";; Modified by ..." which would be easy to separate > (one semi colon versus two semi colons). These are clearly file-level > header lines, rather than being part of the first record. According to the website mentioned in Bio/IntelliGenetics/__init__.py, the file-level comments have two semi colons, as opposed to the sequence-specific comments, which have one semi colon. http://pbil.univ-lyon1.fr/help/formats.html > If you did create an iterator class, would you make the > header available as a property of the iterator? I am not sure what you mean by a property of the iterator. I was thinking to simply add a field to the class. ---Michiel. 
From bugzilla-daemon at portal.open-bio.org Wed Jul 2 10:38:52 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 2 Jul 2008 10:38:52 -0400 Subject: [Biopython-dev] [Bug 2524] Handle missing libraries like TextTools in run_tests.py In-Reply-To: Message-ID: <200807021438.m62Ecqma013815@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2524 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Component|Documentation |Unit Tests ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-02 10:38 EST ------- Filing under "Unit Tests". From bugzilla-daemon at portal.open-bio.org Wed Jul 2 10:39:22 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 2 Jul 2008 10:39:22 -0400 Subject: [Biopython-dev] [Bug 2469] requires_wise.py fails on Windows (test suite) In-Reply-To: Message-ID: <200807021439.m62EdMM9013903@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2469 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Component|Main Distribution |Unit Tests ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-02 10:39 EST ------- Filing under "Unit Tests" 
From biopython at maubp.freeserve.co.uk Wed Jul 2 10:56:00 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 2 Jul 2008 15:56:00 +0100 Subject: [Biopython-dev] Bio.IntelliGenetics In-Reply-To: <529945.38158.qm@web62404.mail.re1.yahoo.com> References: <320fb6e00807020648o27d8fc7ie924c6d08c6c0ef6@mail.gmail.com> <529945.38158.qm@web62404.mail.re1.yahoo.com> Message-ID: <320fb6e00807020756r4de8ed4bi3f8b409d75996a14@mail.gmail.com> >> If you did create an iterator class, would you make the >> header available as a property of the iterator? > > I am not sure what you mean by a property of the iterator. I was > thinking to simply add a field to the class. Adding the file header field to the iterator class? You could, I suppose. Right now all the Bio.SeqIO parsers use generator functions (the AlignIO ones don't all do so), although I have no objection to returning iterator classes instead. I don't really like the idea of Bio.SeqIO parsers returning anything other than SeqRecord objects - even if it is indirectly via a richer iterator object. I see Bio.SeqIO as a common unified API, and the downside is that sometimes extra data doesn't really fit. If there really is some important meta-data in a file format that applies to all the records, then it cannot easily be represented in the Bio.SeqIO system except as annotation added to every single SeqRecord, e.g. add the header to the annotations dictionary under "file-header" or something. 
Peter From mjldehoon at yahoo.com Wed Jul 2 11:29:31 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 2 Jul 2008 08:29:31 -0700 (PDT) Subject: [Biopython-dev] Bio.IntelliGenetics In-Reply-To: <320fb6e00807020705qea5976j9a5e2cd0896f391d@mail.gmail.com> Message-ID: <318336.37817.qm@web62405.mail.re1.yahoo.com> --- On Wed, 7/2/08, Peter wrote: > I found an old link I had added on the wiki page for SeqIO > development, > http://pbil.univ-lyon1.fr/help/formats.html > > This clearly describes MASE format as having > (optional) header > lines starting with two semi colons. But are MASE and > IntelliGenetics the same thing? It may be that the link in Bio/IntelliGenetics/__init__.py does not actually pertain to the IntelliGenetics format. Except for this link (which, as you point out, actually talks about the MASE format, not the IntelliGenetics format), I have seen no description elsewhere of these file-wide comments preceded by a double semi-colon in the IntelliGenetics format. Even Biopython doesn't treat these consistently: the tests for Bio.IntelliGenetics include comments with the double semi-colon, but the parser doesn't treat them differently from sequence-specific comments. So let's do the following: For the IntelliGenetics parser, do not look for double semi-colon comments. Only check if the first character in a line is a semi-colon, and if so, treat it as a sequence-specific comment. This is what Bio.IntelliGenetics currently does anyway. Replace the parser class in Bio.IntelliGenetics with a generator function, and integrate it with Bio.SeqIO. Then, let's replace the IntelliGenetics test files with files that do not contain the double semi-colon comments. Does that sound OK? --Michiel. 
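Michiel's plan (a generator function where every line starting with a semicolon is a comment on the sequence that follows, with no special handling for ";;") might look roughly like this. It is a sketch, not the Biopython code. The IGRecord tuple stands in for a SeqRecord. Removing a trailing 1 (linear) or 2 (circular) from the sequence follows the EMBOSS description of the "ig" format, so treat that detail as an assumption.

```python
from collections import namedtuple

# Hypothetical stand-in for a SeqRecord.
IGRecord = namedtuple("IGRecord", ["name", "comments", "seq", "circular"])


def _make_record(name, comments, seq_lines):
    """Build one record, removing the assumed 1/2 topology terminator."""
    seq = "".join(seq_lines).replace(" ", "")
    circular = seq.endswith("2")
    if seq.endswith(("1", "2")):
        seq = seq[:-1]
    return IGRecord(name, comments, seq, circular)


def ig_iterator(lines):
    """Yield IGRecord objects from IntelliGenetics-style text lines.

    Any line starting with ';' is a comment on the record that follows;
    ';;' lines get no special treatment. The first non-comment line after
    the comments is the name, and later lines hold the sequence.
    """
    comments, name, seq_lines = [], None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith(";"):
            if name is not None:
                # A comment after sequence data starts the next record.
                yield _make_record(name, comments, seq_lines)
                comments, name, seq_lines = [], None, []
            comments.append(line[1:].strip())
        elif name is None:
            name = line.strip()
        else:
            seq_lines.append(line.strip())
    if name is not None:
        yield _make_record(name, comments, seq_lines)
```

One side effect of this design is that any ";;" lines at the top of a file end up among the first record's comments, which matches Peter's suggestion of treating the header as part of the first record.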
From bugzilla-daemon at portal.open-bio.org Wed Jul 2 12:28:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 2 Jul 2008 12:28:19 -0400 Subject: [Biopython-dev] [Bug 2535] New: Support for PIR / NBRF format in Bio.SeqIO Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2535 Summary: Support for PIR / NBRF format in Bio.SeqIO Product: Biopython Version: Not Applicable Platform: PC OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk BioPerl and EMBOSS both refer to this as the "pir" format, although EMBOSS also supports "nbrf" as an alternative. http://bioperl.org/wiki/PIR_sequence_format Patch to follow: a new parser and writer in plain Python. The existing Martel-based parser in Bio.NBRF could then be deprecated. From bugzilla-daemon at portal.open-bio.org Wed Jul 2 12:30:28 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 2 Jul 2008 12:30:28 -0400 Subject: [Biopython-dev] [Bug 2535] Support for PIR / NBRF format in Bio.SeqIO In-Reply-To: Message-ID: <200807021630.m62GUS5B025377@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2535 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-02 12:30 EST ------- Created an attachment (id=964) --> (http://bugzilla.open-bio.org/attachment.cgi?id=964&action=view) New file Bio/SeqIO/PirIO.py Note that the details of storing the sequence type may need tweaking for better agreement with the de-facto conventions from the GenBank parser.
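To give a feel for the format, here is a rough sketch of a PIR/NBRF reader in plain Python. It is not the attached Bio/SeqIO/PirIO.py. It assumes each entry consists of a ">XX;identifier" line (XX being a two-letter sequence type), one description line, and then sequence lines terminated by "*". `PirEntry` and `pir_iterator` are hypothetical names.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class PirEntry(NamedTuple):
    seq_type: str      # two-letter code such as "P1" or "DL"
    identifier: str
    description: str
    sequence: str


def pir_iterator(handle: TextIO) -> Iterator[PirEntry]:
    """Yield entries from PIR/NBRF text: '>XX;ID', one description line, then sequence ending in '*'."""
    lines = (raw.rstrip("\r\n") for raw in handle)
    # Skip anything before the first header line
    header = next((line for line in lines if line.startswith(">")), None)
    while header is not None:
        seq_type, _, identifier = header[1:].partition(";")
        description = next(lines, "")
        seq_parts = []
        header = None
        for line in lines:
            if line.startswith(">"):
                header = line  # start of the next entry
                break
            seq_parts.append(line)
        sequence = "".join("".join(seq_parts).split())  # drop spaces and line breaks
        if sequence.endswith("*"):
            sequence = sequence[:-1]
        yield PirEntry(seq_type.strip(), identifier.strip(), description.strip(), sequence)


entries = list(pir_iterator(io.StringIO(
    ">P1;CRAB_ANAPL\nALPHA CRYSTALLIN B CHAIN.\n  MDITIHNPLI RRPLF\n  SWLAP*\n"
    ">DL;X1\nsome DNA\nACGT\nAC*\n")))
```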
As part of this, the following dictionary from Bio/NBRF/ValSeq.py may be useful: valid_sequence_dict = { "P1": "complete protein", "F1": "protein fragment", \ "DL": "linear DNA", "DC": "circular DNA", "RL": "linear RNA", \ "RC":"circular RNA", "N3": "transfer RNA", "N1": "other" } From bugzilla-daemon at portal.open-bio.org Wed Jul 2 13:37:05 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 2 Jul 2008 13:37:05 -0400 Subject: [Biopython-dev] [Bug 2535] Support for PIR / NBRF format in Bio.SeqIO In-Reply-To: Message-ID: <200807021737.m62Hb5lX031417@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2535 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-02 13:37 EST ------- My patch doesn't accept the "N1" sequence type mentioned in Bio/NBRF/ValSeq.py. Also, when recording a SeqRecord from a non-PIR input, we could try and guess the sequence type. The alphabet itself is one clue. GenBank and EMBL files should also record whether the sequence is linear or circular, as well as a sequence type. For proteins, I don't see how to decide between P1 and F1 though (complete protein vs protein fragment). Maybe default to F1?
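Peter's idea of guessing the sequence type for output could be sketched as follows, using the two-letter codes from the ValSeq.py dictionary. How the molecule type and topology are obtained (from the alphabet, or from a GenBank/EMBL header) is left to the caller, and `guess_pir_code` is a hypothetical name. As suggested, proteins default to F1 because P1 cannot be distinguished, and the "N3"/"N1" codes are never produced.

```python
def guess_pir_code(molecule_type: str, topology: str = "linear") -> str:
    """Choose a two-letter PIR/NBRF sequence-type code for output.

    molecule_type: "protein", "DNA" or "RNA" (e.g. derived from the alphabet)
    topology: "linear" or "circular" (e.g. taken from a GenBank/EMBL header)
    """
    mol = molecule_type.strip().lower()
    circular = topology.strip().lower() == "circular"
    if mol == "protein":
        # No way to tell complete protein (P1) from fragment (F1); default to F1
        return "F1"
    if mol == "dna":
        return "DC" if circular else "DL"
    if mol == "rna":
        return "RC" if circular else "RL"
    raise ValueError("Cannot choose a PIR sequence type for %r" % molecule_type)
```

Raising an error for unknown input, rather than silently writing a code, makes it obvious when a caller has not supplied enough information.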
From bugzilla-daemon at portal.open-bio.org Wed Jul 2 15:51:49 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 2 Jul 2008 15:51:49 -0400 Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support for '-F F' and make it safe In-Reply-To: Message-ID: <200807021951.m62Jpnx3012202@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2508 ------- Comment #9 from mmokrejs at ribosome.natur.cuni.cz 2008-07-02 15:51 EST ------- Even better docs: http://blog.doughellmann.com/2007/07/pymotw-subprocess.html From bugzilla-daemon at portal.open-bio.org Thu Jul 3 09:24:32 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 3 Jul 2008 09:24:32 -0400 Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem with identical taxa names In-Reply-To: Message-ID: <200807031324.m63DOWDA018278@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2531 ------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 09:24 EST ------- Hi Frank, I see you've updated Bio/Nexus/Nexus.py with CVS revision 1.16 to record the original taxon order with and without the name changes: n.unaltered_taxlabels = original names in order, with duplicates; n.original_taxon_order = modified names in order, suitable as keys to n.matrix. I'll update Bio.SeqIO / Bio.AlignIO to take advantage of this shortly, storing the original name and the modified unique name as the SeqRecord's name and id properties. Peter
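The plan of keeping both names could look roughly like this. It assumes that the original labels (n.unaltered_taxlabels) and the unique labels (n.taxlabels or n.original_taxon_order) are parallel lists in the same order, and that n.matrix is keyed by the unique label. Plain lists and dicts stand in for the Nexus object, and `NamedRow` is a hypothetical stand-in for a SeqRecord with id and name.

```python
from typing import Dict, List, NamedTuple


class NamedRow(NamedTuple):
    id: str    # unique label, usable as a key into the matrix
    name: str  # original label from the file, possibly duplicated
    seq: str


def rows_from_nexus(unaltered_taxlabels: List[str],
                    unique_labels: List[str],
                    matrix: Dict[str, str]) -> List[NamedRow]:
    """Pair original and unique taxon names, assuming both lists share one order."""
    if len(unaltered_taxlabels) != len(unique_labels):
        raise ValueError("Label lists differ in length")
    return [NamedRow(unique, original, str(matrix[unique]))
            for original, unique in zip(unaltered_taxlabels, unique_labels)]
```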
From bugzilla-daemon at portal.open-bio.org Thu Jul 3 09:52:08 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 3 Jul 2008 09:52:08 -0400 Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem with identical taxa names In-Reply-To: Message-ID: <200807031352.m63Dq8el021720@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2531 ------- Comment #17 from fkauff at biologie.uni-kl.de 2008-07-03 09:52 EST ------- Hi Peter, I'd strongly suggest using self.taxlabels instead of self.original_taxon_order. The latter is only there for compatibility, and original_taxon_order just maps to taxlabels. Actually, it might make sense to give a deprecation warning if original_taxon_order is used, and it should be removed in some future release. Frank From bugzilla-daemon at portal.open-bio.org Thu Jul 3 10:06:46 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 3 Jul 2008 10:06:46 -0400 Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO In-Reply-To: Message-ID: <200807031406.m63E6kct023377@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2227 biopython-bugzilla at maubp.freeserve.co.uk changed:
What | Removed | Added
Attachment #584 is obsolete | 0 | 1
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 10:06 EST ------- (From update of attachment 584) With Bio/Nexus/Nexus.py revision 1.16 from CVS, it is now possible to create an empty Nexus object and add sequences to it. This code is now obsolete.
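Frank's suggested deprecation warning could be implemented with a property along these lines. `NexusLike` is a hypothetical stand-in, not the real Bio.Nexus.Nexus class; the point is only that both the getter and the setter of the old name emit a DeprecationWarning and delegate to taxlabels.

```python
import warnings


class NexusLike:
    """Hypothetical stand-in for a Nexus object, showing only the deprecation pattern."""

    def __init__(self):
        self.taxlabels = []

    @property
    def original_taxon_order(self):
        warnings.warn("original_taxon_order is deprecated, use taxlabels instead",
                      DeprecationWarning, stacklevel=2)
        return self.taxlabels

    @original_taxon_order.setter
    def original_taxon_order(self, value):
        warnings.warn("original_taxon_order is deprecated, use taxlabels instead",
                      DeprecationWarning, stacklevel=2)
        self.taxlabels = value
```

Using `stacklevel=2` makes the warning point at the caller's line rather than at the property itself, which is what a user needs in order to update their code.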
From bugzilla-daemon at portal.open-bio.org Thu Jul 3 10:13:38 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 3 Jul 2008 10:13:38 -0400 Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO In-Reply-To: Message-ID: <200807031413.m63EDcGj024034@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2227 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 10:13 EST ------- Created an attachment (id=965) --> (http://bugzilla.open-bio.org/attachment.cgi?id=965&action=view) Bio/Nexus/Nexus.py handle support in write_nexus_data() With Bio/Nexus/Nexus.py revision 1.16 from CVS, it is now possible to create an empty Nexus object and add sequences to it:

#Read in an alignment object, e.g. with Bio.AlignIO
from Bio import AlignIO
alignment = AlignIO.read(open("example.aln"), "clustal")

#Make a Nexus object
from Bio.Nexus import Nexus
handle = open("test.nex", "w")
n = Nexus.Nexus()
n.alphabet = alignment._alphabet
for record in alignment :
    n.add_sequence(record.id, record.seq.tostring())
n.write_nexus_data(handle)
handle.close()

There are two problems with write_nexus_data(). Firstly, it doesn't accept a StringIO handle (see also Bug 2454). Secondly, if given a handle, it closes it. This would break the above code, or how I typically use StringIO. This patch addresses these points. Frank, are you happy for me to commit this change?
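The two problems described here (rejecting StringIO handles and closing the caller's handle) suggest a general pattern. The sketch below illustrates it with a hypothetical `write_lines` function rather than the actual write_nexus_data() patch: the function opens and closes the file only when given a filename, and otherwise treats the argument as any object with a write() method and leaves it open.

```python
import io
from typing import Iterable, TextIO, Union


def write_lines(lines: Iterable[str], destination: Union[str, TextIO]) -> None:
    """Write lines to a filename or to any handle with a write() method.

    Given a filename, the file is opened and closed here; given a handle
    (a real file, a StringIO, ...), it is written to and left open for the caller.
    """
    def _write(handle: TextIO) -> None:
        for line in lines:
            handle.write(line + "\n")

    if isinstance(destination, str):
        with open(destination, "w") as handle:
            _write(handle)
    else:
        _write(destination)  # duck typing: no check for a real file object


buf = io.StringIO()
write_lines(["#NEXUS", "begin data;", "end;"], buf)
```

Relying on duck typing instead of an explicit file-type check is what lets StringIO (or any other writable object) work.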
From bugzilla-daemon at portal.open-bio.org Thu Jul 3 12:02:30 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 3 Jul 2008 12:02:30 -0400 Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO In-Reply-To: Message-ID: <200807031602.m63G2Unc032671@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2227 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 12:02 EST ------- Created an attachment (id=966) --> (http://bugzilla.open-bio.org/attachment.cgi?id=966&action=view) Patch to Bio/AlignIO/NexusIO.py adding write support This patch requires the Bio.Nexus handle fix (patch in attachment 965, comment 4). My method for constructing an empty DNA, RNA, or Protein Nexus object is perhaps inelegant. This is required in order to set up the alphabet, ambiguous_values and unambiguous_letters properties, which otherwise default to DNA. Also note that the Nexus add_sequence() method does not seem to support duplicated taxa names. Perhaps this method could update the unaltered_taxlabels property and use the _unique_label method to cope with duplicate names?
From bugzilla-daemon at portal.open-bio.org Thu Jul 3 12:08:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 3 Jul 2008 12:08:26 -0400 Subject: [Biopython-dev] [Bug 2531] Nexus and fasta parsers have a problem with identical taxa names In-Reply-To: Message-ID: <200807031608.m63G8QS3000534@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2531 ------- Comment #18 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-03 12:08 EST ------- I have changed my use of original_taxon_order to just taxlabels (code now in Bio/AlignIO/NexusIO.py rather than Bio/SeqIO/NexusIO.py). I agree, adding a deprecation warning to the original_taxon_order get/set functions would make sense. P.S. Thanks for adding the unaltered_taxlabels property. From biopython at maubp.freeserve.co.uk Fri Jul 4 04:11:06 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 4 Jul 2008 09:11:06 +0100 Subject: [Biopython-dev] What happened to Biopython 1.46? Message-ID: <320fb6e00807040111h182411d5lea14575f2906e7ba@mail.gmail.com> We were recently talking about doing another release, but as you may have noticed nothing has been announced. Michiel devoted a good chunk of his weekend to preparing Biopython 1.46 and uploaded it to the servers on Sunday 29th. He didn't issue an announcement email at the time due to the problem with the wiki being read-only (now fixed). However, on the Monday evening I realised I'd done something really stupid in Bio.Data.CodonTable just before the CVS freeze. Table 15 (Blepharisma Macronuclear) would be used whenever a translation table was requested by name. This change has been reverted, and I've added further translation checks in test_seq.py to avoid any similar issue in future.
So, while there is a Biopython 1.46, we're not going to advertise it because the translation functionality is subtly wrong. However, it is up on the website, and linked to with an errata statement. Michiel will kindly try and prepare Biopython 1.47 soon... so please hold off any big changes in CVS until then. And I'm hereby publicly promising to treat him to dinner - hopefully we'll be in the same country at the same time this year! Peter From bugzilla-daemon at portal.open-bio.org Fri Jul 4 04:39:35 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 4 Jul 2008 04:39:35 -0400 Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO In-Reply-To: Message-ID: <200807040839.m648dZnX025882@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2227 ------- Comment #6 from fkauff at biologie.uni-kl.de 2008-07-04 04:39 EST ------- (In reply to comment #4) > Created an attachment (id=965) --> (http://bugzilla.open-bio.org/attachment.cgi?id=965&action=view) [details] > Bio/Nexus/Nexus.py handle support in write_nexus_data() > > With Bio/Nexus/Nexus.py revision 1.16 from CVS, it is now possible to create an > empty Nexus object and add sequences to it: > ... > There are two problems with write_nexus_data(), firstly it doesn't accept a > StringIO handle (see also Bug 2454). > > Secondly, if given a handle it closes it. This would break the above code, or > how I typically use StringIO. > > This patch addresses these points. > > Frank, are you happy for me to commit this change? > Very nice. Go for it :-) Cheers, Frank
From bugzilla-daemon at portal.open-bio.org Fri Jul 4 04:53:10 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 4 Jul 2008 04:53:10 -0400 Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO In-Reply-To: Message-ID: <200807040853.m648rAHL026783@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2227 biopython-bugzilla at maubp.freeserve.co.uk changed:
What | Removed | Added
Attachment #965 is obsolete | 0 | 1
------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-04 04:53 EST ------- (From update of attachment 965) > > This patch addresses these points. > > > > Frank, are you happy for me to commit this change? > > > > Very nice. Go for it :-) > Thanks Frank.
Checking in Nexus.py; /home/repository/biopython/biopython/Bio/Nexus/Nexus.py,v <-- Nexus.py new revision: 1.17; previous revision: 1.16 done Peter From bugzilla-daemon at portal.open-bio.org Fri Jul 4 04:56:10 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 4 Jul 2008 04:56:10 -0400 Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO In-Reply-To: Message-ID: <200807040856.m648uAAG027012@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2227 biopython-bugzilla at maubp.freeserve.co.uk changed:
What | Removed | Added
Attachment #966 is obsolete | 0 | 1
------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-04 04:56 EST ------- (From update of attachment 966) There is a slight problem with this patch in the alphabet selection (it uses "dna" when it should use "rna"). I'll postpone dealing with writing Nexus files in Bio.SeqIO / Bio.AlignIO until after the next Biopython release.
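This kind of mix-up is easy to make when the alphabet classes form a hierarchy in which both DNA and RNA derive from a nucleotide base class, because a check on the base class also matches RNA. The sketch below uses local stand-in classes (modelled on, but not imported from, the old Bio.Alphabet names) and tests for RNA first. It only illustrates the pitfall and does not claim to be the actual cause of the problem in attachment 966; `nexus_datatype` is a hypothetical name.

```python
class Alphabet: ...
class NucleotideAlphabet(Alphabet): ...
class DNAAlphabet(NucleotideAlphabet): ...
class RNAAlphabet(NucleotideAlphabet): ...
class ProteinAlphabet(Alphabet): ...


def nexus_datatype(alphabet: Alphabet) -> str:
    """Map an alphabet to a NEXUS datatype, testing the most specific class first."""
    if isinstance(alphabet, RNAAlphabet):
        return "rna"
    if isinstance(alphabet, NucleotideAlphabet):
        # DNA, or a generic nucleotide alphabet: fall back to "dna"
        return "dna"
    if isinstance(alphabet, ProteinAlphabet):
        return "protein"
    raise ValueError("No NEXUS datatype for %r" % alphabet)
```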
From bugzilla-daemon at portal.open-bio.org Fri Jul 4 05:13:25 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 4 Jul 2008 05:13:25 -0400 Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO In-Reply-To: Message-ID: <200807040913.m649DPap027929@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2227 ------- Comment #9 from fkauff at biologie.uni-kl.de 2008-07-04 05:13 EST ------- (In reply to comment #5) > Created an attachment (id=966) --> (http://bugzilla.open-bio.org/attachment.cgi?id=966&action=view) [details] > Patch to Bio/AlignIO/NexusIO.py adding write support > > This patch requires the Bio.Nexus handle fix (patch in attachment 965 [details], comment > 4). > > My method for constructing an empty DNA, RNA, or Protein Nexus object is > perhaps inelegant. This is required in order to setup the alphabet, > ambiguous_values and unambiguous_letters properties which otherwise default to > DNA. > > Also note that the Nexus add_sequence() method does not seem to support > duplicated taxa names. Perhaps this method could update the > unaltered_taxlabels property and use the _unique_label method to cope with > duplicate names? > Ok, I updated add_sequence and will commit the changes soon. F
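Here is a rough sketch of an add_sequence() that records the original label and stores the sequence under a unique one. The ".copyN" suffix scheme, `unique_label` and `SimpleAlignment` are assumptions made for illustration only; Bio.Nexus's own _unique_label may rename duplicates differently.

```python
from typing import Dict, List


def unique_label(previous_labels: List[str], label: str) -> str:
    """Return label unchanged, or with a '.copyN' suffix if it is already used."""
    candidate = label
    n = 0
    while candidate in previous_labels:
        n += 1
        candidate = "%s.copy%d" % (label, n)
    return candidate


class SimpleAlignment:
    """Hypothetical stand-in holding just the label lists and the matrix."""

    def __init__(self) -> None:
        self.unaltered_taxlabels: List[str] = []  # names as given, duplicates allowed
        self.taxlabels: List[str] = []            # unique names, same order
        self.matrix: Dict[str, str] = {}          # keyed by unique name

    def add_sequence(self, name: str, sequence: str) -> str:
        self.unaltered_taxlabels.append(name)
        unique = unique_label(self.taxlabels, name)
        self.taxlabels.append(unique)
        self.matrix[unique] = sequence
        return unique
```

Keeping the two lists in the same order means the original name for any matrix key can always be recovered by position.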
From bugzilla-daemon at portal.open-bio.org Fri Jul 4 05:20:07 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 4 Jul 2008 05:20:07 -0400 Subject: [Biopython-dev] [Bug 2227] Writing Nexus files with Bio.SeqIO In-Reply-To: Message-ID: <200807040920.m649K7MI028352@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2227 ------- Comment #10 from fkauff at biologie.uni-kl.de 2008-07-04 05:20 EST ------- (In reply to comment #9) > > > > Also note that the Nexus add_sequence() method does not seem to support > > duplicated taxa names. Perhaps this method could update the > > unaltered_taxlabels property and use the _unique_label method to cope with > > duplicate names? > > > Ok, I updated add_sequence and will commit the changes soon. > Checking in biopython/Bio/Nexus/Nexus.py; /home/repository/biopython/biopython/Bio/Nexus/Nexus.py,v <-- Nexus.py new revision: 1.18; previous revision: 1.17 Frank From mjldehoon at yahoo.com Fri Jul 4 06:24:06 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 4 Jul 2008 03:24:06 -0700 (PDT) Subject: [Biopython-dev] Bio.IntelliGenetics In-Reply-To: <320fb6e00807020911w2bec03a6w5ec8b50f60a50238@mail.gmail.com> Message-ID: <36286.77119.qm@web62412.mail.re1.yahoo.com> > I'm assuming we'd put the new IntelliGenetics to > SeqRecord parser in Bio/SeqIO/IgIO.py (based on > the format name of "ig" used in EMBOSS). OK. > Would we then also deprecate Bio.IntelliGenetics? Yes. Otherwise, it's replicated functionality. > Do you want to make these changes, or should I? Either way is fine with me. If you want to include this in Bio.SeqIO, go ahead. If you prefer me to do it, please let me know.
> > Then, let's replace the IntelliGenetics tests by > files that do not contain the double > > semi-colon comments. > > Why not just leave the double colon lines alone? The parser > should be able to cope. In the example files in test/IntelliGenetics, lines with a ';;' clearly have a different interpretation from the sequence-specific comments starting with ';'. I am fine with skipping the ';;' lines, but if we'd include them with the sequence-specific comments we'd be misrepresenting the file. --Michiel. --- On Wed, 7/2/08, Peter wrote: > From: Peter > Subject: Re: [Biopython-dev] Bio.IntelliGenetics > To: mjldehoon at yahoo.com > Date: Wednesday, July 2, 2008, 12:11 PM > > It may be that the link in > Bio/IntelliGenetics/__init__.py actually does not pertain > to > > the IntelliGenetics format. Except for this link > (which as you point out actually talks > > about the MASE format, not the IntelliGenetics > format), I have seen no description > > elsewhere of these file-wide comments preceded by a > double semi-colon in the > > IntelliGenetics format. Even Biopython doesn't > treat these consistently: The tests > > for Bio.IntelliGenetics include comments with the > double semi-colon, but the parser > > doesn't treat them differently from > sequence-specific comments. > > Maybe we should ask BioPerl if they distinguish between the > IntelliGenetics and MASE formats? > > Looking back over the old mailing list, at the time they > did think the > two were the same: > http://lists.open-bio.org/pipermail/biopython-dev/2001-October/000626.html > > > So let's do the following: > > For the IntelliGenetics parser, do not look for double > semi-colon comments. Only check > > if the first character in a line is a semi-colon, and > if so, treat it as a sequence-specific > > comment. This is what Bio.IntelliGenetics currently > does anyway. > > Replace the parser class in Bio.IntelliGenetics by a > generator function, and integrate it with > > Bio.SeqIO. 
> > I'm assuming we'd put the new IntelliGenetics to > SeqRecord parser in > Bio/SeqIO/IgIO.py (based on the format name of > "ig" used in EMBOSS). > Would we then also deprecate Bio.IntelliGenetics? > > Do you want to make these changes, or should I? > > > Then, let's replace the IntelliGenetics tests by > files that do not contain the double > > semi-colon comments. > > Why not just leave the double colon lines alone? The parser > should be > able to cope. > > Peter From biopython at maubp.freeserve.co.uk Fri Jul 4 10:31:55 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 4 Jul 2008 15:31:55 +0100 Subject: [Biopython-dev] Bio.IntelliGenetics In-Reply-To: <36286.77119.qm@web62412.mail.re1.yahoo.com> References: <320fb6e00807020911w2bec03a6w5ec8b50f60a50238@mail.gmail.com> <36286.77119.qm@web62412.mail.re1.yahoo.com> Message-ID: <320fb6e00807040731h787c66e6t10a4edd31dffdbc2@mail.gmail.com> >> Do you want to make these changes, or should I? > > Either way is fine with me. If you want to include this in Bio.SeqIO, go ahead. OK. I've added a simple parser to CVS as Bio/SeqIO/IgIO.py for IntelliGenetics/MASE files, using the format name "ig" to match EMBOSS. The existing three sample files are now being used in test_SeqIO.py, and one of them in test_AlignIO.py as well. If anyone wants to scan over the code, I'd be delighted to have feedback. Adding support for writing these files should be easy. Do you think this is worth implementing? Before we deprecate Bio.IntelliGenetics, I suggest we ask on the mailing list if anyone is using it. > In the example files in test/IntelliGenetics, lines with a ';;' clearly have a different interpretation > from the sequence-specific comments starting with ';'. I am fine with skipping the ';;' lines, but > if we'd include them with the sequence-specific comments we'd be misrepresenting the file. I am ignoring the ";;" lines at the start of the file.
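Since writing support should be easy, here is a minimal sketch of an IntelliGenetics writer in plain Python. It assumes that each entry needs at least one ";" comment line, that sequences are wrapped at a fixed width, and that a trailing "1" marks a linear sequence. `write_ig` is a hypothetical name, not part of Bio/SeqIO/IgIO.py, and it writes no ";;" file header.

```python
import io
from typing import Iterable, List, TextIO, Tuple


def write_ig(records: Iterable[Tuple[str, List[str], str]], handle: TextIO,
             line_width: int = 60) -> None:
    """Write (name, comments, sequence) tuples in IntelliGenetics style."""
    for name, comments, sequence in records:
        # Assumption: each entry needs at least one ';' comment line
        for comment in comments or [""]:
            handle.write(";" + comment + "\n")
        handle.write(name + "\n")
        chunks = [sequence[i:i + line_width]
                  for i in range(0, len(sequence), line_width)] or [""]
        chunks[-1] += "1"  # assumed terminator: "1" marks a linear sequence
        for chunk in chunks:
            handle.write(chunk + "\n")


buf = io.StringIO()
write_ig([("SEQ1", ["a comment"], "ACGTACGT"), ("SEQ2", [], "GG")], buf, line_width=5)
```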
Peter From mjldehoon at yahoo.com Sat Jul 5 04:24:41 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 5 Jul 2008 01:24:41 -0700 (PDT) Subject: [Biopython-dev] CVS freeze for release 1.47 Message-ID: <223850.14172.qm@web62404.mail.re1.yahoo.com> Hi everybody, I'm starting on release 1.47 now, so please don't make any commits to CVS until the release is out. Thanks! --Michiel. From mjldehoon at yahoo.com Sat Jul 5 20:00:17 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 5 Jul 2008 17:00:17 -0700 (PDT) Subject: [Biopython-dev] Biopython release 1.47 Message-ID: <287726.364.qm@web62412.mail.re1.yahoo.com> We are pleased to announce the release of Biopython 1.47. This release includes a new Bio.AlignIO module, updates to Bio.Blast, parsers for NCBI's Entrez E-Utilities, numerous other code improvements and fixes, and extended and updated documentation. In particular, if you use Biopython to access NCBI's E-Utilities, we encourage you to download and install this release to ensure full compliance with NCBI's access rules. Source distributions and Windows installers are available from the Biopython website at http://biopython.org. My thanks to all code contributors who made this new release possible. --Michiel on behalf of the Biopython developers From sbassi at gmail.com Sun Jul 6 15:53:54 2008 From: sbassi at gmail.com (Sebastian Bassi) Date: Sun, 6 Jul 2008 16:53:54 -0300 Subject: [Biopython-dev] NCBIStandalon not compatible with previous versions, is this a bug? Message-ID: NCBIStandalone changed in 1.46 due to bug #2508.
So this code, which was working before, no longer works: result, err = NCBIStandalone.blastall(b_exe, "blastn", b_db, f_name, expectation=1e-10, descriptions=1) The error trace is: File "/mnt/hda2/bio/biopython-1.46/build/lib.linux-i686-2.5/Bio/Blast/NCBIStandalone.py", line 1986, in _security_check_parameters if ";" in value or "&&" in value : TypeError: argument of type 'float' is not iterable So I had to rewrite the code as: result, err = NCBIStandalone.blastall(b_exe, "blastn", b_db, f_name, expectation="1e-10", descriptions="1") The problem is the function "_security_check_parameters", which assumes that all arguments are strings. Proposed solutions: 1) Leave it as is (this is not a bug). Some tutorial should be changed (?) 2) Modify line 1986 from: if ";" in value or "&&" in value : To this: if ";" in value or "&&" in str(value) : From bugzilla-daemon at portal.open-bio.org Mon Jul 7 06:47:48 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 Jul 2008 06:47:48 -0400 Subject: [Biopython-dev] [Bug 2447] EUtils cannot parse PubMed XML for ACS journals In-Reply-To: Message-ID: <200807071047.m67Almjb027271@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2447 ------- Comment #3 from mdehoon at ims.u-tokyo.ac.jp 2008-07-07 06:47 EST ------- Using Biopython release 1.47, Bio.Entrez can parse the XML for this PMID: >>> from Bio import Entrez >>> PMID = "17238260" >>> handle = Entrez.efetch(db='pubmed', id=PMID, retmode='xml') >>> record = Entrez.read(handle) >>> Noel, can you use Bio.Entrez instead of Bio.EUtils?
From bugzilla-daemon at portal.open-bio.org Mon Jul 7 06:55:10 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 Jul 2008 06:55:10 -0400 Subject: [Biopython-dev] [Bug 2448] Bio.EUtils can't handle accented author names In-Reply-To: Message-ID: <200807071055.m67AtAWu027543@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2448 ------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2008-07-07 06:55 EST ------- Using Bio.Entrez in Biopython release 1.47: >>> from Bio import Entrez >>> handle = Entrez.efetch(db='pubmed', id=pmids, retmode='xml') >>> records = Entrez.read(handle) >>> records[0]['MedlineCitation']['Article']['AuthorList'] [{u'LastName': 'Matamala', u'Initials': 'AR', u'ForeName': 'Adelio R'}, {u'LastName': 'Almonacid', u'Initials': 'DE', u'ForeName': 'Daniel E'}, {u'LastName': 'Figueroa', u'Initials': 'MF', u'ForeName': 'Maximiliano F'}, {u'LastName': u'Mart\xednez-Oyanedel', u'Initials': 'J', u'ForeName': u'Jos\xe9'}, {u'LastName': 'Bunster', u'Initials': 'MC', u'ForeName': 'Marta C'}] Noel, is this sufficient for your needs? From bugzilla-daemon at portal.open-bio.org Mon Jul 7 07:12:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 Jul 2008 07:12:26 -0400 Subject: [Biopython-dev] [Bug 2448] Bio.EUtils can't handle accented author names In-Reply-To: Message-ID: <200807071112.m67BCQAB028433@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2448 ------- Comment #3 from baoilleach at gmail.com 2008-07-07 07:12 EST ------- Thanks Michiel, but I found a workaround a day later so don't worry about me. I was just letting you know about the bug...
Noel From biopython at maubp.freeserve.co.uk Mon Jul 7 09:07:24 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 7 Jul 2008 14:07:24 +0100 Subject: [Biopython-dev] NCBIStandalon not compatible with previous versions, is this a bug? In-Reply-To: References: Message-ID: <320fb6e00807070607m2cee88b1n9b2b2194d96c3c12@mail.gmail.com> On Sun, Jul 6, 2008 at 8:53 PM, Sebastian Bassi wrote: > NCBIStandalone changed in 1.46 due to bug #2508. > So this code that was working before, no longer works: > > result, err = NCBIStandalone.blastall(b_exe, "blastn", > b_db, f_name, expectation=1e-10, descriptions=1) > > The error trace is: > > File "/mnt/hda2/bio/biopython-1.46/build/lib.linux-i686-2.5/Bio/Blast/NCBIStandalone.py", > line 1986, in _security_check_parameters > if ";" in value or "&&" in value : > TypeError: argument of type 'float' is not iterable > > So I had to rewrite the code as: > > result, err = NCBIStandalone.blastall(b_exe, "blastn", > b_db, f_name, expectation="1e-10", descriptions="1") > > The problem is the function "_security_check_parameters", that assumes > that all arguments are strings. > > Proposed solutions: > > 1) Leave it as is (this is not a bug). Some tutorial should be changed (?) > 2) Modify line 1986 from: > if ";" in value or "&&" in value : > To this: > if ";" in value or "&&" in str(value) : I would say it's a bug, and casting to a string on line 1986 looks like the best fix.
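Casting only the right-hand operand, as in the proposal quoted above, would still raise TypeError for a float, because `";" in value` is evaluated first on the raw number. The minimal sketch below converts each value once before both tests. `check_parameters` is a hypothetical name and signature, not the actual Bio.Blast.NCBIStandalone function.

```python
def check_parameters(params: dict) -> None:
    """Reject parameter values that could chain extra shell commands.

    Each value is converted with str() once, so numbers such as
    expectation=1e-10 or descriptions=1 are checked instead of raising TypeError.
    """
    for key, value in params.items():
        text = str(value)
        if ";" in text or "&&" in text:
            raise ValueError("Rejecting suspicious value %r for parameter %r" % (value, key))


# Numeric values now pass the check instead of crashing it
check_parameters({"expectation": 1e-10, "descriptions": 1, "database": "nr"})
```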
I won't be able to do this until tomorrow afternoon at the latest; if you could file a bug, that would be helpful in case I forget ;) Thanks Peter From bugzilla-daemon at portal.open-bio.org Mon Jul 7 13:08:40 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 Jul 2008 13:08:40 -0400 Subject: [Biopython-dev] [Bug 2538] New: _security_check_parameters assumes all arguments are strings Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2538 Summary: _security_check_parameters assumes all arguments are strings Product: Biopython Version: 1.46 Platform: PC OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: sbassi at gmail.com This code no longer works: result, err = NCBIStandalone.blastall(b_exe, "blastn", b_db, f_name, expectation=1e-10, descriptions=1) This is because the new _security_check_parameters function assumes all blastall parameters are strings, but expectation and descriptions are a float and an int. Proposed fix: Modify line 1986 from: if ";" in value or "&&" in value : To this: if ";" in value or "&&" in str(value) : From sbassi at gmail.com Mon Jul 7 16:30:14 2008 From: sbassi at gmail.com (Sebastian Bassi) Date: Mon, 7 Jul 2008 17:30:14 -0300 Subject: [Biopython-dev] Alignment problem. bug? Message-ID: I would like to confirm whether this is a bug or not. If I get confirmation, I will file it in bugzilla.
With this code: from Bio import Clustalw from Bio.Clustalw import MultipleAlignCL cline = MultipleAlignCL('foralig.txt') cline.set_output("alig.aln") alignment = Clustalw.do_alignment(cline) I get: Traceback (most recent call last): File "/mnt/hda2/py252/bin/ii.py", line 112, in alignment = Clustalw.do_alignment(cline) File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Clustalw/__init__.py", line 125, in do_alignment return parse_file(out_file, alphabet) File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Clustalw/__init__.py", line 47, in parse_file generic_alignment = AlignIO.read(handle, "clustal") File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/AlignIO/__init__.py", line 299, in read first = iterator.next() File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/AlignIO/ClustalIO.py", line 169, in next raise ValueError("Could not parse line:\n%s" % line) ValueError: Could not parse line: I tested with Biopython 1.47 and 1.46 with the input file: http://www.pastecode.com.ar/f44f28b41 (download at http://www.pastecode.com.ar/pastebin.php?dl=f44f28b41) The clustal program is running because I see in the disk its output (posted here: http://www.pastecode.com.ar/f275a5475). It seems it fails to parse it. I also tested in an older version (I guess it is 1.44) and it works OK. So I think the problem was introduced in 1.46. 
-- Molecular Biology course for programmers: http://tinyurl.com/2vv8w6 Bioinformatics news: http://www.bioinformatica.info Free Python tutorial: http://tinyurl.com/2az5d5 From bugzilla-daemon at portal.open-bio.org Tue Jul 8 04:41:02 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 8 Jul 2008 04:41:02 -0400 Subject: [Biopython-dev] [Bug 2538] _security_check_parameters assumes all arguments are strings In-Reply-To: Message-ID: <200807080841.m688f2VL020100@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2538 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-08 04:41 EST ------- Included a float in the unit test for _security_check_parameters() added in Bug 2508: Tests/test_NCBIStandalone.py revision: 1.15; Fixed the string assumption in: Bio/Blast/NCBIStandalone.py revision: 1.74; Note that in your suggested fix, Sebastian, both the "in" expressions need casting to a string. Thanks for reporting this! Peter From biopython at maubp.freeserve.co.uk Tue Jul 8 04:51:31 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 8 Jul 2008 09:51:31 +0100 Subject: [Biopython-dev] Alignment problem. bug? In-Reply-To: References: Message-ID: <320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com> On Mon, Jul 7, 2008 at 9:30 PM, Sebastian Bassi wrote: > I would like to confirm whether this is a bug or not. If I get > confirmation, I would file it in bugzilla. It does look like a bug to me...
> With this code: > > from Bio import Clustalw > from Bio.Clustalw import MultipleAlignCL > > cline = MultipleAlignCL('foralig.txt') > cline.set_output("alig.aln") > alignment = Clustalw.do_alignment(cline) > > I get: > > Traceback (most recent call last): > File "/mnt/hda2/py252/bin/ii.py", line 112, in > alignment = Clustalw.do_alignment(cline) > File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Clustalw/__init__.py", > line 125, in do_alignment > return parse_file(out_file, alphabet) > File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Clustalw/__init__.py", > line 47, in parse_file > generic_alignment = AlignIO.read(handle, "clustal") > File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/AlignIO/__init__.py", > line 299, in read > first = iterator.next() > File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/AlignIO/ClustalIO.py", > line 169, in next > raise ValueError("Could not parse line:\n%s" % line) > ValueError: Could not parse line: > > > I tested with Biopython 1.47 and 1.46 with the input file: > http://www.pastecode.com.ar/f44f28b41 (download at > http://www.pastecode.com.ar/pastebin.php?dl=f44f28b41) > The clustal program is running because I see in the disk its output > (posted here: http://www.pastecode.com.ar/f275a5475). It seems it > fails to parse it. > > I also tested in an older version (I guess it is 1.44) and it works > OK. So I think the problem was introduced in 1.46. For Biopython 1.46+ I switched the Bio.Clustalw parser to internally call Bio.AlignIO, so one thing you could try is reverting Bio/Clustalw/__init__.py to the older version (e.g. that shipped with Biopython 1.45). You haven't said which version of the ClustalW tool you are using - maybe 2.0? If so, there could be a subtle change in the output format since 1.83. If you could run the tool by hand and share the output that would be helpful to try and track down this issue. 
I don't seem to have any version of ClustalW installed on my current machine, so it will take me a little longer to reproduce this here. Could you file a bug please, and attach the example input and the output when run by hand at the command line. Thanks, Peter From bugzilla-daemon at portal.open-bio.org Tue Jul 8 04:52:06 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 8 Jul 2008 04:52:06 -0400 Subject: [Biopython-dev] [Bug 2538] _security_check_parameters assumes all arguments are strings In-Reply-To: Message-ID: <200807080852.m688q6Ce020588@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2538 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-08 04:52 EST ------- Forgot to mark this as fixed - sorry for the extra email! From biopython at maubp.freeserve.co.uk Tue Jul 8 07:02:37 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 8 Jul 2008 12:02:37 +0100 Subject: [Biopython-dev] Alignment problem. bug? In-Reply-To: <320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com> References: <320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com> Message-ID: <320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com> On Tue, Jul 8, 2008 at 9:51 AM, Peter wrote: > On Mon, Jul 7, 2008 at 9:30 PM, Sebastian Bassi wrote: >> I would like to confirm whether this is a bug or not. If I get >> confirmation, I would file it in bugzilla. > > It does look like a bug to me... I've reproduced this using Clustalw 2.0.8 (online at the EBI) and with Clustalw 2.0.9 (installed locally).
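In a Clustal alignment file, each block of sequence rows is followed by a match (consensus) line of `*`, `:` and `.` symbols aligned under the sequence columns; when no column in a block is conserved, that line is blank. One way to make a block parser robust to this is to identify the match line by its position (the first line after the sequence rows) rather than by its content. The sketch below is a hypothetical illustration of that idea, not the code in Bio/AlignIO/ClustalIO.py; it assumes the `CLUSTAL` header line has already been removed and that every sequence row has the form `id  sequence [count]`.

```python
def parse_clustal_blocks(lines):
    """Parse the body of a Clustal-style alignment (header already removed).

    Returns (records, consensus): records maps each id to its sequence in
    file order; consensus is the concatenated match line for all blocks.
    """
    records = {}
    consensus = []
    block = []  # sequence rows of the block currently being read

    def flush(match_line):
        # The sequence text starts at the same column on every row, and the
        # match line is aligned under it.
        name, seq = block[0].split()[:2]
        start = block[0].index(seq, len(name))
        consensus.append(match_line[start:start + len(seq)].ljust(len(seq)))
        for row in block:
            row_name, row_seq = row.split()[:2]
            records[row_name] = records.get(row_name, "") + row_seq
        block.clear()

    for raw in lines:
        line = raw.rstrip("\n")
        if line and not line[0].isspace():
            block.append(line)  # an "id  sequence" row
        elif block:
            # The first line after the rows is the match line, even when it
            # is completely blank (no conserved columns in this block).
            flush(line)
        # Otherwise this is a separator between blocks: ignore it.
    if block:
        flush("")  # the input ended without a match line
    return records, "".join(consensus)


lines = [
    "seq1      ACGT\n",
    "seq2      TGCA\n",
    "\n",                # match line: nothing conserved, so it is blank
    "\n",                # separator between blocks
    "seq1      GGCC\n",
    "seq2      GGCA\n",
    "          *** \n",
]
records, consensus = parse_clustal_blocks(lines)
print(records)          # {'seq1': 'ACGTGGCC', 'seq2': 'TGCAGGCA'}
print(repr(consensus))  # '    *** '
```

Because the blank match line is consumed as part of its block, the second block's rows are never mistaken for a continuation of the first.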
It was a problem parsing Clustal files where the first line of the consensus was blank (and would probably affect Clustal W 1.83 too). I think I have fixed this with Bio/AlignIO/ClustalIO.py revision: 1.12 Could you update this file and re-test please Sebastian? Also, may I add a test alignment file based on your example to CVS please? Thanks, Peter From mjldehoon at yahoo.com Tue Jul 8 08:47:48 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 8 Jul 2008 05:47:48 -0700 (PDT) Subject: [Biopython-dev] Bio.Sequencing Message-ID: <570915.67657.qm@web62415.mail.re1.yahoo.com> Hi everybody, Is somebody currently actively maintaining Bio.Sequencing? Frank perhaps? I'd like to make some changes to Bio.Sequencing with regards to bug #2454: http://bugzilla.open-bio.org/show_bug.cgi?id=2454 Just to make sure that I am not treading on other people's work. --Michiel From fkauff at biologie.uni-kl.de Tue Jul 8 09:12:39 2008 From: fkauff at biologie.uni-kl.de (Frank Kauff) Date: Tue, 08 Jul 2008 15:12:39 +0200 Subject: [Biopython-dev] Bio.Sequencing In-Reply-To: <570915.67657.qm@web62415.mail.re1.yahoo.com> References: <570915.67657.qm@web62415.mail.re1.yahoo.com> Message-ID: <487367C7.2050702@biologie.uni-kl.de> Hi all, Michiel de Hoon wrote: > Hi everybody, > > Is somebody currently actively maintaining Bio.Sequencing? Frank perhaps? > Not me. Green lights from my side. Frank > I'd like to make some changes to Bio.Sequencing with regards to bug #2454: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2454 > > Just to make sure that I am not treading on other people's work.
> > > --Michiel > > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > From biopython at maubp.freeserve.co.uk Tue Jul 8 10:36:43 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 8 Jul 2008 15:36:43 +0100 Subject: [Biopython-dev] Bio.Sequencing In-Reply-To: <570915.67657.qm@web62415.mail.re1.yahoo.com> References: <570915.67657.qm@web62415.mail.re1.yahoo.com> Message-ID: <320fb6e00807080736v26388f2ake12303c5b752c5e9@mail.gmail.com> On Tue, Jul 8, 2008 at 1:47 PM, Michiel de Hoon wrote: > Hi everybody, > > Is somebody currently actively maintaining Bio.Sequencing? Frank perhaps? > I'd like to make some changes to Bio.Sequencing with regards to bug #2454: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2454 > > Just to make sure that I am not treading on other people's work. My only comment is to watch out for the fact that Bio.SeqIO is now calling Bio.Sequencing for the "ace" and "phd" formats. On a related note, I've had some ideas for making the Ace parser more user-friendly by further extending the doc strings and defining __str__ or __repr__ methods for some of the "line type classes", which otherwise must be explored by using dir() to discover the properties. I haven't actually done any work on this yet though. Peter From sbassi at gmail.com Tue Jul 8 11:38:29 2008 From: sbassi at gmail.com (Sebastian Bassi) Date: Tue, 8 Jul 2008 12:38:29 -0300 Subject: [Biopython-dev] Alignment problem. bug? In-Reply-To: <320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com> References: <320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com> <320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com> Message-ID: On Tue, Jul 8, 2008 at 8:02 AM, Peter wrote: > I've reproduced this using Clustalw 2.0.8 (online at the EBI) and with > Clustalw 2.0.9 (installed locally).
It was a problem parsing Clustal > files where the first line of the consensus was blank (and would > probably affect Clustal W 1.83 too). Yes, I used ClustalW 1.83. > I think I have fixed this with Bio/AlignIO/ClustalIO.py revision: 1.12 > Could you update this file and re-test please Sebastian? Also, may I > add a test alignment file based on your example to CVS please? Ok, I will test it today. You can use my file or any possible derivation of it. Best, SB. From biopython at maubp.freeserve.co.uk Tue Jul 8 11:56:20 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 8 Jul 2008 16:56:20 +0100 Subject: [Biopython-dev] Alignment problem. bug? In-Reply-To: References: <320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com> <320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com> Message-ID: <320fb6e00807080856s55d77962h9ceedd160ca8002b@mail.gmail.com> >> I think I have fixed this with Bio/AlignIO/ClustalIO.py revision: 1.12 >> Could you update this file and re-test please Sebastian? Also, may I >> add a test alignment file based on your example to CVS please? > > Ok, I will test it today. You can use my file or any possible derivation of it. Thanks - I have added a two-sequence version of your example as Tests/Clustalw/odd_consensus.aln Peter From sbassi at gmail.com Tue Jul 8 12:52:13 2008 From: sbassi at gmail.com (Sebastian Bassi) Date: Tue, 8 Jul 2008 13:52:13 -0300 Subject: [Biopython-dev] Alignment problem. bug?
In-Reply-To: <320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com> References: <320fb6e00807080151m1a2c3932nfe8351569f0fa4e0@mail.gmail.com> <320fb6e00807080402g5b6fd74agff71dad10d08f306@mail.gmail.com> Message-ID: On Tue, Jul 8, 2008 at 8:02 AM, Peter wrote: > I think I have fixed this with Bio/AlignIO/ClustalIO.py revision: 1.12 Just to confirm that it works now. Thank you! Best, SB. From biopython at maubp.freeserve.co.uk Wed Jul 9 07:11:16 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 9 Jul 2008 12:11:16 +0100 Subject: [Biopython-dev] Bug 2533 - Support for simple "tab" format in Bio.SeqIO In-Reply-To: <320fb6e00807020603s63f8339ag5b8140f1943ceb47@mail.gmail.com> References: <320fb6e00807020603s63f8339ag5b8140f1943ceb47@mail.gmail.com> Message-ID: <320fb6e00807090411o44530c46wc1ffdc8cdc5442fe@mail.gmail.com> Now that Biopython 1.47 is out, are there any comments/objections to my committing this to CVS? Bug 2533 - Support for simple "tab" format in Bio.SeqIO http://bugzilla.open-bio.org/show_bug.cgi?id=2533 Thanks, Peter P.S. Any real world example files would be good for the test suite. From lpritc at scri.ac.uk Wed Jul 9 08:14:04 2008 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Wed, 09 Jul 2008 13:14:04 +0100 Subject: [Biopython-dev] Bug 2533 - Support for simple "tab" format in Bio.SeqIO In-Reply-To: <320fb6e00807090411o44530c46wc1ffdc8cdc5442fe@mail.gmail.com> Message-ID: Only that you might want to consider Axon Text File format as a self-describing tab-separated format which would facilitate storage and recovery of all attributes of a sequence. There's a spec here: http://www.moleculardevices.com/pages/software/gn_genepix_file_formats.html On 09/07/2008 12:11, "Peter" wrote: > Now that Biopython 1.47 is out, are there any comments/objections to > my committing this to CVS? > > Bug 2533 - Support for simple "tab" format in Bio.SeqIO > http://bugzilla.open-bio.org/show_bug.cgi?id=2533 > > Thanks, > > Peter > > P.S. 
Any real world example files would be good for the test suite. -- Dr Leighton Pritchard B.Sc.(Hons) MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405
From biopython at maubp.freeserve.co.uk Wed Jul 9 08:30:26 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 9 Jul 2008 13:30:26 +0100 Subject: [Biopython-dev] Bug 2533 - Support for simple "tab" format in Bio.SeqIO In-Reply-To: References: <320fb6e00807090411o44530c46wc1ffdc8cdc5442fe@mail.gmail.com> Message-ID: <320fb6e00807090530j43a3e2c9y48bef4993587881f@mail.gmail.com> On Wed, Jul 9, 2008 at 1:14 PM, Leighton Pritchard wrote: > Only that you might want to consider Axon Text File format as a > self-describing tab-separated format which would facilitate storage and > recovery of all attributes of a sequence. There's a spec here: > > http://www.moleculardevices.com/pages/software/gn_genepix_file_formats.html > It's an interesting and flexible file format, but I don't see any standard column name for "sequence", which would be of particular interest from the point of view of the Bio.SeqIO module. If there is a de-facto convention for storing sequence data in an Axon Text File, then we could adopt this within Bio.SeqIO. Otherwise, I think any Axon Text File parser added to Biopython would have to be of a much more general nature (and not part of Bio.SeqIO). Peter From biopython at maubp.freeserve.co.uk Wed Jul 9 09:03:16 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 9 Jul 2008 14:03:16 +0100 Subject: [Biopython-dev] Simple __getitem__ for Alignments Message-ID: <320fb6e00807090603o6b087ceeuce0b87c13627552a@mail.gmail.com> Now that the latest release is out (Biopython 1.47), Bio.AlignIO should start to get used more. I anticipate more people getting frustrated with the current Alignment object, and would like to make another baby-step in improving it. I'd like to add a minimal __getitem__ method, as described in Bug 1944 comment 15, http://bugzilla.open-bio.org/show_bug.cgi?id=1944#c15 > def __getitem__(self, index) : > """Access part of the alignment.
> > You can access a row of the alignment as a SeqRecord using an integer > index (think of the alignment as a list of SeqRecord objects here): > > first_record = my_alignment[0] > last_record = my_alignment[-1] > > Right now, this is the ONLY indexing operation supported. The > use of two indices and slice notation to extract a sub-alignment, > row, column or letter is under discussion for a future update.""" > if isinstance(index, int) : > #e.g. result = align[x] > #Return a SeqRecord > return self._records[index] > else : > raise TypeError("Not currently supported, but may be in future.") From the discussion on Bug 1944, this doesn't seem to be contentious - the debate is about more advanced slicing operations. I'd also like to add a __len__ method which would return the number of SeqRecord objects (i.e. the number of rows). This would then let the alignment be treated very much like a read-only list of SeqRecord objects. Remember, we can already iterate over the rows in the alignment as SeqRecord objects. Any comments? Peter From bugzilla-daemon at portal.open-bio.org Wed Jul 9 09:21:13 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Jul 2008 09:21:13 -0400 Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects In-Reply-To: Message-ID: <200807091321.m69DLD9g031282@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2454 ------- Comment #20 from mdehoon at ims.u-tokyo.ac.jp 2008-07-09 09:21 EST ------- (In reply to comment #16) I have uploaded an alternative parser in Bio.Sequencing.Phd to CVS. Feel free to have a look and comment. If everybody is OK, I'll add a DeprecationWarning to the previous parser.
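The minimal `__getitem__` and `__len__` pair proposed in the "Simple __getitem__ for Alignments" message makes the Alignment object behave like a read-only list of rows. A self-contained sketch of that behaviour follows; `SketchAlignment` is a hypothetical stand-in for `Bio.Align.Generic.Alignment`, and each row is a plain `(id, sequence)` tuple rather than a SeqRecord so that the example runs without Biopython.

```python
class SketchAlignment:
    """Hypothetical stand-in for Bio.Align.Generic.Alignment."""

    def __init__(self):
        self._records = []  # one (id, sequence) tuple per row

    def add_sequence(self, descriptor, sequence):
        self._records.append((descriptor, sequence))

    def __len__(self):
        """Return the number of rows (sequences), not the number of columns."""
        return len(self._records)

    def __iter__(self):
        """Iterate over the rows, as the existing Alignment object allows."""
        return iter(self._records)

    def __getitem__(self, index):
        """Return one row for an integer index; anything else is refused."""
        if isinstance(index, int):
            return self._records[index]
        raise TypeError("Not currently supported, but may be in future.")


align = SketchAlignment()
align.add_sequence("asp", "MHQAIFIYQ")
align.add_sequence("unk", "MH--IFIYQ")
print(len(align))    # 2
print(align[-1][0])  # unk
```

Rejecting slices with a TypeError, rather than silently returning a list, leaves room to give `align[1:3]` or two-index access a proper sub-alignment meaning later.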
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 09:37:44 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Jul 2008 09:37:44 -0400 Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects In-Reply-To: Message-ID: <200807091337.m69DbiM5031944@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2454 ------- Comment #21 from fkauff at biologie.uni-kl.de 2008-07-09 09:37 EST ------- Michiel, while you're at it - could you update my email in the source as well? And Cymon's email is now cy at cymon.org. Thanks! Frank (In reply to comment #20) > (In reply to comment #16) > I have uploaded an alternative parser in Bio.Sequencing.Phd to CVS. Feel free > to have a look and comment. If everybody is OK, I'll add a DeprecationWarning > to the previous parser. From bugzilla-daemon at portal.open-bio.org Wed Jul 9 09:38:18 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Jul 2008 09:38:18 -0400 Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects In-Reply-To: Message-ID: <200807091338.m69DcIDu031986@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2454 ------- Comment #22 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-09 09:38 EST ------- In reply to comment 20 about the updates to Bio.Sequencing.Phd, I see you've also updated Bio.SeqIO.PhdIO in CVS (good). I would suggest you add yourself to the copyright statement for this module, and add some doc string entries to the new read and parse functions. I haven't looked over the details of the new code (other than confirming test_Phd.py and test_SeqIO.py seem happy).
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 10:28:36 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Jul 2008 10:28:36 -0400 Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects In-Reply-To: Message-ID: <200807091428.m69ESaGm001621@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2454 ------- Comment #23 from mdehoon at ims.u-tokyo.ac.jp 2008-07-09 10:28 EST ------- (In reply to comment #21) > Michiel, > > while you're at it - could you update my email in the source as well? And > Cymon's email is now I have updated your address, but I'd prefer to hold off on Cymon's without his direct permission -- spammers are watching too, you know.
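Bug 2454 is about parsers that need a real file rather than any file-like object. The `read`/`parse` convention discussed here avoids that by only iterating over the handle line by line: `parse` yields every record and `read` insists on exactly one. A minimal sketch, assuming a heavily simplified subset of the PHD format (`BEGIN_SEQUENCE name`, base lines of the form `base quality peak` between `BEGIN_DNA` and `END_DNA`, then `END_SEQUENCE`; comment blocks and tags are ignored). The function names mirror those mentioned in the thread, but this is not the Bio.Sequencing.Phd code.

```python
import io


def parse(handle):
    """Yield (name, bases, qualities) tuples from a simplified PHD handle.

    Only line iteration is used, so any file-like object (an open file,
    io.StringIO, a network stream, and so on) will do.
    """
    name, bases, quals, in_dna = None, [], [], False
    for line in handle:
        line = line.strip()
        if line.startswith("BEGIN_SEQUENCE"):
            name = line.split(None, 1)[1]
            bases, quals = [], []
        elif line == "BEGIN_DNA":
            in_dna = True
        elif line == "END_DNA":
            in_dna = False
        elif line == "END_SEQUENCE":
            yield name, "".join(bases), quals
        elif in_dna and line:
            # Each base line is "base quality peak_position".
            base, quality = line.split()[:2]
            bases.append(base)
            quals.append(int(quality))


def read(handle):
    """Return the only record in the handle, or raise ValueError."""
    records = list(parse(handle))
    if len(records) != 1:
        raise ValueError("Expected exactly one record, found %d" % len(records))
    return records[0]


phd_text = """BEGIN_SEQUENCE read1
BEGIN_DNA
a 9 0
c 12 13
g 30 27
END_DNA
END_SEQUENCE
"""
print(read(io.StringIO(phd_text)))  # ('read1', 'acg', [9, 12, 30])
```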
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 14:33:42 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Jul 2008 14:33:42 -0400 Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support for '-F F' and make it safe In-Reply-To: Message-ID: <200807091833.m69IXgcV013783@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2508 mmokrejs at ribosome.natur.cuni.cz changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |REOPENED Resolution|FIXED | ------- Comment #10 from mmokrejs at ribosome.natur.cuni.cz 2008-07-09 14:33 EST ------- OK, so my old code not yet converted to biopython-1.47 gives me: _textframe = blast.blast_and_htmlize(_query_sequence, _usermode, upload_temp_path, blast_path, uri, _align_view, _matrix) File "/home/mmokrejs/public_html/IRES2/blast.py", line 548, in blast_and_htmlize _blast_out, _error_info, _blast_file = blastall(blast_path + targetdb, query_sequence, upload_temp_path, mode='sequence', align_view=align_view, matrix=matrix) File "/home/mmokrejs/public_html/IRES2/blast.py", line 506, in blastall _blast_out, _error_info = NCBIStandalone.blastall('/usr/bin/blastall', 'blastn', blast_db, _blast_file, matrix=matrix + ' -F 0', wordsize=_wordsize, gap_open=_gap_open, gap_extend=_gap_extend, strands=_strands, alignments=_alignments, descriptions=_descriptions, expectation=_expectation, align_view=align_view) File "/usr/lib/python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line 1620, in blastall _security_check_parameters(keywds) File "/usr/lib/python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line 1986, in _security_check_parameters if ";" in value or "&&" in value : TypeError: argument of type 'int' is not iterable It turns out I am passing in: {'matrix': 'NUC.4.4 -F 0', 'strands': 3, 'expectation': 100, 'wordsize': 4, 'gap_extend': 1, 'gap_open': 1, 'alignments': 99999, 
'descriptions': 9999} I don't think it makes sense to require users to pass strings instead of numbers to the function. While looking into _security_check_parameters(), I think you should also check for "||" (the logical OR as interpreted by the shell) and the redirections ">" and "<". FIX: -if ";" in value or "&&" in value: +if ";" in str(value) or "&&" in str(value) or "||" in str(value) or ">" in str(value) or "<" in str(value): My apologies for not testing this earlier. From bugzilla-daemon at portal.open-bio.org Wed Jul 9 14:38:08 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Jul 2008 14:38:08 -0400 Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support for '-F F' and make it safe In-Reply-To: Message-ID: <200807091838.m69Ic82k014070@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2508 ------- Comment #11 from mmokrejs at ribosome.natur.cuni.cz 2008-07-09 14:38 EST ------- I don't know if you want to leave in the back door for passing in another argument with its value. If not, prevent spaces as well. Values never contain spaces unless wrapped in single or double quotes. I find it perfectly legal to tell blastall: -d "/some/db /another/db /yet/another" to search over three databases at once. It seems it does not accept -d specified 3 times on its command line.
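The extended check proposed in these comments can be sketched as a small blacklist. This is a hypothetical helper (`find_unsafe_parameter` and `FORBIDDEN_TOKENS` are illustrative names, not Biopython API): it casts each value with `str()`, treats `|` as also covering the shell's `||`, and deliberately allows spaces so that a value like `/some/db /another/db` still passes. As with any blacklist it is not exhaustive; backticks and `$(...)` command substitution, for example, are not covered, and allowing spaces keeps the back door for extra arguments open.

```python
# Hypothetical token blacklist; "|" also matches the shell's "||".
FORBIDDEN_TOKENS = (";", "&&", "|", ">", "<")


def find_unsafe_parameter(param_dict):
    """Return (key, token) for the first unsafe value found, else None."""
    for key in sorted(param_dict):  # sorted for a deterministic result
        text = str(param_dict[key])  # ints and floats are fine after str()
        for token in FORBIDDEN_TOKENS:
            if token in text:
                return key, token
    return None


# Spaces are allowed, so several databases can still be passed to -d.
print(find_unsafe_parameter({"database": "/some/db /another/db", "alignments": 99999}))
print(find_unsafe_parameter({"matrix": "NUC.4.4 > /tmp/out"}))
```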
From bugzilla-daemon at portal.open-bio.org Wed Jul 9 16:12:40 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Jul 2008 16:12:40 -0400 Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support for '-F F' and make it safe In-Reply-To: Message-ID: <200807092012.m69KCeO2018087@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2508 ------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-09 16:12 EST ------- The issue with non-string arguments (e.g. floats or integers) was reported by Sebastian Bassi (Bug 2538) and has since been fixed in CVS - sadly this was after the release of Biopython 1.47. As you've demonstrated, there are valid reasons to want to include spaces. I would rather not add a check which requires lots of special casing. I'm leaving this bug open to consider extending _security_check_parameters() to prevent the use of pipes and redirection (i.e. "|", "<" and ">"), which sounds reasonable. A third opinion wouldn't hurt of course! From bugzilla-daemon at portal.open-bio.org Thu Jul 10 06:30:28 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 Jul 2008 06:30:28 -0400 Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects In-Reply-To: Message-ID: <200807101030.m6AAUSew025300@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2454 ------- Comment #24 from fkauff at biologie.uni-kl.de 2008-07-10 06:30 EST ------- > (In reply to comment #21) > > Michiel, > > > > while you're at it - could you update my email in the source as well? And
And > > Cymon's email is now > > I have updated your address, but I'd prefer hold off on Cymon's without his > direct permission -- spammers are watching too, you know. > Contacted Cymon, reply below: Hi Frank, ... > > Do you want your email address updated in the ace/phd parser code? Or > removed (just the email, not the name, of course)? Don't know if you follow > biopython-dev lately. I dont actually follow the -dev list but perhaps I should, as I think I'm going to be using and doing far more diverse bioinformatics stuff (now that I'm employed as a bioinformatician :) Anyway, the email can be changed to cymon.cox at gmail.com - best to go through google I think as their spam filters tend to be pretty good. Cheers, C. (In reply to comment #23) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Jul 10 12:24:27 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 Jul 2008 12:24:27 -0400 Subject: [Biopython-dev] [Bug 2533] Support for simple "tab" format in Bio.SeqIO In-Reply-To: Message-ID: <200807101624.m6AGORlL012526@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2533 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-10 12:24 EST ------- Checked in, marking as fixed. Bio/SeqIO/TabIO.py initial revision: 1.1 Bio/SeqIO/__init__.py new revision: 1.33 Tests/output/test_SeqIO new revision: 1.25 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Jul 10 23:20:11 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 Jul 2008 23:20:11 -0400 Subject: [Biopython-dev] [Bug 2542] New: AlignInfo.py fails a test Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2542 Summary: AlignInfo.py fails a test Product: Biopython Version: 1.46 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: sbassi at gmail.com When I run: $ python2.5 /mnt/hda2/bio/biopython-1.47/Bio/Align/AlignInfo.py I get the first 2 tests OK but then: Traceback (most recent call last): File "/mnt/hda2/bio/biopython-1.47/Bio/Align/AlignInfo.py", line 723, in print summary.information_content() File "/mnt/hda2/bio/biopython-1.47/Bio/Align/AlignInfo.py", line 508, in information_content raise ValueError, errstr ValueError: Error in alphabet: not Nucleotide or Protein, supply expected frequencies I've also tried without AlignIO: from Bio import Alphabet from Bio.Align.Generic import Alignment from Bio.Seq import Seq from Bio.Align.AlignInfo import SummaryInfo seq1 = 'MHQAIFIYQIGYPLKSGYIQSIRSPEYDNW' seq2 = 'MH--IFIYQIGYALKSGYIQSIRSPEY-NW' a = Alignment(Alphabet.ProteinAlphabet) a.add_sequence("asp",seq1) a.add_sequence("unk",seq2) summary = SummaryInfo(a) summary.information_content() Traceback (most recent call last): File "/mnt/hda2/py252/bin/align.py", line 16, in summary.information_content() File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Align/AlignInfo.py", line 508, in information_content raise ValueError, errstr ValueError: Error in alphabet: not Nucleotide or Protein, supply expected frequencies
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 04:49:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 04:49:08 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To: 
Message-ID: <200807110849.m6B8n8Xg022720@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2542

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 04:49 EST -------
Going over your example code:

>>> from Bio import Alphabet
>>> from Bio.Align.Generic import Alignment
>>> from Bio.Align.AlignInfo import SummaryInfo
>>> seq1 = 'MHQAIFIYQIGYPLKSGYIQSIRSPEYDNW'
>>> seq2 = 'MH--IFIYQIGYALKSGYIQSIRSPEY-NW'
>>> a = Alignment(Alphabet.ProteinAlphabet)

First problem, you gave the Alignment object an Alphabet class, rather than an
instance of the class. I guess we should add an explicit check to the Alignment
object... You should have used:

>>> a = Alignment(Alphabet.ProteinAlphabet())

Or, if you prefer perhaps:

>>> a = Alignment(Alphabet.generic_protein)

Then when we get to the information_content, there is another issue:

>>> a.add_sequence("asp",seq1)
>>> a.add_sequence("unk",seq2)
>>> summary = SummaryInfo(a)
>>> summary.information_content()
Traceback (most recent call last):
...
AttributeError: ProteinAlphabet instance has no attribute 'gap_char'

The trouble here is that the SummaryInfo class is looking for a declared gap
character in the protein alphabet - and none has been declared. Your example
sequences appear to use "-" as a gap, but you haven't declared this.
Try this:

from Bio import Alphabet
from Bio.Align.Generic import Alignment
from Bio.Seq import Seq
from Bio.Align.AlignInfo import SummaryInfo

seq1 = 'MHQAIFIYQIGYPLKSGYIQSIRSPEYDNW'
seq2 = 'MH--IFIYQIGYALKSGYIQSIRSPEY-NW'
a = Alignment(Alphabet.Gapped(Alphabet.generic_protein, "-"))
a.add_sequence("asp",seq1)
a.add_sequence("unk",seq2)
summary = SummaryInfo(a)
print summary.information_content()

You mentioned having a similar issue with Bio.AlignIO - could you attach the
example file to this bug with some trivial code showing your problem?

Thanks,

Peter.

P.S. Please update to Biopython 1.47 rather than using 1.46

From bugzilla-daemon at portal.open-bio.org Fri Jul 11 05:50:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 05:50:49 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To: 
Message-ID: <200807110950.m6B9on7t025902@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2542

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 05:50 EST -------
I think I've fixed the "Quick test" failure when running Bio/Align/AlignInfo.py
directly. I don't know how I missed that before...

/home/repository/biopython/biopython/Bio/Align/AlignInfo.py,v  <--  AlignInfo.py
new revision: 1.15; previous revision: 1.14
done

My opinion from looking at the AlignInfo code, and scanning back over the CVS
history, is that it was never used much with generic alphabets (as tend to be
returned by Bio.AlignIO).

There may be other issues here - for example I've spotted another problem case,
doubly extended alphabets like a protein alphabet with declared Gapped and
WithStopCodon (which you *might* want in an alignment).
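Comment #2 mentions doubly extended alphabets, such as a protein alphabet wrapped in both Gapped and WithStopCodon. One way for code to cope with any depth of wrapping is to walk the chain of wrappers until it reaches the base alphabet, and to look for a declared gap character at every level. The sketch below is plain Python 3. It uses hypothetical stand-in classes that only borrow the names from this thread: they are not the real Bio.Alphabet API, and this is not the fix that was committed to AlignInfo.py.

```python
class Alphabet:
    """Hypothetical base alphabet."""


class ProteinAlphabet(Alphabet):
    pass


class AlphabetEncoder:
    """Hypothetical wrapper that extends another alphabet with one letter."""

    def __init__(self, alphabet, new_letter):
        self.alphabet = alphabet  # the wrapped (inner) alphabet
        self.new_letter = new_letter


class Gapped(AlphabetEncoder):
    def __init__(self, alphabet, gap_char="-"):
        super().__init__(alphabet, gap_char)
        self.gap_char = gap_char


class WithStopCodon(AlphabetEncoder):
    def __init__(self, alphabet, stop_symbol="*"):
        super().__init__(alphabet, stop_symbol)
        self.stop_symbol = stop_symbol


def base_alphabet(alphabet):
    """Strip any number of encoder layers and return the innermost alphabet."""
    while isinstance(alphabet, AlphabetEncoder):
        alphabet = alphabet.alphabet
    return alphabet


def find_gap_char(alphabet):
    """Return the declared gap character at any nesting depth, or None."""
    while isinstance(alphabet, AlphabetEncoder):
        if isinstance(alphabet, Gapped):
            return alphabet.gap_char
        alphabet = alphabet.alphabet
    return None


doubly_extended = WithStopCodon(Gapped(ProteinAlphabet(), "-"), "*")
print(type(base_alphabet(doubly_extended)).__name__)  # ProteinAlphabet
print(find_gap_char(doubly_extended))  # -
```

A check that only looks at the outermost wrapper would miss the gap character here, because the outermost layer is the stop-codon wrapper.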
From biopython at maubp.freeserve.co.uk Fri Jul 11 06:33:22 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 11 Jul 2008 11:33:22 +0100
Subject: [Biopython-dev] Checking alphabet argument in alignments
Message-ID: <320fb6e00807110333r1938510bne7e24d1ce7b5c0b@mail.gmail.com>

I'd like to add the following check to the __init__ method of the
Bio.Align.Generic.Alignment object (our base alignment class),

>        if not (isinstance(alphabet, Alphabet.Alphabet) \
>                or isinstance(alphabet, Alphabet.AlphabetEncoder)):
>            raise ValueError("Invalid alphabet argument")

This will prevent subtle user errors like this:

from Bio import Alphabet
from Bio.Align.Generic import Alignment
a = Alignment(Alphabet.ProteinAlphabet)

which should be:

from Bio import Alphabet
from Bio.Align.Generic import Alignment
a = Alignment(Alphabet.ProteinAlphabet())

The only downside I have thought of is if anyone has created their own
alphabet class which does NOT subclass the original Bio.Alphabet.Alphabet
class.

This same test could (should?) also be added to the Seq and MutableSeq objects.

What do people think?

Peter

From bugzilla-daemon at portal.open-bio.org Fri Jul 11 06:39:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 06:39:48 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To: 
Message-ID: <200807111039.m6BAdm05028072@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2542

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 06:39 EST -------
In comment 2 I wrote:
> I've spotted another problem case, doubly extended alphabets like a
> protein alphabet declared Gapped and WithStopCodon (which you *might*
> want in an alignment).
This alphabet issue is fixed in CVS, as is another corner case of a divide by
zero error where an entire column consists of ignored characters.

Please re-test with Bio/Align/AlignInfo.py revision 1.16 from CVS.

Thanks

From bugzilla-daemon at portal.open-bio.org Fri Jul 11 12:18:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 12:18:28 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To: 
Message-ID: <200807111618.m6BGISQ3013553@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2454

------- Comment #25 from mdehoon at ims.u-tokyo.ac.jp 2008-07-11 12:18 EST -------
(In reply to comment #24)
OK, I updated Phd.py. The last module to consider is Ace.py; I'll upload a
fixed version soon.

From bugzilla-daemon at portal.open-bio.org Fri Jul 11 17:00:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 17:00:10 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To: 
Message-ID: <200807112100.m6BL0Aer026629@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2542

------- Comment #4 from sbassi at gmail.com 2008-07-11 17:00 EST -------
(In reply to comment #1)
> First problem, you gave the Alignment object an Alphabet class, rather than an
> instance of the class. I guess we should add an explicit check to the Alignment
> object...

Yes, that is my fault.
> You mentioned having a similar issue with Bio.AlignIO - could you attach the
> example file to this bug with some trivial code showing your problem?

Yes, this code with Bio.AlignIO also fails (I tried right now with
AlignInfo.py rev. 1.17):

from Bio.Align import AlignInfo
from Bio.Align.AlignInfo import SummaryInfo
from Bio import AlignIO
fn = open("secu3.aln")
alignment = AlignIO.read(fn, "clustal")
summary = SummaryInfo(alignment)
print summary.information_content()

And I got (and this time I am not supplying any alphabet, at least not
explicitly):

Traceback (most recent call last):
  File "/mnt/hda2/py252/bin/2542.py", line 12, in <module>
    print summary.information_content()
  File "/mnt/hda2/py252/lib/python2.5/site-packages/Bio/Align/AlignInfo.py", line 499, in information_content
    raise ValueError, errstr
ValueError: Error in alphabet: not Nucleotide or Protein, supply expected frequencies

> P.S. Please update to Biopython 1.47 rather than using 1.46

I was using Biopython 1.47, but I reported it as 1.46 just because 1.47
is not available from the drop-down menu in the bugzilla form.
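The error above arises because a generic alphabet does not say whether to compare the alignment against nucleotide or protein background frequencies. The sketch below shows, in plain Python 3 with no Biopython, the general shape of a per-column relative-entropy sum when the expected frequencies and the characters to ignore are supplied explicitly. It also shows a zero-contribution guard for a column made up entirely of ignored characters, which is the divide-by-zero corner case mentioned in comment #3. The function names are hypothetical, and this illustrates the idea under those assumptions. It is not a copy of SummaryInfo.information_content(), whose details may differ.

```python
import math


def column_information(column, expected, chars_to_ignore=("-",)):
    """Relative entropy (in bits) of one column against expected frequencies.

    Letters are upper-cased first so mixed-case input is counted together.
    Letters missing from the expected table raise KeyError.
    """
    counts = {}
    for char in column.upper():
        if char in chars_to_ignore:
            continue  # e.g. gaps are not counted at all
        counts[char] = counts.get(char, 0) + 1
    total = sum(counts.values())
    if total == 0:
        return 0.0  # entire column ignored: contribute nothing, avoid 0/0
    info = 0.0
    for char, count in counts.items():
        observed = count / total
        info += observed * math.log2(observed / expected[char])
    return info


def information_content(sequences, expected, chars_to_ignore=("-",)):
    """Sum the per-column information over every column of the alignment."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("All sequences must have the same length")
    return sum(
        column_information("".join(seq[i] for seq in sequences), expected, chars_to_ignore)
        for i in range(length)
    )


# Uniform DNA background, in the spirit of the FreqTable suggestion in this bug.
uniform_dna = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
print(information_content(["ACGT-", "ACGA-"], uniform_dna))  # 7.0
```

In the usage line, the three fully conserved columns give 2 bits each, the T/A column gives 1 bit, and the all-gap column gives 0.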
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 17:02:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 17:02:24 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To: 
Message-ID: <200807112102.m6BL2OvF026827@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2542

------- Comment #5 from sbassi at gmail.com 2008-07-11 17:02 EST -------
Created an attachment (id=971)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=971&action=view)
This file is used by my example where information_content() fails when sequences are retrieved with AlignIO

From bugzilla-daemon at portal.open-bio.org Fri Jul 11 17:16:03 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 17:16:03 -0400
Subject: [Biopython-dev] [Bug 2443] Specifying the alphabet in Bio.SeqIO and Bio.AlignIO
In-Reply-To: 
Message-ID: <200807112116.m6BLG3SJ027522@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2443

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Specifying the alphabet in  |Specifying the alphabet in
                   |Bio.SeqIO.parse()           |Bio.SeqIO and Bio.AlignIO

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 17:16 EST -------
I'm broadening the scope of this enhancement bug to cover Bio.SeqIO and
Bio.AlignIO (both their read() and parse() functions).

See also alphabet issues raised on Bug 2542.
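The idea behind broadening Bug 2443 is that a caller who knows the sequence type can tell the parser, because formats such as FASTA do not record it. Below is a minimal Python 3 sketch of a parser that takes an optional alphabet argument. The parse_fasta function and Record class are hypothetical, the alphabet is modelled as a plain string label rather than a Bio.Alphabet object, and the sketch says nothing about the signature Biopython eventually adopted.

```python
from io import StringIO


class Record:
    """Hypothetical minimal record: identifier, sequence and a type label."""

    def __init__(self, identifier, sequence, alphabet):
        self.id = identifier
        self.seq = sequence
        self.alphabet = alphabet


def parse_fasta(handle, alphabet=None):
    """Yield Record objects from FASTA text.

    FASTA does not say whether a sequence is DNA, RNA or protein, so
    unless the caller supplies alphabet the records are labelled
    "generic". Sequence lines before the first header are ignored.
    """
    label = alphabet if alphabet is not None else "generic"
    identifier, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if identifier is not None:
                yield Record(identifier, "".join(chunks), label)
            words = line[1:].split()
            identifier, chunks = (words[0] if words else ""), []
        elif line and identifier is not None:
            chunks.append(line)
    if identifier is not None:
        yield Record(identifier, "".join(chunks), label)


text = ">asp\nMHQAIFIYQIGY\nPLKSGY\n>unk\nMH--IFIYQIGY\n"
records = list(parse_fasta(StringIO(text), alphabet="protein"))
print([(r.id, r.seq, r.alphabet) for r in records])
```

Downstream code, such as an information-content calculation, could then use the declared label instead of guessing or failing on a generic alphabet.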
From bugzilla-daemon at portal.open-bio.org Fri Jul 11 17:19:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 11 Jul 2008 17:19:50 -0400
Subject: [Biopython-dev] [Bug 2542] AlignInfo.py fails a test
In-Reply-To: 
Message-ID: <200807112119.m6BLJoTt027660@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2542

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-11 17:19 EST -------
> Yes, this code with Bio.AlignIO also fails (I tried right now with
> AlignInfo.py rev. 1.17):
>
> from Bio.Align import AlignInfo
> from Bio.Align.AlignInfo import SummaryInfo
> from Bio import AlignIO
> fn = open("secu3.aln")
> alignment = AlignIO.read(fn, "clustal")
> summary = SummaryInfo(alignment)
> print summary.information_content()
>
> And I got (and this time I am not supplying any alphabet, at least not
> explicit):
>
> Traceback (most recent call last):
> ...
> ValueError: Error in alphabet: not Nucleotide or Protein, supply expected
> frequencies

Good. That seems to be working as intended - alignment formats like FASTA or
Clustal do not specify the sequence type (unlike for example the Nexus format).

Perhaps Bio.AlignIO.read() and parse() should be able to accept an optional
alphabet argument? I had already been considering this for Bio.SeqIO so this is
a natural extension. See Bug 2443.

Unless information_content() can determine the sequence type (protein or
nucleotide) from the alignment alphabet, you have to help it by supplying an
appropriate e_freq_table argument.
Perhaps:

from Bio.Alphabet import IUPAC
from Bio.SubsMat import FreqTable
from Bio.Align.AlignInfo import SummaryInfo
from Bio import AlignIO

fn = open("secu3.aln")
alignment = AlignIO.read(fn, "clustal")
summary = SummaryInfo(alignment)
#Have a generic alphabet, without a declared gap char, so must
#provide the frequencies and chars to ignore explicitly:
expected = FreqTable.FreqTable({"A":0.25,"G":0.25,"T":0.25,"C":0.25},
                               FreqTable.FREQ, IUPAC.unambiguous_dna)
print summary.information_content(e_freq_table=expected,
                                  chars_to_ignore=['-'])

This is probably safest. I'm doubtful that information_content() will choose
wisely if given mixed case or lower case sequences... if that is the case it
should be filed as a new bug.

> > P.S. Please update to Biopython 1.47 rather than using 1.46
>
> I was using Biopython 1.47, but I reported as 1.46 just because 1.47
> it is not available from the drop-down menu in bugzilla form.

Thanks for the reminder - I've added that to Bugzilla now :)

I'm marking this bug as fixed now (after the updates to AlignInfo.py)

From peter at maubp.freeserve.co.uk Sat Jul 12 09:45:46 2008
From: peter at maubp.freeserve.co.uk (Peter)
Date: Sat, 12 Jul 2008 14:45:46 +0100
Subject: [Biopython-dev] Deprecating the HTML parser in Bio.Blast.NCBIWWW
Message-ID: <320fb6e00807120645u26321d71q30f72ed5808f700@mail.gmail.com>

For some time now we've been discouraging the use of the HTML and plain text
Blast parsers in favour of the XML format. I think it would be a good idea to
now officially deprecate the HTML parser in Bio.Blast.NCBIWWW with warning
messages when it is used. I don't even know if it still works with the recent
big revision to the BLAST webpages, but I suspect not.

However, the plain text parser in Bio.Blast.NCBIStandalone still has its uses.
In particular, right now the PSI-BLAST output in XML format lacks some of the
information found in the plain text output (new vs reused entries) so it would
be premature to deprecate our plain text PSI parser. See Bug 2502 for details:
http://bugzilla.open-bio.org/show_bug.cgi?id=2502#c18

Peter

From bugzilla-daemon at portal.open-bio.org Sun Jul 13 12:23:57 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 13 Jul 2008 12:23:57 -0400
Subject: [Biopython-dev] [Bug 2543] New: Bio.Nexus.Trees can't handle named ancestors
Message-ID: 

http://bugzilla.open-bio.org/show_bug.cgi?id=2543

           Summary: Bio.Nexus.Trees can't handle named ancestors
           Product: Biopython
           Version: 1.46
          Platform: PC
        OS/Version: FreeBSD
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Other
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: markd at soe.ucsc.edu

The following code produces:
ValueError: invalid literal for float(): Ancestor1

from Bio.Nexus import Trees
# from http://evolution.genetics.washington.edu/phylip/newicktree.html
treeStr = "(B:6.0,(A:5.0,C:3.0,E:4.0)Ancestor1:5.0,D:11.0);"
tree = Trees.Tree(treeStr)

From bugzilla-daemon at portal.open-bio.org Mon Jul 14 06:17:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 14 Jul 2008 06:17:14 -0400
Subject: [Biopython-dev] [Bug 2543] Bio.Nexus.Trees can't handle named ancestors
In-Reply-To: 
Message-ID: <200807141017.m6EAHEhg019686@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2543

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-14 06:17 EST -------
This sounds like a job for Frank (the Bio.Nexus module author).
Can I ask if you've actually come across trees with named ancestor nodes in
"real life"? That would make this bug more important. If so, the name of the
tool would be interesting, and an example tree file would be great to add to
Biopython as a test case.

If on the other hand the only named ancestor tree you've ever tried is the
example from the Newick documentation, this doesn't seem such a high priority
(but still worth fixing).

Peter

From bugzilla-daemon at portal.open-bio.org Tue Jul 15 16:07:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 15 Jul 2008 16:07:56 -0400
Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string, even subclass string?
In-Reply-To: 
Message-ID: <200807152007.m6FK7umn009526@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2351

------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-15 16:07 EST -------
This is a suggested implementation of the split method for our Seq object,
modelled after the python string method which it calls internally.

Note that I have made the separator non-optional on the grounds that the string
method's default of white space isn't (usually) sensible for sequences. I'm
happy to change this if people think it's better to be as close as possible to
the string method.

    def split(self, sep, maxsplit=None) :
        """Split method, like that of a python string.

        Return a list of the 'words' in the string (as Seq objects),
        using sep as the delimiter string. If maxsplit is given, at
        most maxsplit splits are done.

        Unlike the python string method, sep must be specified (as
        there shouldn't be any whitespace strings in a sequence).

        e.g. print my_seq.split("-")
        """
        #Test against None so that maxsplit=0 means "no splits",
        #as it does for the python string method.
        if maxsplit is None :
            parts = self.data.split(sep)
        else :
            parts = self.data.split(sep, maxsplit)
        return [Seq(chunk, self.alphabet) for chunk in parts]

From bugzilla-daemon at portal.open-bio.org Wed Jul 16 05:39:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 16 Jul 2008 05:39:01 -0400
Subject: [Biopython-dev] [Bug 2544] New: Bio.SeqIO improvements
Message-ID: 

http://bugzilla.open-bio.org/show_bug.cgi?id=2544

           Summary: Bio.SeqIO improvements
           Product: Biopython
           Version: 1.47
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: mmokrejs at ribosome.natur.cuni.cz

$ python
Python 2.5.2 (r252:60911, Jul  2 2008, 22:55:24)
[GCC 4.3.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import SeqIO
>>> handle = open("genbank-synthetic.gb")
>>> print seq_record
ID: EF452680.2
Name: EF452680
Description: Synthetic construct nitric oxide synthase (NOS) gene, partial cds.
/comment=On Feb 4, 2008 this sequence version replaced gi:145391444.
/sequence_version=2
/source=synthetic construct
/taxonomy=['other sequences', 'artificial sequences']
/keywords=['']
/references=[, , , ]
/accessions=['EF452680']
/data_file_division=SYN
/date=11-JUN-2008
/organism=synthetic construct
/gi=166831528
Seq('TAGGCCTCTGCTTGCCGTTTGTTTCGTCAGCGATTTTTATAGTCTCAGCCTCCT...GCC', IUPACAmbiguousDNA())
>>>

I do not see how I could access the value 'DNA' from the LOCUS line:
LOCUS       EF452680                 260 bp    DNA     linear   SYN 11-JUN-2008

No, I do not want to read seq_record.features[0].qualifiers['mol_type'][0].

Could seq_record.features have a repr() function to give me something useful
instead of this?
>>> print seq_record.features
[, , ]
>>>

I don't see documented anywhere in the biopython docs how to access the
features; pasting something like the following into the docs would give a user
a clue where to look for values:

>>> print seq_record.features[0].qualifiers
{'db_xref': ['taxon:32630'], 'mol_type': ['other DNA'], 'organism': ['synthetic construct'], 'chromosome': ['Ib'], 'PCR_primers': ['fwd_seq: aggcctctgcttgccgtttgtttcg, rev_seq: cgccggcggcacacgctcaactaattac']}
>>> print seq_record.features[1].qualifiers
{'gene': ['NOS']}
>>> print seq_record.features[2].qualifiers
{'product': ['nitric oxide synthase'], 'codon_start': ['2'], 'EC_number': ['1.14.13.39'], 'transl_table': ['11'], 'note': ['derived from Toxoplasma gondii'], 'db_xref': ['GI:166831529'], 'translation': ['RPLLAVCFVSDFYSLSLLHFASVPFHESDGCVGRSHWLPGKHANYVKPAGARKRPEVGCRSSCLLRSVCCDILSPVRTRGN'], 'gene': ['NOS'], 'protein_id': ['ABP65329.2']}
>>> print seq_record.features[3].qualifiers
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>>

I wonder if I could access the above dicts as seq_record.features['source'] or
seq_record.features['CDS']. Where have the 'source', 'gene' and 'CDS' gone?
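The wish for seq_record.features['CDS'] can be approximated by grouping features on their type. SeqFeature objects carry a type string and a qualifiers dictionary, and the sketch below relies only on that. It is plain Python 3 with a hypothetical Feature stand-in class, so it runs without Biopython. It keeps a list per type because a record, and especially a genome, usually has many features of the same type. The __repr__ shows the kind of short, readable summary asked for above, in place of the default "instance at 0x..." output.

```python
from collections import defaultdict


class Feature:
    """Hypothetical stand-in for a sequence feature with a type and qualifiers."""

    def __init__(self, feature_type, qualifiers):
        self.type = feature_type
        self.qualifiers = qualifiers

    def __repr__(self):
        # Short summary: the feature type plus the sorted qualifier names.
        return "Feature(type=%r, qualifiers=%r)" % (self.type, sorted(self.qualifiers))


def features_by_type(features):
    """Group a list of features into a dict mapping type -> list of features."""
    grouped = defaultdict(list)
    for feature in features:
        grouped[feature.type].append(feature)
    return dict(grouped)


# Values taken from the qualifiers printed in the report above.
features = [
    Feature("source", {"mol_type": ["other DNA"], "organism": ["synthetic construct"]}),
    Feature("gene", {"gene": ["NOS"]}),
    Feature("CDS", {"gene": ["NOS"], "product": ["nitric oxide synthase"]}),
]
by_type = features_by_type(features)
print(by_type["CDS"][0].qualifiers["product"][0])  # nitric oxide synthase
print(features)
```

Grouping on demand like this leaves the ordered features list unchanged, which matters when feature order on the sequence is significant.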
From bugzilla-daemon at portal.open-bio.org Wed Jul 16 06:30:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 16 Jul 2008 06:30:21 -0400
Subject: [Biopython-dev] [Bug 2544] Bio.GenBank and SeqFeature improvements
In-Reply-To: 
Message-ID: <200807161030.m6GAUL9x017920@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2544

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Bio.SeqIO improvements      |Bio.GenBank and SeqFeature
                   |                            |improvements

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-07-16 06:30 EST -------
(In reply to comment #0)
> $ python
>
> Python 2.5.2 (r252:60911, Jul  2 2008, 22:55:24)
> [GCC 4.3.1] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from Bio import SeqIO
> >>> handle = open("genbank-synthetic.gb")

I'm guessing the missing line here was something like:

seq_record = SeqIO.read(handle, "genbank")

> >>> print seq_record
> ID: EF452680.2
> Name: EF452680
> Description: Synthetic construct nitric oxide synthase (NOS) gene, partial cds.
> /comment=On Feb 4, 2008 this sequence version replaced gi:145391444.
> /sequence_version=2
> /source=synthetic construct
> /taxonomy=['other sequences', 'artificial sequences']
> /keywords=['']
> /references=[,
> , instance at 0x834ceac>, ]
> /accessions=['EF452680']
> /data_file_division=SYN
> /date=11-JUN-2008
> /organism=synthetic construct
> /gi=166831528
> Seq('TAGGCCTCTGCTTGCCGTTTGTTTCGTCAGCGATTTTTATAGTCTCAGCCTCCT...GCC',
> IUPACAmbiguousDNA())
> >>>
>
> I do not see how I could access the value 'DNA' from the LOCUS line:
> LOCUS       EF452680                 260 bp    DNA     linear   SYN 11-JUN-2008

Currently the sequence type (DNA, RNA, Protein) is used internally by the
GenBank parser to determine the alphabet. It is not currently recorded in the
SeqRecord object's annotation but could be.
How about something like this?:

seq_record.annotations["seq_type"]

> No, I do not want to read seq_record.features[0].qualifiers['mol_type'][0].

Assuming that the first feature is the source (typically the case), and
assuming it has a specified molecule type, then your suggestion is one
workaround. But I agree, it's not nice.

> Could seq_record.features have a repr() function to give me something useful
> instead of this?
>
> >>> print seq_record.features
> [, instance at 0x837b9cc>, ]

Yes we could add that, but you wouldn't want to do that on a typical genome
with thousands of features.

Adding a repr method for the Reference object is also something I had wondered
about doing.

> I don't see documented anywhere in the biopython docs access the features,
> pasting something like the following into docs would give