From bugzilla-daemon at portal.open-bio.org Sat Dec 1 13:25:56 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 1 Dec 2007 13:25:56 -0500
Subject: [Biopython-dev] [Bug 2414] New: run_tests, py fails with a single test on a test suite
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2414

           Summary: run_tests,py fails with a single test on a test suite
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: trivial
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: tiagoantao at gmail.com


When a test Python file is composed of a single test, PyUnit dumps the
following log:
Ran 1 test in xxxs
run_tests.py (current CVS HEAD), line 284, is only searching for the plural:
Ran yy tests in xxxs
Mini patch (not tested, but trivial):
    if expected_line[:3] == "Ran" and \
       string.find(expected_line, " tests in ") >= 5:
becomes, e.g.,
    if expected_line[:3] == "Ran" and \
       (string.find(expected_line, " tests in ") >= 5 or
        string.find(expected_line, " test in ") >= 5):

I actually have, for now, a single case with one test, as I split my test cases
into those depending on external binaries and those not depending on external
binaries (creating a test scenario with a single test to try to run an external
application).


--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From tiagoantao at gmail.com Mon Dec 3 16:41:06 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 3 Dec 2007 21:41:06 +0000
Subject: [Biopython-dev] [Bug 2414] New: run_tests, py fails with a single test on a test suite
In-Reply-To:
References:
Message-ID: <6d941f120712031341p1af6ca55oa04b787f8e0937@mail.gmail.com>

Hi,

Could I please ask you (I suppose Peter or Michiel) to advise on this?
I have my code for coalescent simulation ready, but I am not committing
because one of my test files has only a single test (to see if it can
run the coalescent simulator; all other tests do not depend on having
the simulator, so they are in a different test case).
I can either put in a dummy test just to have 2 tests (a hack), or
run_tests.py can be sorted out.

Thanks,
Tiago
PS - Apologies in advance if I take too much time to respond; I will be
traveling for the next 3 days.

On Dec 1, 2007 6:25 PM, wrote:
> http://bugzilla.open-bio.org/show_bug.cgi?id=2414
>
>            Summary: run_tests,py fails with a single test on a test suite
>            Product: Biopython
>            Version: Not Applicable
>           Platform: All
>         OS/Version: All
>             Status: NEW
>           Severity: trivial
>           Priority: P2
>          Component: Main Distribution
>         AssignedTo: biopython-dev at biopython.org
>         ReportedBy: tiagoantao at gmail.com
>
>
> When a test python file is composed of a single test, PyUnit dumps the
> following log:
> Ran 1 test in xxxs
> run_tests.py on (current CVS HEAD) line 284 is only searching for the plural
> Ran yy tests in xxxs
> Mini patch (not tested, but trivial)
>     if expected_line[:3] == "Ran" and \
>        string.find(expected_line, " tests in ") >= 5:
> becomes, eg,
>     if expected_line[:3] == "Ran" and \
>        (string.find(expected_line, " tests in ") >= 5 or
>         string.find(expected_line, " test in ") >= 5):
>
> I actually have, for now, a single case with one test, as I split my test cases
> in depending on external binaries and not depending on external binaries
> (creating a test scenario with a single test to try to run an external
> application)
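
The two string.find() calls in the mini patch could also be folded into a
single pattern. What follows is only a minimal sketch, assuming the PyUnit
summary line always has the form "Ran N test(s) in ..."; the helper name
is_pyunit_summary is made up for illustration and is not something that
exists in run_tests.py.

```python
import re

# Hypothetical helper, not part of run_tests.py: the optional "s" lets the
# same pattern accept both "Ran 1 test in ..." and "Ran 12 tests in ...".
_RAN_TESTS = re.compile(r"^Ran \d+ tests? in ")


def is_pyunit_summary(expected_line):
    """Return True if the line looks like the PyUnit summary line."""
    return _RAN_TESTS.match(expected_line) is not None
```

The pattern is anchored at the start of the line, which plays the same role
as the expected_line[:3] == "Ran" check in the patch.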
--
http://www.tiago.org/ps

From bugzilla-daemon at portal.open-bio.org Mon Dec 3 17:04:05 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 3 Dec 2007 17:04:05 -0500
Subject: [Biopython-dev] [Bug 2414] run_tests, py fails with a single test on a test suite
In-Reply-To:
Message-ID: <200712032204.lB3M45tn000935@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2414

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-03 17:04 EST -------
Are you talking about test_PopGen_FDist.py? I don't have fdist installed, so I
haven't found this problem yet...

In any case, your fix looks fine, although arguably a regular expression (with
an optional "s" in "tests") would be more elegant.

I am happy for you to make this change in run_tests.py

From mdehoon at c2b2.columbia.edu Tue Dec 4 02:10:40 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 4 Dec 2007 02:10:40 -0500
Subject: [Biopython-dev] Accessing ExPASy through Bio.SwissProt / Bio.SeqIO
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B66F@mail2.exch.c2b2.columbia.edu>

Hi everybody,

I am still looking at the different code in Biopython to access SwissProt.
With Bio.SwissProt, we can access the SwissProt database as follows:

>>> from Bio.SwissProt import SProt
>>> dictionary = SProt.ExPASyDictionary()
>>> record = dictionary["O23719"]
# record is now a string containing the SwissProt record O23719

Another option is to pull out a Bio.SwissProt.SProt.Record object:

>>> from Bio.SwissProt import SProt
>>> s_parser = SProt.RecordParser()
>>> dictionary = SProt.ExPASyDictionary(parser=s_parser)
>>> record = dictionary["O23719"]
# record is now a Bio.SwissProt.SProt.Record object containing record O23719

A third option is to pull out a SeqRecord by using SeqIO:

>>> from Bio.SwissProt import SProt
>>> dictionary = SProt.ExPASyDictionary()
>>> record = dictionary["O23719"]
>>> from Bio import SeqIO
>>> import StringIO
>>> record = SeqIO.parse(StringIO.StringIO(record), "swiss").next()
# record is now a Bio.SeqRecord.SeqRecord object containing record O23719

Compare this to how we would read a Fasta file:

>>> from Bio import SeqIO
>>> input = open("mydata.fa")
>>> record = SeqIO.parse(input, "fasta").next()

For consistency with Bio.SeqIO, it would make sense if ExPASyDictionary
returned handles instead of parsed objects.
Then these examples look like:

>>> from Bio.SwissProt import SProt
>>> dictionary = SProt.ExPASyDictionary()
>>> record = dictionary["O23719"].read()
# record is now a string containing the SwissProt record O23719

To pull out a Bio.SwissProt.SProt.Record object:

>>> from Bio.SwissProt import SProt
>>> dictionary = SProt.ExPASyDictionary()
>>> handle = dictionary["O23719"]
>>> record = SProt.parse(handle)
# record is now a Bio.SwissProt.SProt.Record object containing record O23719

To pull out a SeqRecord by using SeqIO:

>>> from Bio.SwissProt import SProt
>>> dictionary = SProt.ExPASyDictionary()
>>> handle = dictionary["O23719"]
>>> from Bio import SeqIO
>>> record = SeqIO.parse(handle, "swiss").next()
# record is now a Bio.SeqRecord.SeqRecord object containing record O23719

*If* we decide that ExPASyDictionary should return handles, *then* actually
we don't really need an ExPASyDictionary, as its behavior is then largely the
same as Bio.WWW.ExPASy.get_sprot_raw. So in short, in my opinion
Bio.SwissProt.SProt.ExPASyDictionary does not add much beyond what
Bio.WWW.ExPASy.get_sprot_raw already offers.

Any comments?

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

From biopython-dev at maubp.freeserve.co.uk Tue Dec 4 05:26:52 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 4 Dec 2007 10:26:52 +0000
Subject: [Biopython-dev] Accessing ExPASy through Bio.SwissProt / Bio.SeqIO
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B66F@mail2.exch.c2b2.columbia.edu>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B66F@mail2.exch.c2b2.columbia.edu>
Message-ID: <320fb6e00712040226o7ecda7e2g9fb124b3a52de026@mail.gmail.com>

> For consistency with Bio.SeqIO, it would make sense if ExPASyDictionary would
> returns handles instead of parsed objects.

I agree that it would in general be simpler if our online APIs
returned handles by default.
This also applies to the Bio.GenBank methods.
Of course, we should preserve existing functionality if possible.

Another alternative is to return SeqRecords by default (via Bio.SeqIO)
but this wouldn't generalise to non-sequence files like ProSite etc.

One idea I had been thinking about was adding a new function
Bio.SeqIO.fetch(...) or Bio.SeqIO.online_fetch(...) which would act as
a proxy to all our supported online sequence databases, and either
return a handle to the requested record(s), or perhaps return
SeqRecord(s).

One API model would be that outlined for the (possibly defunct?) Open
Biological Database Access (OBDA) scheme, which covers both BioSQL
access and online fetching (biofetch):

http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/obda-specs/biofetch/biofetch.txt?cvsroot=obf-common

But first I should probably finish working on BioSQL ;)

> *If* we decide that ExPASyDictionary should return handles, *then* actually
> we don't really need an ExPASyDictionary, as its behavior is then largely the
> same as Bio.WWW.ExPASy.get_sprot_raw. So in short, in my opinion
> Bio.SwissProt.SProt.ExPASyDictionary does not add much beyond what
> Bio.WWW.ExPASy.get_sprot_raw already offers.

Can ExPASyDictionary return anything that get_sprot_raw can't?
Otherwise, from the user's point of view, it's just a coding style issue
(dictionary versus function).
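
To make the "dictionary versus function" comparison concrete, here is a rough
sketch under stated assumptions. Nothing in it is Biopython code: fetch_raw is
a hypothetical stand-in for a function like get_sprot_raw (it fakes a tiny
record instead of contacting ExPASy), and HandleDictionary is a made-up
wrapper showing how little a dictionary interface adds on top of such a
function.

```python
from io import StringIO


def fetch_raw(accession):
    """Hypothetical fetch function: return a file-like handle for one record."""
    # A real implementation would open a network connection; a fake
    # two-line record keeps this sketch self-contained.
    return StringIO("ID   %s\n//\n" % accession)


class HandleDictionary(object):
    """Dictionary-style access that simply delegates to a fetch function."""

    def __init__(self, fetch, parser=None):
        self._fetch = fetch
        self._parser = parser  # optional callable that takes a handle

    def __getitem__(self, accession):
        handle = self._fetch(accession)
        if self._parser is None:
            return handle  # hand back the raw handle untouched
        return self._parser(handle)


# Both spellings reach exactly the same data.
via_function = fetch_raw("O23719").read()
via_dictionary = HandleDictionary(fetch_raw)["O23719"].read()
```

With handles as the common currency, the only thing the wrapper contributes
is the square-bracket syntax and a place to hang an optional parser.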
Peter

From bugzilla-daemon at portal.open-bio.org Tue Dec 4 05:41:25 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Dec 2007 05:41:25 -0500
Subject: [Biopython-dev] [Bug 2414] run_tests, py fails with a single test on a test suite
In-Reply-To:
Message-ID: <200712041041.lB4AfPTN008806@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2414


tiagoantao at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED


------- Comment #2 from tiagoantao at gmail.com 2007-12-04 05:41 EST -------
> Are you talking about test_PopGen_FDist.py? I don't have fdist installed, so I
> haven't found this problem yet...

No, it is my new SimCoal code.

> In any case, your fix looks fine, although arguably a regular expression (with
> an optional "s" in "tests") would be more elegant.
>
> I am happy for you to make this change in run_tests.py

OK, I will do this with a regex. I cannot promise when, though, as I am
traveling until Saturday (but it will be before next Monday).

From bugzilla-daemon at portal.open-bio.org Tue Dec 4 14:43:17 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Dec 2007 14:43:17 -0500
Subject: [Biopython-dev] [Bug 2412] NCBIXML. fails parsing with blast 2.2.15 in special cases (Karlin-Altschul)
In-Reply-To:
Message-ID: <200712041943.lB4JhHkE012059@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2412

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-04 14:43 EST -------
The fact that your example gives an empty XML file is essentially due to some
problem with Blast.
I agree that the Biopython error message you quoted is very unhelpful
in this situation.

Are you using Biopython 1.43 (as suggested by the stack trace in the error
report), or Biopython 1.44 as reported in the bug details?

What does this do on your setup?

from StringIO import StringIO
from Bio.Blast import NCBIXML
handle = StringIO("")
for record in NCBIXML.parse(handle):
    print record

If you are using Biopython 1.44 or later you should get a helpful error message,
"ValueError: Your XML file was empty". You can catch this, and inspect the
contents of the error handle if you want to deal with this in your application.

i.e. I think this bug has already been fixed in Biopython 1.44

From bugzilla-daemon at portal.open-bio.org Tue Dec 4 15:25:45 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Dec 2007 15:25:45 -0500
Subject: [Biopython-dev] [Bug 2396] BioSQL loader does not store sequence level annotations dict
In-Reply-To:
Message-ID: <200712042025.lB4KPj2D016252@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2396


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-04 15:25 EST -------
I think I have fixed this now in CVS.

One related wrinkle is that if you had this:

record.annotations["example1"] == "string"
record.annotations["example2"] == ["alpha"]
record.annotations["example3"] == ["alpha", "beta"]

after loading and retrieving from BioSQL you have this:

record.annotations["example1"] == ["string"]
record.annotations["example2"] == ["alpha"]
record.annotations["example3"] == ["alpha", "beta"]

i.e. Everything becomes a list of strings. It is difficult to see how to deal
with this elegantly given the current BioSQL schema. One option is to treat
single entries as either a list or a string depending on the rank field in the
database...

I should probably take this up with the BioSQL mailing list to see how/if this
issue affects BioPerl/BioJava.

From mdehoon at c2b2.columbia.edu Tue Dec 4 20:13:01 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 4 Dec 2007 20:13:01 -0500
Subject: [Biopython-dev] Accessing ExPASy through Bio.SwissProt / Bio.SeqIO
References: <6243BAA9F5E0D24DA41B27997D1FD14402B66F@mail2.exch.c2b2.columbia.edu> <320fb6e00712040226o7ecda7e2g9fb124b3a52de026@mail.gmail.com>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B670@mail2.exch.c2b2.columbia.edu>

> One idea I had been thinking about was adding a new function
> Bio.SeqIO.fetch(...) or Bio.SeqIO.online_fetch(...) which would act as
> a proxy to all our supported online sequence databases, and either
> return a handle to the requested record(s), or perhaps return
> SeqRecord(s).

I believe that Bio.db has such functionality, but I don't think it is used
much. Anyway, we currently have too many functions in Biopython to access
databases rather than too few. So I think we should not add any new ones.

> > *If* we decide that ExPASyDictionary should return handles, *then* actually
> > we don't really need an ExPASyDictionary, as its behavior is then largely the
> > same as Bio.WWW.ExPASy.get_sprot_raw. So in short, in my opinion
> > Bio.SwissProt.SProt.ExPASyDictionary does not add much beyond what
> > Bio.WWW.ExPASy.get_sprot_raw already offers.
>
> Can ExPASyDictionary return anything that get_sprot_raw can't?
> Otherwise, from the user's point of view, it's just a coding style issue
> (dictionary versus function).

ExPASyDictionary is just a wrapper around get_sprot_raw, so get_sprot_raw can
return any record that ExPASyDictionary can return.
There are two differences between the two:
1) ExPASyDictionary behaves as a dictionary, get_sprot_raw as a function. As
you write, this is just a coding style issue.
2) When creating an ExPASyDictionary, users can pass a parser to parse the
records before returning them. This is in essence only a coding style issue.
In particular, do we want:

>>> from Bio.SwissProt import SProt
>>> sprot_parser = SProt.RecordParser()
>>> dictionary = SProt.ExPASyDictionary(parser = sprot_parser)
>>> record = dictionary["O12345"]

or

>>> from Bio.SwissProt import SProt
>>> from Bio import ExPASy
>>> handle = ExPASy.get_sprot_raw("O12345")
>>> record = SProt.parse(handle)

For SeqRecords, in the ExPASyDictionary approach we'd use a different parser;
in the get_sprot_raw approach we call SeqIO.parse instead of SProt.parse.
For plain-text output, in the ExPASyDictionary approach we pass no parser,
and in the get_sprot_raw approach we call read() on the handle directly.
To get a handle, in the ExPASyDictionary approach we can use StringIO to
convert the text output to a handle; in the get_sprot_raw approach we don't
need to do anything.

In my opinion, both 1) and 2) are just coding style issues. Maintaining both
ExPASyDictionary and get_sprot_raw is a burden for the developers, and causes
confusion for users.
So I suggest we focus on one of these, and deprecate the other. The
ExPASy.get_sprot_raw approach seems closer to how Bio.SeqIO is organized, and
therefore has my preference.

Two more issues:
1) I am not sure why the SwissProt code is kept in a separate SProt submodule
of Bio.SwissProt. Currently, Bio/SwissProt/__init__.py is empty, so we can
save ourselves some typing by keeping all the SwissProt code there instead of
in SProt.py.
2) A SwissProt.parse function currently doesn't exist. Right now it is a
three-step process:

>>> s_parser = SProt.RecordParser()
>>> s_iterator = SProt.Iterator(handle, s_parser)
>>> record = s_iterator.next()

A SwissProt.parse function would just contain these three steps, or perhaps
only the first two.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

From biopython-dev at maubp.freeserve.co.uk Wed Dec 5 05:03:34 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed, 5 Dec 2007 10:03:34 +0000
Subject: [Biopython-dev] Accessing ExPASy through Bio.SwissProt / Bio.SeqIO
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B670@mail2.exch.c2b2.columbia.edu>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B66F@mail2.exch.c2b2.columbia.edu> <320fb6e00712040226o7ecda7e2g9fb124b3a52de026@mail.gmail.com> <6243BAA9F5E0D24DA41B27997D1FD14402B670@mail2.exch.c2b2.columbia.edu>
Message-ID: <320fb6e00712050203p17aa38b0q15d2edd65542021d@mail.gmail.com>

On 12/5/07, Michiel De Hoon wrote:
> > One idea I had been thinking about was adding a new function
> > Bio.SeqIO.fetch(...) or Bio.SeqIO.online_fetch(...)
> > which would act as a proxy to all our supported online sequence
> > databases, and either return a handle to the requested record(s),
> > or perhaps return SeqRecord(s).
>
> I believe that Bio.db has such a functionality, but I don't think it is used
> much. Anyway, we currently have too many functions in Biopython to
> access databases rather than too few. So I think we should not add any
> new ones.

Certainly before taking my suggestion seriously we should try and take
stock of where we stand at the moment with respect to online
databases.

> > Can ExPASyDictionary return anything that get_sprot_raw can't?
> > Otherwise from the user's point of view its just a coding style issue
> > (dictionary versus function).
>
> ExPASyDictionary is just a wrapper around get_sprot_raw, so get_sprot_raw can
> return any record that ExPASyDictionary can return.
> There are two differences between the two:
> 1) ExPASyDictionary behaves as a dictionary, get_sprot_raw as a function. As
> you write, this is just a coding style issue.
> 2) When creating a ExPASyDictionary, users can pass a parser to parse the
> records before returning them. This is in essence only a coding style issue.
> In particular, do we want:
> >>> from Bio.SwissProt import SProt
> >>> sprot_parser = SProt.RecordParser()
> >>> dictionary = SProt.ExPASyDictionary(parser = sprot_parser)
> >>> record = dictionary["O12345"]
> or
> >>> from Bio.SwissProt import SProt
> >>> from Bio import ExPASy
> >>> handle = ExPASy.get_sprot_raw("O12345")
> >>> record = SProt.parse(handle)

Or do we want to encourage Bio.SeqIO (which happens to call
Bio.SwissProt.SProt internally)?

>>> from Bio import SeqIO
>>> from Bio import ExPASy
>>> handle = ExPASy.get_sprot_raw("O12345")
>>> record = SeqIO.parse(handle, "swiss")

This is the style I prefer (and is very similar to the related
examples I added to the tutorial). It separates fetching the data (as
a handle) and parsing it (via SeqIO).
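
The separation can be sketched without any network access or Biopython at
all. This is only an illustration under stated assumptions: fetch_handle and
parse_records are hypothetical names, and the toy two-field format merely
stands in for a real Swiss-Prot flat file.

```python
from io import StringIO


def fetch_handle(accession):
    """Hypothetical fetch step: return a file-like handle, nothing parsed."""
    text = "ID   %s\nDE   Example entry.\n//\n" % accession
    return StringIO(text)


def parse_records(handle):
    """Hypothetical parse step: yield one dict per '//'-terminated record."""
    record = {}
    for line in handle:
        line = line.rstrip("\n")
        if line == "//":
            yield record  # end of the current record
            record = {}
        elif line:
            # Two-letter line code, three spaces, then the value.
            record[line[:2]] = line[5:]


# Fetching and parsing are independent steps joined only by the handle.
handle = fetch_handle("O12345")
record = next(parse_records(handle))
```

Because the parser only ever sees a handle, the same parse_records call works
unchanged on an open local file or on a StringIO wrapped around text that is
already in memory.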

> For SeqRecords, in the ExPASyDictionary approach we'd use a different parser,
> in the get_sprot_raw approach we call SeqIO.parse instead of SProt.parse.
> For plain-text output, in the ExPASyDictionary approach we pass no parser,
> and in the get_sprot_raw approach we call read() on the handle directly.
> To get a handle, in the ExPASyDictionary approach we can use StringIO to
> convert the text output to a handle; in the get_sprot_raw approach we don't
> need to do anything.
>
> In my opinion, both 1) and 2) are just coding style issues. Maintaining both
> ExPASyDictionary and get_sprot_raw is a burden for the developers, and causes
> confusion for users. So I suggest we focus on one of these, and deprecate the
> other.

As ExPASyDictionary just wraps get_sprot_raw with a parser
object, the additional overhead is minimal. The dictionary metaphor
is quite nice, even if you don't actually gain much functionality.
However, setting up the dictionary as it is now (requiring an "old
fashioned" parser object) is fairly fiddly/confusing.

> The ExPASy.get_sprot_raw approach seems closer to how Bio.SeqIO is
> organized, and therefore has my preference.

I agree: if you wanted to deprecate one, I would keep
get_sprot_raw and drop ExPASyDictionary. However, we should try to
have a coherent API for the other online tools as well.

> Two more issues:
> 1) I am not sure why the SwissProt code is kept in a separate SProt submodule
> of Bio.SwissProt. Currently, Bio/SwissProt/__init__.py is empty, so we can
> save ourselves some typing by keeping all the SwissProt code there instead of
> in SProt.py.

Or just encourage using it via Bio.SeqIO (then we can move things
later if wanted).

> 2) A SwissProt.parse function currently doesn't exist.
> Right now it is a three-step process:
> >>> s_parser = SProt.RecordParser()
> >>> s_iterator = SProt.Iterator(handle, s_parser)
> >>> record = s_iterator.next()
> A SwissProt.parse function would just contain these three steps, or
> perhaps only the first two.

The Bio.SeqIO.parse() is very close though.

Peter

From mdehoon at c2b2.columbia.edu Wed Dec 5 05:29:38 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Wed, 5 Dec 2007 05:29:38 -0500
Subject: [Biopython-dev] Accessing ExPASy through Bio.SwissProt / Bio.SeqIO
References: <6243BAA9F5E0D24DA41B27997D1FD14402B66F@mail2.exch.c2b2.columbia.edu> <320fb6e00712040226o7ecda7e2g9fb124b3a52de026@mail.gmail.com> <6243BAA9F5E0D24DA41B27997D1FD14402B670@mail2.exch.c2b2.columbia.edu> <320fb6e00712050203p17aa38b0q15d2edd65542021d@mail.gmail.com>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B671@mail2.exch.c2b2.columbia.edu>

> Or do we want to encourage Bio.SeqIO (which happens to call
> Bio.SwissProt.SProt internally)?
> > >>> from Bio SeqIO > >>> from Bio import ExPASy > >>> handle = ExPASy.get_sprot_raw("O12345") > >>> record = SeqIO.parse(handle, "swiss") > > This is the style I prefer (and is very similar to the related > examples I added to the tutorial). It separates fetching the data (as > a handle) and parsing it (via SeqIO). SeqIO.parse returns a SeqRecord; a SwissProt.parse returns a SwissProt.SProt.Record. Does the SeqRecord contain the same information as a SwissProt.SProt.Record? Or is some information lost? If they contain the same information, then I am in favor of encouraging Bio.SeqIO. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032
From biopython at maubp.freeserve.co.uk Wed Dec 5 06:55:45 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 05 Dec 2007 11:55:45 +0000 Subject: [Biopython-dev] Accessing ExPASy through Bio.SwissProt / Bio.SeqIO In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B670@mail2.exch.c2b2.columbia.edu> References: <6243BAA9F5E0D24DA41B27997D1FD14402B66F@mail2.exch.c2b2.columbia.edu> <320fb6e00712040226o7ecda7e2g9fb124b3a52de026@mail.gmail.com> <6243BAA9F5E0D24DA41B27997D1FD14402B670@mail2.exch.c2b2.columbia.edu> Message-ID: <475691C1.3020705@maubp.freeserve.co.uk> On 12/5/07, Michiel De Hoon wrote: > > One idea I had been thinking about was adding a new function > > Bio.SeqIO.fetch(...) or Bio.SeqIO.online_fetch(...) which would act as > > a proxy to all our supported online sequence databases, and either > > return a handle to the requested record(s), or perhaps return > > SeqRecord(s). > > I believe that Bio.db has such a functionality, but I don't think it is used > much. Anyway, we currently have too many functions in Biopython to > access databases rather than too few. So I think we should not add any > new ones. Certainly before taking my suggestion seriously we should try and take stock of where we stand at the moment with respect to online databases. > > Can ExPASyDictionary return anything that get_sprot_raw can't? > > Otherwise from the user's point of view its just a coding style issue > > (dictionary versus function). > > ExPASyDictionary is just a wrapper around get_sprot_raw, so get_sprot_raw can > return any record that ExPASyDictionary can return. > There are two differences between the two: > 1) ExPASyDictionary behaves as a dictionary, get_sprot_raw as a function. As > you write, this is just a coding style issue.
> 2) When creating an ExPASyDictionary, users can pass a parser to parse the > records before returning them. This is in essence only a coding style issue. > In particular, do we want: > >>> from Bio.SwissProt import SProt > >>> sprot_parser = SProt.RecordParser() > >>> dictionary = SProt.ExPASyDictionary(parser = sprot_parser) > >>> record = dictionary["O12345"] > or > >>> from Bio.SwissProt import SProt > >>> from Bio import ExPASy > >>> handle = ExPASy.get_sprot_raw("O12345") > >>> record = SProt.parse(handle) Or do we want to encourage Bio.SeqIO (which happens to call Bio.SwissProt.SProt internally)? >>> from Bio import SeqIO >>> from Bio import ExPASy >>> handle = ExPASy.get_sprot_raw("O12345") >>> record = SeqIO.parse(handle, "swiss") This is the style I prefer (and is very similar to the related examples I added to the tutorial). It separates fetching the data (as a handle) and parsing it (via SeqIO). > For SeqRecords, in the ExPASyDictionary approach we'd use a different parser, > in the get_sprot_raw approach we call SeqIO.parse instead of SProt.parse. > For plain-text output, in the ExPASyDictionary approach we pass no parser, > and in the get_sprot_raw approach we call read() on the handle directly. > To get a handle, in the ExPASyDictionary approach we can use StringIO to > convert the text output to a handle; in the get_sprot_raw approach we don't > need to do anything. > > In my opinion, both 1) and 2) are just coding style issues. Maintaining both > ExPASyDictionary and get_sprot_raw is a burden for the developers, and causes > confusion for users. So I suggest we focus on one of these, and deprecate the > other. As ExPASyDictionary just wraps get_sprot_raw with a parser object, the additional overhead is minimal. The dictionary metaphor is quite nice - even if you don't actually gain much functionality. However, setting up the dictionary as it is now (requiring an "old fashioned" parser object) is fairly fiddly/confusing.
> The ExPASy.get_sprot_raw approach seems closer to how Bio.SeqIO is > organized, and therefore has my preference. I would agree: if you wanted to deprecate one, I would keep get_sprot_raw and drop ExPASyDictionary. However we should try and have a coherent API for the other online tools as well. > Two more issues: > 1) I am not sure why the SwissProt code is kept in a separate SProt submodule > of Bio.SwissProt. Currently, Bio/SwissProt/__init__.py is empty, so we can > save ourselves some typing by keeping all the SwissProt code there instead of > in SProt.py. Or just encourage using it via Bio.SeqIO (then we can move things later if wanted) > 2) A SwissProt.parse function currently doesn't exist. Right now it is a > three-step process: > >>> s_parser = SProt.RecordParser() > >>> s_iterator = SProt.Iterator(handle, s_parser) > >>> record = s_iterator.next() > A SwissProt.parse function would just contain these three steps, or > perhaps only the first two. The Bio.SeqIO.parse() is very close though. Peter From mdehoon at c2b2.columbia.edu Fri Dec 7 05:11:33 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Fri, 7 Dec 2007 05:11:33 -0500 Subject: [Biopython-dev] Accessing ExPASy through Bio.SwissProt /Bio.SeqIO References: <6243BAA9F5E0D24DA41B27997D1FD14402B66F@mail2.exch.c2b2.columbia.edu> <320fb6e00712040226o7ecda7e2g9fb124b3a52de026@mail.gmail.com> <6243BAA9F5E0D24DA41B27997D1FD14402B670@mail2.exch.c2b2.columbia.edu> <475691C1.3020705@maubp.freeserve.co.uk> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B673@mail2.exch.c2b2.columbia.edu> Hi everybody, To summarize, I rewrote the chapter on SwissProt/Prosite/Prodoc/ExPASy and put it here: http://biopython.org/DIST/docs/tutorial/Tutorial-proposal.html#htoc51 (chapter 6 in the tutorial) This is merely a proposal on how this should work; none of this is in CVS yet. Please let us know if you have any objections. If there are no objections, I can upload the new code to CVS.
That would conclude my work on Bio.WWW.ExPASy; the final (and biggest) part of my work on Bio.WWW will be to look at the various Biopython modules to interact with NCBI (Genbank, EUtils). Two comments: 1) In this proposal, I am using SwissProt.parse instead of SeqIO.parse since the latter does not (yet) store all information contained in a SwissProt file. I'd be happy though to move to SeqIO.parse for SwissProt also once it does. 2) It may be nice to have a SwissProt.read and SeqIO.read to read and return exactly one record from the handle, in addition to parse() to create an iterator to read multiple records. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/ms-tnef Size: 3662 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/biopython-dev/attachments/20071207/0442ab19/attachment.bin From biopython at maubp.freeserve.co.uk Fri Dec 7 05:46:32 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 7 Dec 2007 10:46:32 +0000 Subject: [Biopython-dev] Accessing ExPASy through Bio.SwissProt /Bio.SeqIO In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B673@mail2.exch.c2b2.columbia.edu> References: <6243BAA9F5E0D24DA41B27997D1FD14402B66F@mail2.exch.c2b2.columbia.edu> <320fb6e00712040226o7ecda7e2g9fb124b3a52de026@mail.gmail.com> <6243BAA9F5E0D24DA41B27997D1FD14402B670@mail2.exch.c2b2.columbia.edu> <475691C1.3020705@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B673@mail2.exch.c2b2.columbia.edu> Message-ID: <320fb6e00712070246g53e8096ew156f4502791bce9b@mail.gmail.com> > To summarize, I rewrote the chapter on SwissProt/Prosite/Prodoc/ExPASy and > put it here: > > http://biopython.org/DIST/docs/tutorial/Tutorial-proposal.html#htoc51 > (chapter 6 in the tutorial) > > This is merely a proposal on how this should work; none of this is in 
CVS > yet. Please let us know if you have any objections. I would add a note saying doing it this way gives Bio.SwissProt.SProt.Record objects, while you could alternatively get SeqRecord objects as described in the SeqIO chapter (use a reference). > If there are no objections, I can upload the new code to CVS. That would > conclude my work on Bio.WWW.ExPASy; the final (and biggest) part of my work > on Bio.WWW will be to look at the various Biopython modules to interact with > NCBI (Genbank, EUtils). That will be "fun"! > Two comments: > 1) In this proposal, I am using SwissProt.parse instead of SeqIO.parse since > the latter does not (yet) store all information contained in a SwissProt > file. I'd be happy though to move to SeqIO.parse for SwissProt also once it > does. > 2) It may be nice to have a SwissProt.read and SeqIO.read to read and return > exactly one record from the handle, in addition to parse() to create an > iterator to read multiple records. I'd suggested a Bio.SeqIO function, with a name like parse1() or parse_sole() etc which would return a single SeqRecord - and raise an error if the handle didn't contain one and only one record. We could call this function read() if you prefer. 
Peter From mdehoon at c2b2.columbia.edu Fri Dec 7 22:18:09 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sat, 08 Dec 2007 12:18:09 +0900 Subject: [Biopython-dev] Accessing ExPASy through Bio.SwissProt /Bio.SeqIO In-Reply-To: <320fb6e00712070246g53e8096ew156f4502791bce9b@mail.gmail.com> References: <6243BAA9F5E0D24DA41B27997D1FD14402B66F@mail2.exch.c2b2.columbia.edu> <320fb6e00712040226o7ecda7e2g9fb124b3a52de026@mail.gmail.com> <6243BAA9F5E0D24DA41B27997D1FD14402B670@mail2.exch.c2b2.columbia.edu> <475691C1.3020705@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B673@mail2.exch.c2b2.columbia.edu> <320fb6e00712070246g53e8096ew156f4502791bce9b@mail.gmail.com> Message-ID: <475A0CF1.1080802@c2b2.columbia.edu> Peter wrote: > I would add a note saying doing it this way gives > Bio.SwissProt.SProt.Record objects, > while you could alternatively get SeqRecord objects as described in > the SeqIO chapter > (use a reference). OK I will add that. > > I'd suggested a Bio.SeqIO function, with a name like parse1() or > parse_sole() etc which > would return a single SeqRecord - and raise an error if the handle > didn't contain one > and only one record. We could call this function read() if you prefer. > I'd prefer read() instead of parse1(), parse_sole() etc. for the following reasons: 1) Having two names that are clearly different emphasizes the fact that they return different things (parse() returns an iterator, read() a record). 2) Some modules deal with data that always consist of one record (for example, gene expression data in case of Bio.Cluster). Such modules can have a read() function but not a parse(). It would feel strange if a module has a parse1() function but not a parse(). --Michiel. 
From bugzilla-daemon at portal.open-bio.org Sat Dec 8 08:09:00 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 8 Dec 2007 08:09:00 -0500 Subject: [Biopython-dev] [Bug 2417] New: Bio.SeqIO single SeqRecord read/parse function Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2417 Summary: Bio.SeqIO single SeqRecord read/parse function Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk Most sequence file formats can contain a single record, and in this situation having to use an iterator returned by Bio.SeqIO.parse() can be clumsy. For example, dealing with GenBank files for bacterial genomes or chromosomes. Or, from the tutorial as of Biopython 1.44, from Bio.WWW import ExPASy from Bio import SeqIO seq_record = SeqIO.parse(ExPASy.get_sprot_raw("O23729"), "swiss").next() print seq_record.id print seq_record.seq print len(seq_record.seq) Using the iterator.next() method as above works fine; it will, however, silently ignore any unexpected subsequent records if present. Checking your file only has one record would require an additional check to confirm a second .next() call fails, or another such workaround. I am proposing a new function for use with a handle containing one and only one record. This would raise an error if the handle contained no records, or if it contained more than one record. It would be defined in Bio/SeqIO/__init__.py as a simple wrapper for Bio.SeqIO.parse() Note - My proposed "read single record" function would NOT work for cases where the handle contains multiple records and you only want the first one (because I would raise an exception). I would regard this as a corner case, and catering to this risks silently ignoring unexpected second and subsequent records in other use cases.
In such situations using Bio.SeqIO.parse(...).next() is advised. I had previously suggested "parse1", "parse_sole", "parse_only" - none of which are very appealing. On the dev mailing list today, Michiel has proposed "read": Michiel de Hoon wrote: > > Peter wrote: > > I'd suggested a Bio.SeqIO function, with a name like parse1() or > > parse_sole() etc which would return a single SeqRecord - and raise > > an error if the handle didn't contain one and only one record. We > > could call this function read() if you prefer. > > > I'd prefer read() instead of parse1(), parse_sole() etc. for the > following reasons: > > 1) Having two names that are clearly different emphasizes the fact that > they return different things (parse() returns an iterator, read() a record). > > 2) Some modules deal with data that always consist of one record (for > example, gene expression data in case of Bio.Cluster). Such modules can > have a read() function but not a parse(). It would feel strange if a > module has a parse1() function but not a parse(). I plan to add this functionality to Bio/SeqIO/__init__.py as a "read" function, and update the tutorial accordingly shortly. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From p.j.a.cock at googlemail.com Sat Dec 8 08:10:33 2007 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 8 Dec 2007 13:10:33 +0000 Subject: [Biopython-dev] Bio.SeqIO function to read a single record Message-ID: <320fb6e00712080510k3d4e5148gb0ec332a0d745452@mail.gmail.com> Michiel de Hoon wrote: > > > > I'd suggested a Bio.SeqIO function, with a name like parse1() or > > parse_sole() etc which would return a single SeqRecord - and raise > > an error if the handle didn't contain one and only one record. We > > could call this function read() if you prefer. > > > I'd prefer read() instead of parse1(), parse_sole() etc. 
for the > following reasons: > > 1) Having two names that are clearly different emphasizes the fact that > they return different things (parse() returns an iterator, read() a record). > > 2) Some modules deal with data that always consist of one record (for > example, gene expression data in case of Bio.Cluster). Such modules can > have a read() function but not a parse(). It would feel strange if a > module has a parse1() function but not a parse(). OK. I've filed an enhancement bug, which I'll mention on the main mailing list, http://bugzilla.open-bio.org/show_bug.cgi?id=2417 Unless there is some negative feedback, I'll add that functionality shortly. Peter From bugzilla-daemon at portal.open-bio.org Sun Dec 9 11:24:19 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 9 Dec 2007 11:24:19 -0500 Subject: [Biopython-dev] [Bug 2417] Bio.SeqIO single SeqRecord read/parse function In-Reply-To: Message-ID: <200712091624.lB9GOJCe025680@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2417 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-09 11:24 EST ------- Updated Bio/SeqIO/__init__.py to have include new "read" function in CVS revision 1.21 I'll do the documentation and unit tests next, before marking this as fixed. [Its not yet too late to change the name from "read" if anyone can come up with a nice clear alternative, or a strong argument against this choice] -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
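A minimal sketch of the "one and only one record" behaviour proposed in bug 2417, written against a plain Python iterator rather than Biopython itself. The names `read_single` and `toy_parse` are hypothetical stand-ins, and raising ValueError is an assumption (the bug report only says "raise an error"); this is not the code committed to Bio/SeqIO/__init__.py.

```python
def read_single(records):
    """Return the one and only record from an iterable of records.

    Assumption: ValueError is used for both failure cases; the bug
    report does not name the exception type.
    """
    iterator = iter(records)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("No records found in handle") from None
    try:
        next(iterator)
    except StopIteration:
        # Exactly one record was available.
        return first
    raise ValueError("More than one record found in handle")


def toy_parse(lines):
    """Hypothetical stand-in for a parser: one record per non-blank line."""
    for line in lines:
        line = line.strip()
        if line:
            yield line
```

Unlike calling .next() once on the iterator, this checks for a second record, so unexpected extra records are reported instead of being silently dropped.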
From bugzilla-daemon at portal.open-bio.org Sun Dec 9 13:50:06 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 9 Dec 2007 13:50:06 -0500 Subject: [Biopython-dev] [Bug 2417] Bio.SeqIO single SeqRecord read/parse function In-Reply-To: Message-ID: <200712091850.lB9Io6tj013469@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2417 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-09 13:50 EST ------- I've updated the tutorial, wiki and unit test. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Dec 9 14:03:28 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 9 Dec 2007 14:03:28 -0500 Subject: [Biopython-dev] [Bug 2412] NCBIXML. fails parsing with blast 2.2.15 in special cases (Karlin-Altschul) In-Reply-To: Message-ID: <200712091903.lB9J3SkM014338@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2412 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |WORKSFORME ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-09 14:03 EST ------- As per my comment 4, I think that in Biopython 1.44 we look for the special case of an empty XML output file and raise a ValueError. On Biopython 1.43 the error was very unhelpful. I'm marking this as "works for me". Bjoern, please reopen this bug if there is still a problem using Biopython 1.44 Thanks, Peter. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Dec 9 20:18:50 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 9 Dec 2007 20:18:50 -0500 Subject: [Biopython-dev] [Bug 2418] New: SyntaxError should be ValueError Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2418 Summary: SyntaxError should be ValueError Product: Biopython Version: 1.44 Platform: PC OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: mdehoon at ims.u-tokyo.ac.jp Biopython now has SyntaxErrors all over the place. Most if not all of these should be ValueErrors. SyntaxErrors are appropriate if there is a syntax problem in the code itself, not (as it's used in Biopython) if there is a syntax problem in an input data file. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 10 05:01:49 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Dec 2007 05:01:49 -0500 Subject: [Biopython-dev] [Bug 2418] SyntaxError should be ValueError In-Reply-To: Message-ID: <200712101001.lBAA1nxL011529@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2418 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-10 05:01 EST ------- That would be my fault. Should we introduce a Biopython "FormatSyntaxError" exception (as a subclass of ValueError defined in Bio/__init__.py), or just switch these to ValueError exceptions instead? 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 10 07:13:16 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Dec 2007 07:13:16 -0500 Subject: [Biopython-dev] [Bug 2418] SyntaxError should be ValueError In-Reply-To: Message-ID: <200712101213.lBACDGLG022397@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2418 ------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2007-12-10 07:13 EST ------- > Should we introduce a Biopython "FormatSyntaxError" exception (as a subclass of > ValueError defined in Bio/__init__.py), or just switch these to ValueError > exceptions instead? I would stick to ValueError. The error message should be clear enough for the user to understand what the problem is. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Dec 11 06:44:33 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 11 Dec 2007 06:44:33 -0500 Subject: [Biopython-dev] [Bug 2418] SyntaxError should be ValueError In-Reply-To: Message-ID: <200712111144.lBBBiXrZ014612@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2418 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-11 06:44 EST ------- I've just fixed the Bio.SeqIO, Bio.GenBank, Bio.SwissProt and Bio.SCOP cases and their test cases. I see you've found and fixed a whole more - its clearly not just me that used the SyntaxError exception in this way. We should probably also change Bio.Medline, Bio.Prosite and Bio.Blast I think the cases in Bio.config are a little different... 
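As an illustration of the convention agreed in bug 2418, here is a small sketch of a parser helper that reports malformed input data with ValueError and a descriptive message. The function `parse_id_line` and its message text are hypothetical and do not come from Bio.SwissProt.

```python
def parse_id_line(line):
    """Hypothetical toy parser: return the entry name from an 'ID' line.

    Malformed *data* raises ValueError; SyntaxError is left for
    problems in Python source code itself.
    """
    if not line.startswith("ID   "):
        raise ValueError("Expected a line starting with 'ID   ', got %r" % line)
    fields = line[5:].split()
    if not fields:
        raise ValueError("ID line has no entry name: %r" % line)
    return fields[0]
```

Callers can then catch ValueError around parsing code without also swallowing genuine syntax errors in their own scripts.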
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Dec 11 21:54:47 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 11 Dec 2007 21:54:47 -0500 Subject: [Biopython-dev] [Bug 2418] SyntaxError should be ValueError In-Reply-To: Message-ID: <200712120254.lBC2slIL022573@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2418 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp 2007-12-11 21:54 EST ------- I have replaced the SyntaxErrors by ValueErrors where appropriate. The remaining SyntaxErrors, as far as I can tell, are being used correctly. Closing this bug. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Dec 12 10:07:12 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 12 Dec 2007 10:07:12 -0500 Subject: [Biopython-dev] [Bug 2419] New: SeqUtils __init__.py missing complement function (v1.43 and v1.44) Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2419 Summary: SeqUtils __init__.py missing complement function (v1.43 and v1.44) Product: Biopython Version: 1.44 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: justin.t.riley at gmail.com This issue exists in both 1.43 and 1.44. You won't notice this bug on an import of SeqUtils. 
However, when you try to use the six_frame_translations function like so: from Bio import SeqUtils SeqUtils.six_frame_translations('GTCA....AAT') you get: NameError: global name 'complement' is not defined at line 285 (for version 1.43 anyhow) At first I searched all the Biopython modules for a "def complement" string and found one in Seq but it was for the complement of an actual Seq object. Looking around the web I found: def complement(seq): " returns the complementary sequence (NOT antiparallel) " return ''.join([IUPACData.ambiguous_dna_complement[x] for x in seq]) Pasting the above in Bio/SeqUtils/__init__.py solved the issue for me. Thanks. ~jtriley -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Dec 12 15:33:43 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 12 Dec 2007 15:33:43 -0500 Subject: [Biopython-dev] [Bug 2417] Bio.SeqIO single SeqRecord read/parse function In-Reply-To: Message-ID: <200712122033.lBCKXhxd020792@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2417 mmokrejs at ribosome.natur.cuni.cz changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |mmokrejs at ribosome.natur.cuni | |.cz -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
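For readers following bug 2419, a self-contained sketch of what the missing complement helper computes, and how it relates to a reverse complement. The table `DNA_COMPLEMENT` is a reduced, hypothetical stand-in for IUPACData.ambiguous_dna_complement (it covers only A, C, G and T), and the function names are illustrative rather than Biopython's own.

```python
# Assumption: unambiguous upper-case DNA only; the real IUPAC table
# also maps ambiguity codes such as N, R and Y.
DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement(seq):
    """Return the complementary strand, base by base (not reversed)."""
    return "".join(DNA_COMPLEMENT[base] for base in seq)


def reverse_complement(seq):
    """Return the complement read in the opposite direction."""
    return complement(seq)[::-1]
```

Because reversing twice restores the original order, the plain complement can also be obtained by reversing a reverse complement, so a separate complement helper is not strictly required where a reverse-complement function already exists.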
From bugzilla-daemon at portal.open-bio.org Wed Dec 12 16:48:03 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 12 Dec 2007 16:48:03 -0500 Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL In-Reply-To: Message-ID: <200712122148.lBCLm3iH025664@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2390 ------- Comment #19 from Biosql at hotmail.com 2007-12-12 16:48 EST ------- Hi Peter, I know it's been a very long time (more than a month), but I had this huge exam to prepare. Anyway, I've tried the latest version and everything is working fine. Many many thanks to you ! Since any Swiss Prot cross-references ain't uploaded in the Biosql DB, I've tried to parse the flat file with the RecordParser method from SProt instead of the SequenceParser or the SeqIO Parser, but I'm getting an error. I've seen in the bug list that you seem to work on this issue. Am I right ? If not, is there a way to upload the Swiss Prot cross-references ? Again, thank you ! -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Dec 12 17:01:47 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 12 Dec 2007 17:01:47 -0500 Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL In-Reply-To: Message-ID: <200712122201.lBCM1lGR026457@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2390 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |REOPENED Resolution|FIXED | ------- Comment #20 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-12 17:01 EST ------- Hi Jonathan, I'm glad we've fixed the error for you. 
Could you be a little more precise about what isn't working with getting Swiss Prot cross-references into BioSQL? e.g. Pick a specific SwissProt record, and quote the lines from the file containing the cross-references. That should be enough for me to try and track down what's going on. By the way - if you want to work with BioSQL, you have to use SeqRecord objects (e.g. from the Bio.SeqIO parser), and not the Bio.SwissProt.SProt.Record objects. This probably explains the error you mentioned using the RecordParser parser instead. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Dec 12 17:17:36 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 12 Dec 2007 17:17:36 -0500 Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL In-Reply-To: Message-ID: <200712122217.lBCMHaBK027220@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2390 ------- Comment #21 from Biosql at hotmail.com 2007-12-12 17:17 EST ------- (In reply to comment #20) > Hi Jonathan, > > I'm glad we've fixed the error for you. Could you be a little more precise > about what isn't working with getting Swiss Prot cross-references into BioSQL? > > e.g. Pick a specific SwissProt record, and quote the lines from the file > containing the cross-references. > > That should be enough for me to try and track down what's going on. > > By the way - if you want to work with BioSQL, you have to use SeqRecord objects > (e.g. from the Bio.SeqIO parser), and not the Bio.SwissProt.SProt.Record > objects. This probably explains the error you mentioned using the RecordParser > parser instead. 
> > Peter > Sorry for the lack of informations, Here's an example : http://ca.expasy.org/uniprot/Q9CQD1.txt All the sequences, ID line, AC lines and comments (cc lines) are being uploaded in the database, but not the : DR lines (which I consider the most interesting cross-references), the Pubmed references (R_ lines) and the Taxon of the protein. I don't think that the FT lines can be uploaded too isn't ? If so, it would be awesome ! Just to clear things, this uploading pattern is not only related to this protein (Rab5a) but for all the Swiss Prot proteins. Do you need anything else ? Jonathan -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Dec 12 19:42:28 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 12 Dec 2007 19:42:28 -0500 Subject: [Biopython-dev] [Bug 2419] SeqUtils __init__.py missing complement function (v1.43 and v1.44) In-Reply-To: Message-ID: <200712130042.lBD0gSdm001952@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2419 ------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2007-12-12 19:42 EST ------- The "complement" and similar functions were removed from Bio.SeqUtils in Biopython 1.43 because similar functionality existed in several places in Biopython. Apparently, we missed this call to complement in the six_frame_translations function. I would like to avoid adding this function back to SeqUtils. Instead, we can use the reverse_complement function in Bio.Seq, and take its reverse. Could you double-check if the revised version of Bio.SeqUtils.__init__.py works for you? 
You can pick it up from here: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/*checkout*/biopython/Bio/SeqUtils/__init__.py?rev=1.14&cvsroot=biopython&content-type=text/plain -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Dec 13 11:09:27 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 13 Dec 2007 11:09:27 -0500 Subject: [Biopython-dev] [Bug 2419] SeqUtils __init__.py missing complement function (v1.43 and v1.44) In-Reply-To: Message-ID: <200712131609.lBDG9R7u027690@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2419 ------- Comment #2 from justin.t.riley at gmail.com 2007-12-13 11:09 EST ------- (In reply to comment #1) > The "complement" and similar functions were removed from Bio.SeqUtils in > Biopython 1.43 because similar functionality existed in several places in > Biopython. Apparently, we missed this call to complement in the > six_frame_translations function. I would like to avoid adding this function > back to SeqUtils. Instead, we can use the reverse_complement function in > Bio.Seq, and take its reverse. > > Could you double-check if the revised version of Bio.SeqUtils.__init__.py works > for you? You can pick it up from here: > > http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/*checkout*/biopython/Bio/SeqUtils/__init__.py?rev=1.14&cvsroot=biopython&content-type=text/plain > Michiel, I figured the "solution" I mentioned wasn't the ideal but hey it worked :D The revised __init__.py you linked to works great for me. Thanks for getting back to me so quickly with a proper fix. I'm thinking of submitting a patch to Gentoo Linux for this in their Biopython ebuild until your next release. Thanks again! 
~Justin -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Dec 13 19:01:54 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 13 Dec 2007 19:01:54 -0500 Subject: [Biopython-dev] [Bug 2419] SeqUtils __init__.py missing complement function (v1.43 and v1.44) In-Reply-To: Message-ID: <200712140001.lBE01sIR023423@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2419 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #3 from mdehoon at ims.u-tokyo.ac.jp 2007-12-13 19:01 EST ------- OK, thanks. Closing this bug. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 14 10:17:21 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Dec 2007 10:17:21 -0500 Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL In-Reply-To: Message-ID: <200712141517.lBEFHLcj018666@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2390 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |RESOLVED Resolution| |FIXED ------- Comment #22 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-14 10:17 EST ------- Thanks for the details. Those fields are not being recorded in the SeqRecord object, so there is no way for BioSQL to put them into the database. This is bug 2235, which is on my mental to do list. 
Additionally, even if the parser did record the Taxon in the SeqRecord, BioSQL currently don't record this in the database. That seems to have been a short term fix for Bug 1921 which we should probably revisit. Note I'm re-marking THIS bug as fixed. Peter. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 14 12:56:11 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Dec 2007 12:56:11 -0500 Subject: [Biopython-dev] [Bug 2421] New: BioSQL should store and retrieve a SeqRecord's dbxrefs Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2421 Summary: BioSQL should store and retrieve a SeqRecord's dbxrefs Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: BioSQL AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk Looking over the code, BioSQL doesn't seem to even try and store database cross references in a SeqRecord's dbxrefs list. It will however store other cross references, e.g. in references and in features. See also: Bug 2390 comment 21 - Error importing Swiss Prot in BioSQL It was pointed out that SwissProt DR lines don't get into the database. The first problem was they didn't even make it to the SeqRecord... Bug 2235 - SeqRecord from Bio.SwissProt.SProt lacks annotation information The latest parser in CVS will now load DR lines into the dbxrefs list. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Dec 14 13:08:01 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Dec 2007 13:08:01 -0500 Subject: [Biopython-dev] [Bug 2422] New: BioSQL shouldn't just ignore the taxon_id Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2422 Summary: BioSQL shouldn't just ignore the taxon_id Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: BioSQL AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk In Bug 1921 biopython/BioSQL/Loader.py was changed to ignore the taxon_id, in order to avoid a foreign key constraint when the taxon id was not already defined (e.g. from loading an up to date NCBI taxonomy). We should see how BioPerl and BioJava handle this situation... One crude option (which would still be an improvement on the current situation) is to check if the taxon_id is defined, and if it is, then store the record with this included, and if not, issue a warning and store the sequence but omitting the taxon id. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 14 13:09:33 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Dec 2007 13:09:33 -0500 Subject: [Biopython-dev] [Bug 1921] BioSeqDatabase.load() method fails In-Reply-To: Message-ID: <200712141809.lBEI9Xl9001415@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1921 ------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-14 13:09 EST ------- In resolving this issue (bug 1921), Biopython's BioSQL is simply ignoring the taxon_id, so it is never recorded in the database. 
I've just filed a new bug on this: Bug 2422 - BioSQL shouldn't just ignore the taxon_id -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 14 13:21:40 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Dec 2007 13:21:40 -0500 Subject: [Biopython-dev] [Bug 2422] BioSQL shouldn't just ignore the taxon_id In-Reply-To: Message-ID: <200712141821.lBEILelL002298@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2422 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-14 13:21 EST ------- Some of Marc Colosimo's changes proposed on Bug 1816 may be relevant here, in particular his patch "Various fixes and possible improvements" (attachment 594). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
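Editorial sketch for Bug 2422: the "crude option" proposed there (store the taxon id when it is already defined, otherwise warn and store the record without it) might look roughly like the following. This is only an illustration under stated assumptions. It uses an in-memory SQLite database with a cut-down, hypothetical two-table schema rather than the real BioSQL schema, and the names make_demo_db and load_bioentry are invented here. They are not functions in BioSQL/Loader.py.

```python
import sqlite3
import warnings


def make_demo_db():
    """Create a tiny stand-in schema (hypothetical, NOT the real BioSQL schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE bioentry ("
        " accession TEXT PRIMARY KEY,"
        " taxon_id INTEGER REFERENCES taxon(taxon_id))"
    )
    return conn


def load_bioentry(conn, accession, taxon_id):
    """Store the entry, keeping taxon_id only if it is already defined.

    Returns the taxon_id that was actually stored (None if it was dropped).
    """
    if taxon_id is not None:
        row = conn.execute(
            "SELECT 1 FROM taxon WHERE taxon_id = ?", (taxon_id,)
        ).fetchone()
        if row is None:
            # Unknown taxon: warn, then fall back to storing without it.
            warnings.warn(
                "taxon_id %s is not in the taxon table; storing %s without it"
                % (taxon_id, accession)
            )
            taxon_id = None
    conn.execute(
        "INSERT INTO bioentry (accession, taxon_id) VALUES (?, ?)",
        (accession, taxon_id),
    )
    return taxon_id
```

The point of the design is that a missing taxon never blocks loading the sequence itself; whether a warning is the right way to report the dropped id is a separate question for the bug.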
From bugzilla-daemon at portal.open-bio.org Fri Dec 14 13:34:42 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Dec 2007 13:34:42 -0500 Subject: [Biopython-dev] [Bug 1816] Error when importing GenBank file into BioSQL database In-Reply-To: Message-ID: <200712141834.lBEIYgsN004015@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1816 ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-14 13:34 EST ------- I'd like to close this bug as the original problem seems to be fixed: Using CVS, I can load and retrieve AY243312 into BioSQL using the GenBank file downloaded from here: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=29692106 Regarding the taxon id, I've filed a separate bug: Bug 2422 - BioSQL shouldn't just ignore the taxon_id One of Marc's changes in the patch was caching term and ontology id's. Does this make a big difference? If so, could you file a new bug just for that enhancement and rescue those specific changes from the old patch. Similarly for the last_id method - could you file a new bug explaining what problem its solving. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Dec 14 13:36:34 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Dec 2007 13:36:34 -0500 Subject: [Biopython-dev] [Bug 2414] run_tests.py fails with a single test on a test suite In-Reply-To: Message-ID: <200712141836.lBEIaYKo004243@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2414 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED Summary|run_tests,py fails with a |run_tests.py fails with a |single test on a test suite |single test on a test suite ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-14 13:36 EST ------- Tiago made this change in biopython/Tests/run_tests.py revision 1.12, marking this bug as fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 14 17:40:39 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Dec 2007 17:40:39 -0500 Subject: [Biopython-dev] [Bug 2421] BioSQL should store and retrieve a SeqRecord's dbxrefs In-Reply-To: Message-ID: <200712142240.lBEMedjA021336@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2421 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-14 17:40 EST ------- This seems to be working in CVS now... -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
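Editorial sketch for Bug 2414: the regular-expression variant discussed on that bug (an optional "s" in "tests") could look something like this. The pattern and the helper name is_unittest_summary are assumptions made for illustration; this is not the code committed in run_tests.py revision 1.12, and it is written for current Python rather than the Python 2 of the time.

```python
import re

# Matches the unittest summary line in both forms, e.g.
# "Ran 1 test in 0.003s" and "Ran 12 tests in 1.250s".
_SUMMARY_LINE = re.compile(r"^Ran \d+ tests? in ")


def is_unittest_summary(line):
    """Return True if the line looks like a unittest 'Ran N test(s) in' summary."""
    return _SUMMARY_LINE.match(line) is not None
```

Compared with two string.find calls, a single pattern also insists on a number after "Ran", so unrelated output that merely starts with "Ran" is not mistaken for the summary.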
From bugzilla-daemon at portal.open-bio.org Fri Dec 14 18:08:55 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Dec 2007 18:08:55 -0500 Subject: [Biopython-dev] [Bug 2410] DBSeq & DBSeqRecord should subclass Seq & SeqRecord In-Reply-To: Message-ID: <200712142308.lBEN8tWc023431@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2410 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-14 18:08 EST ------- Fixed in biopython/BioSQL/BioSeq.py revision 1.20 The BioSQL unit tests still pass. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 14 18:37:55 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Dec 2007 18:37:55 -0500 Subject: [Biopython-dev] [Bug 2421] BioSQL should store and retrieve a SeqRecord's dbxrefs In-Reply-To: Message-ID: <200712142337.lBENbtiR025242@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2421 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-14 18:37 EST ------- Fixed in CVS, and test_BioSQL_SeqIO.py updated to verify this explicitly. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sat Dec 15 08:47:48 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 15 Dec 2007 08:47:48 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200712151347.lBFDlmh9019619@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #795 is|0 |1 obsolete| | ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-15 08:47 EST ------- Created an attachment (id=836) --> (http://bugzilla.open-bio.org/attachment.cgi?id=836&action=view) Patch to Bio/Seq.py [Note this does not update the test suite or the documentation, which would be needed if this is committed] Adds new methods to the MutableSeq object: - transcribe (in place) - back_transcribe (in place) Adds new methods to the Seq object: - transcribe - back_transcribe - translate (like the python string method) - translate_all (Biological translation) - translate_to_stop (Biological translation up to and excluding first stop codon) - translate_cds (Biological translation with an initial start codon as M, up to and excluding the first stop codon) I think this would be enough to deprecate Bio.Translate and Bio.Transcribe (after the next release). Comments welcome - for example are these method names sensible? Also, should the MutableSeq methods all act "in situ"? What about translation methods for MutableSeq objects? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
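Editorial sketch for Bug 2381: to make the difference between the three proposed biological translation methods concrete, here is a toy version written as plain functions on strings. The function names follow the proposal, but the bodies are assumptions for illustration and are not taken from the attached patch. The codon table is a deliberately tiny subset of the standard genetic code, and translate_cds here does not check that the first codon is a valid start codon.

```python
# Deliberately tiny subset of the standard genetic code (demo only).
CODON_TABLE = {
    "ATG": "M", "TTG": "L", "GCC": "A", "AAA": "K",
    "TAA": "*", "TAG": "*", "TGA": "*",
}


def translate_all(dna):
    """Translate every complete codon, showing stop codons as '*'."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def translate_to_stop(dna):
    """Translate up to, but excluding, the first stop codon."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def translate_cds(dna):
    """Like translate_to_stop, but report the first codon as 'M'.

    Assumption: no validation of the start codon is done in this sketch.
    """
    protein = translate_to_stop(dna)
    return "M" + protein[1:] if protein else protein


demo = "TTGGCCAAATAAGCC"
print(translate_all(demo))      # LAK*A
print(translate_to_stop(demo))  # LAK
print(translate_cds(demo))      # MAK
```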
From bugzilla-daemon at portal.open-bio.org Fri Dec 28 11:18:54 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 28 Dec 2007 11:18:54 -0500 Subject: [Biopython-dev] [Bug 2425] New: Fasta ID parsing error Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2425 Summary: Fasta ID parsing error Product: Biopython Version: 1.44 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: BioSQL AssignedTo: biopython-dev at biopython.org ReportedBy: dtomso at athenixcorp.com

Loader.py will give an error as follows when presented with an unusual FASTA header line:

>region1.fasta.screen.Contig1
ACAGGATAGGCGGGAGCCATTGAAACCGGAGCGCTAGCTTCGGTGGAGGC
GCTGGTGGGATACCGCCCTGACTGTATTGAAATTCTAACCTACGGGTCTT

Traceback (most recent call last):
  File "biosql_driver.py", line 28, in
    db.load(SeqIO.parse(sfile, 'fasta'))
  File "/home/dtomso/repository/biopython/build/lib.linux-i686-2.5/BioSQL/BioSeqDatabase.py", line 412, in load
    db_loader.load_seqrecord(cur_record)
  File "/usr/lib/python2.5/site-packages/BioSQL/Loader.py", line 30, in load_seqrecord
    bioentry_id = self._load_bioentry_table(record)
  File "/usr/lib/python2.5/site-packages/BioSQL/Loader.py", line 214, in _load_bioentry_table
    accession, version = record.id.split('.')
ValueError: too many values to unpack

It appears to be looking for any '.' in the file, assuming that is a version number, and splitting to obtain that number. However, this only works on NCBI-type header lines. Files that deviate from this (e.g. those produced by phrap, which produced the file above) cause this issue. I bolted on an inelegant fix by having the code check for multiple '.' characters, in which case the version defaults to zero. Other solutions may be preferable.

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
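Editorial sketch for Bug 2425: one possible alternative to unpacking record.id.split('.') directly is to treat only a trailing all-digit suffix as a version. The helper name split_accession_version is hypothetical and is not part of BioSQL/Loader.py. Note that this differs slightly from the reporter's workaround (which checks for multiple '.' characters); the default version of zero is taken from that report.

```python
def split_accession_version(identifier):
    """Return (accession, version) for a record id.

    Only a trailing ".<digits>" is treated as a version; anything else is
    kept as part of the accession and the version defaults to 0.
    """
    accession, dot, suffix = identifier.rpartition(".")
    if dot and accession and suffix.isdecimal():
        return accession, int(suffix)
    return identifier, 0


print(split_accession_version("AY243312.1"))
print(split_accession_version("region1.fasta.screen.Contig1"))
```

With this approach an NCBI-style id such as AY243312.1 still yields a numeric version, while the phrap-style id from the report is kept whole instead of raising ValueError.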
From bugzilla-daemon at portal.open-bio.org Mon Dec 3 22:04:05 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 3 Dec 2007 17:04:05 -0500 Subject: [Biopython-dev] [Bug 2414] run_tests, py fails with a single test on a test suite In-Reply-To: Message-ID: <200712032204.lB3M45tn000935@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2414 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-03 17:04 EST ------- Are you talking about test_PopGen_FDist.py? I don't have fdist installed, so I haven't found this problem yet... In anycase, your fix looks fine, although arguably a regular expession (with an optional "s" in "tests") would be more elegant. I am happy for you to make this change in run_tests.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mdehoon at c2b2.columbia.edu Tue Dec 4 07:10:40 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Tue, 4 Dec 2007 02:10:40 -0500 Subject: [Biopython-dev] Accessing ExPASy through Bio.SwissProt / Bio.SeqIO Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B66F@mail2.exch.c2b2.columbia.edu> Hi everybody, I am still looking at the different code in Biopython to access SwissProt.
With Bio.SwissProt, we can access the SwissProt database as follows:

>>> from Bio.SwissProt import SProt
>>> dictionary = SProt.ExPASyDictionary()
>>> record = dictionary["O23719"]
# record is now a string containing the SwissProt record O23719

Another option is to pull out a Bio.SwissProt.SProt.Record object:

>>> from Bio.SwissProt import SProt
>>> s_parser = SProt.RecordParser()
>>> dictionary = SProt.ExPASyDictionary(parser=s_parser)
>>> record = dictionary["O23719"]
# record is now a Bio.SwissProt.SProt.Record object containing record O23719

A third option is to pull out a SeqRecord by using SeqIO:

>>> from Bio.SwissProt import SProt
>>> dictionary = SProt.ExPASyDictionary()
>>> record = dictionary["O23719"]
>>> from Bio import SeqIO
>>> import StringIO
>>> record = SeqIO.parse(StringIO.StringIO(record), "swiss").next()
# record is now a Bio.SeqRecord.SeqRecord object containing record O23719

Compare this to how we would read a Fasta file:

>>> from Bio import SeqIO
>>> input = open("mydata.fa")
>>> record = SeqIO.parse(input, "fasta").next()

For consistency with Bio.SeqIO, it would make sense if ExPASyDictionary would return handles instead of parsed objects. Then these examples look like:

>>> from Bio.SwissProt import SProt
>>> dictionary = SProt.ExPASyDictionary()
>>> record = dictionary["O23719"].read()
# record is now a string containing the SwissProt record O23719

To pull out a Bio.SwissProt.SProt.Record object:

>>> from Bio.SwissProt import SProt
>>> dictionary = SProt.ExPASyDictionary()
>>> handle = dictionary["O23719"]
>>> record = SProt.parse(handle)
# record is now a Bio.SwissProt.SProt.Record object containing record O23719

To pull out a SeqRecord by using SeqIO:

>>> from Bio.SwissProt import SProt
>>> dictionary = SProt.ExPASyDictionary()
>>> handle = dictionary["O23719"]
>>> from Bio import SeqIO
>>> record = SeqIO.parse(handle, "swiss").next()
# record is now a Bio.SeqRecord.SeqRecord object containing record O23719

*If* we decide that ExPASyDictionary should return handles, *then* actually we don't really need an ExPASyDictionary, as its behavior is then largely the same as Bio.WWW.ExPASy.get_sprot_raw. So in short, in my opinion Bio.SwissProt.SProt.ExPASyDictionary does not add much beyond what Bio.WWW.ExPASy.get_sprot_raw already offers. Any comments?

--Michiel.

Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032

From biopython-dev at maubp.freeserve.co.uk Tue Dec 4 10:26:52 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 4 Dec 2007 10:26:52 +0000 Subject: [Biopython-dev] Accessing ExPASy through Bio.SwissProt / Bio.SeqIO In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B66F@mail2.exch.c2b2.columbia.edu> References: <6243BAA9F5E0D24DA41B27997D1FD14402B66F@mail2.exch.c2b2.columbia.edu> Message-ID: <320fb6e00712040226o7ecda7e2g9fb124b3a52de026@mail.gmail.com>

> For consistency with Bio.SeqIO, it would make sense if ExPASyDictionary would
> return handles instead of parsed objects.

I agree that it would in general be simpler if our online APIs returned handles by default.
This also applies to the Bio.GenBank methods. Of course, we should preserve existing functionality if possible. Another alternative is to return SeqRecords by default (via Bio.SeqIO) but this wouldn't generalise to non-sequence files like ProSite etc. One idea I had been thinking about was adding a new function Bio.SeqIO.fetch(...) or Bio.SeqIO.online_fetch(...) which would act as a proxy to all our supported online sequence databases, and either return a handle to the requested record(s), or perhaps return SeqRecord(s). One API model would be that outlined for the (possibly defunct?) Open Biological Database Access (OBDA) scheme, which covers both BioSQL access and online fetching (biofetch): http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/obda-specs/biofetch/biofetch.txt?cvsroot=obf-common But first I should probably finish working on BioSQL ;) > *If* we decide that ExPASyDictionary should return handles, *then* actually > we don't really need an ExPASyDictionary, as its behavior is then largely the > same as Bio.WWW.ExPASy.get_sprot_raw. So in short, in my opinion > Bio.SwissProt.SProt.ExPASyDictionary does not add much beyond what > Bio.WWW.ExPASy.get_sprot_raw already offers. Can ExPASyDictionary return anything that get_sprot_raw can't? Otherwise from the user's point of view its just a coding style issue (dictionary versus function). 
Peter From bugzilla-daemon at portal.open-bio.org Tue Dec 4 10:41:25 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Dec 2007 05:41:25 -0500 Subject: [Biopython-dev] [Bug 2414] run_tests, py fails with a single test on a test suite In-Reply-To: Message-ID: <200712041041.lB4AfPTN008806@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2414 tiagoantao at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED ------- Comment #2 from tiagoantao at gmail.com 2007-12-04 05:41 EST ------- > Are you talking about test_PopGen_FDist.py? I don't have fdist installed, so I > haven't found this problem yet... No, it is my new SimCoal code. > In anycase, your fix looks fine, although arguably a regular expession (with an > optional "s" in "tests") would be more elegant. > > I am happy for you to make this change in run_tests.py OK, I will do this with a regex. I cannot promise when though, as I am traveling until Saturday (but it will before next Monday). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Dec 4 19:43:17 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Dec 2007 14:43:17 -0500 Subject: [Biopython-dev] [Bug 2412] NCBIXML. fails parsing with blast 2.2.15 in special cases (Karlin-Altschul) In-Reply-To: Message-ID: <200712041943.lB4JhHkE012059@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2412 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-04 14:43 EST ------- The fact that your example gives an empty XML file is essentially due to some problem with Blast. 
I agree that the Biopython error message you quoted is very unhelpful in this situation. Are you using Biopython 1.43 (as suggested by the stack trace in the error report), or Biopython 1.44 as reported in the bug details? What does this do on your setup?

from StringIO import StringIO
from Bio.Blast import NCBIXML
handle = StringIO("")
for record in NCBIXML.parse(handle) :
    print record

If you are using Biopython 1.44 or later you should get a helpful error message, "ValueError: Your XML file was empty". You can catch this, and inspect the contents of the error handle if you want to deal with this in your application. i.e. I think this bug has already been fixed in Biopython 1.44

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Tue Dec 4 20:25:45 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Dec 2007 15:25:45 -0500 Subject: [Biopython-dev] [Bug 2396] BioSQL loader does not store sequence level annotations dict In-Reply-To: Message-ID: <200712042025.lB4KPj2D016252@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2396 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-04 15:25 EST -------

I think I have fixed this now in CVS.
One related wrinkle is that if you had this:

record.annotations["example1"] == "string"
record.annotations["example2"] == ["alpha"]
record.annotations["example3"] == ["alpha", "beta"]

after loading and retrieving from BioSQL you have this:

record.annotations["example1"] == ["string"]
record.annotations["example2"] == ["alpha"]
record.annotations["example3"] == ["alpha", "beta"]

i.e. Everything becomes a list of strings. It is difficult to see how to deal with this elegantly given the current BioSQL schema. One option is to treat single entries as either a list or a string depending on the rank field in the database... I should probably take this up with the BioSQL mailing list to see how/if this issue affects BioPerl/BioJava.

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.

From mdehoon at c2b2.columbia.edu Wed Dec 5 01:13:01 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Tue, 4 Dec 2007 20:13:01 -0500 Subject: [Biopython-dev] Accessing ExPASy through Bio.SwissProt / Bio.SeqIO References: <6243BAA9F5E0D24DA41B27997D1FD14402B66F@mail2.exch.c2b2.columbia.edu> <320fb6e00712040226o7ecda7e2g9fb124b3a52de026@mail.gmail.com> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B670@mail2.exch.c2b2.columbia.edu>

> One idea I had been thinking about was adding a new function
> Bio.SeqIO.fetch(...) or Bio.SeqIO.online_fetch(...) which would act as
> a proxy to all our supported online sequence databases, and either
> return a handle to the requested record(s), or perhaps return
> SeqRecord(s).

I believe that Bio.db has such a functionality, but I don't think it is used much. Anyway, we currently have too many functions in Biopython to access databases rather than too few. So I think we should not add any new ones.

> > *If* we decide that ExPASyDictionary should return handles, *then* actually
> > we don't really need an ExPASyDictionary, as its behavior is then largely the
> > same as Bio.WWW.ExPASy.get_sprot_raw. So in short, in my opinion
> > Bio.SwissProt.SProt.ExPASyDictionary does not add much beyond what
> > Bio.WWW.ExPASy.get_sprot_raw already offers.
>
> Can ExPASyDictionary return anything that get_sprot_raw can't?
> Otherwise from the user's point of view it's just a coding style issue
> (dictionary versus function).

ExPASyDictionary is just a wrapper around get_sprot_raw, so get_sprot_raw can
return any record that ExPASyDictionary can return.
There are two differences between the two:
1) ExPASyDictionary behaves as a dictionary, get_sprot_raw as a function. As
you write, this is just a coding style issue.
2) When creating an ExPASyDictionary, users can pass a parser to parse the
records before returning them. This is in essence only a coding style issue.
In particular, do we want:
>>> from Bio.SwissProt import SProt
>>> sprot_parser = SProt.RecordParser()
>>> dictionary = SProt.ExPASyDictionary(parser = sprot_parser)
>>> record = dictionary["O12345"]
or
>>> from Bio.SwissProt import SProt
>>> from Bio import ExPASy
>>> handle = ExPASy.get_sprot_raw("O12345")
>>> record = SProt.parse(handle)

For SeqRecords, in the ExPASyDictionary approach we'd use a different parser,
in the get_sprot_raw approach we call SeqIO.parse instead of SProt.parse.
For plain-text output, in the ExPASyDictionary approach we pass no parser,
and in the get_sprot_raw approach we call read() on the handle directly.
To get a handle, in the ExPASyDictionary approach we can use StringIO to
convert the text output to a handle; in the get_sprot_raw approach we don't
need to do anything.

In my opinion, both 1) and 2) are just coding style issues. Maintaining both
ExPASyDictionary and get_sprot_raw is a burden for the developers, and causes
confusion for users.
So I suggest we focus on one of these, and deprecate the other. The
ExPASy.get_sprot_raw approach seems closer to how Bio.SeqIO is organized, and
therefore has my preference.

Two more issues:
1) I am not sure why the SwissProt code is kept in a separate SProt submodule
of Bio.SwissProt. Currently, Bio/SwissProt/__init__.py is empty, so we can
save ourselves some typing by keeping all the SwissProt code there instead of
in SProt.py.
2) A SwissProt.parse function currently doesn't exist. Right now it is a
three-step process:
>>> s_parser = SProt.RecordParser()
>>> s_iterator = SProt.Iterator(handle, s_parser)
>>> record = s_iterator.next()
A SwissProt.parse function would just contain these three steps, or perhaps
only the first two.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

From biopython-dev at maubp.freeserve.co.uk Wed Dec 5 10:03:34 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed, 5 Dec 2007 10:03:34 +0000
Subject: [Biopython-dev] Accessing ExPASy through Bio.SwissProt / Bio.SeqIO
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B670@mail2.exch.c2b2.columbia.edu>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B66F@mail2.exch.c2b2.columbia.edu> <320fb6e00712040226o7ecda7e2g9fb124b3a52de026@mail.gmail.com> <6243BAA9F5E0D24DA41B27997D1FD14402B670@mail2.exch.c2b2.columbia.edu>
Message-ID: <320fb6e00712050203p17aa38b0q15d2edd65542021d@mail.gmail.com>

On 12/5/07, Michiel De Hoon wrote:
> > One idea I had been thinking about was adding a new function
> > Bio.SeqIO.fetch(...) or Bio.SeqIO.online_fetch(...) which would act as
> > a proxy to all our supported online sequence databases, and either
> > return a handle to the requested record(s), or perhaps return
> > SeqRecord(s).
>
> I believe that Bio.db has such a functionality, but I don't think it is used
> much. Anyway, we currently have too many functions in Biopython to
> access databases rather than too few. So I think we should not add any
> new ones.

Certainly before taking my suggestion seriously we should try and take stock
of where we stand at the moment with respect to online databases.

> > Can ExPASyDictionary return anything that get_sprot_raw can't?
> > Otherwise from the user's point of view it's just a coding style issue
> > (dictionary versus function).
>
> ExPASyDictionary is just a wrapper around get_sprot_raw, so get_sprot_raw can
> return any record that ExPASyDictionary can return.
> There are two differences between the two:
> 1) ExPASyDictionary behaves as a dictionary, get_sprot_raw as a function. As
> you write, this is just a coding style issue.
> 2) When creating an ExPASyDictionary, users can pass a parser to parse the
> records before returning them. This is in essence only a coding style issue.
> In particular, do we want:
> >>> from Bio.SwissProt import SProt
> >>> sprot_parser = SProt.RecordParser()
> >>> dictionary = SProt.ExPASyDictionary(parser = sprot_parser)
> >>> record = dictionary["O12345"]
> or
> >>> from Bio.SwissProt import SProt
> >>> from Bio import ExPASy
> >>> handle = ExPASy.get_sprot_raw("O12345")
> >>> record = SProt.parse(handle)

Or do we want to encourage Bio.SeqIO (which happens to call
Bio.SwissProt.SProt internally)?

>>> from Bio import SeqIO
>>> from Bio import ExPASy
>>> handle = ExPASy.get_sprot_raw("O12345")
>>> record = SeqIO.parse(handle, "swiss")

This is the style I prefer (and is very similar to the related examples I
added to the tutorial). It separates fetching the data (as a handle) and
parsing it (via SeqIO).

> For SeqRecords, in the ExPASyDictionary approach we'd use a different parser,
> in the get_sprot_raw approach we call SeqIO.parse instead of SProt.parse.
> For plain-text output, in the ExPASyDictionary approach we pass no parser,
> and in the get_sprot_raw approach we call read() on the handle directly.
> To get a handle, in the ExPASyDictionary approach we can use StringIO to
> convert the text output to a handle; in the get_sprot_raw approach we don't
> need to do anything.
>
> In my opinion, both 1) and 2) are just coding style issues. Maintaining both
> ExPASyDictionary and get_sprot_raw is a burden for the developers, and causes
> confusion for users. So I suggest we focus on one of these, and deprecate the
> other.

As ExPASyDictionary just wraps get_sprot_raw with a parser object, the
additional overhead is minimal. The dictionary metaphor is quite nice - even
if you don't actually gain much functionality. However, setting up the
dictionary as it is now (requiring an "old fashioned" parser object) is
fairly fiddly/confusing.

> The ExPASy.get_sprot_raw approach seems closer to how Bio.SeqIO is
> organized, and therefore has my preference.

I would agree; if you wanted to deprecate one, I would keep get_sprot_raw and
drop ExPASyDictionary. However we should try and have a coherent API for the
other online tools as well.

> Two more issues:
> 1) I am not sure why the SwissProt code is kept in a separate SProt submodule
> of Bio.SwissProt. Currently, Bio/SwissProt/__init__.py is empty, so we can
> save ourselves some typing by keeping all the SwissProt code there instead of
> in SProt.py.

Or just encourage using it via Bio.SeqIO (then we can move things later if
wanted).

> 2) A SwissProt.parse function currently doesn't exist. Right now it is a
> three-step process:
> >>> s_parser = SProt.RecordParser()
> >>> s_iterator = SProt.Iterator(handle, s_parser)
> >>> record = s_iterator.next()
> A SwissProt.parse function would just contain these three steps, or
> perhaps only the first two.

The Bio.SeqIO.parse() is very close though.
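
The "dictionary versus function" point in this thread can be made concrete with a minimal sketch in modern Python. Everything named here is hypothetical and for illustration only: fetch_raw stands in for a function such as ExPASy.get_sprot_raw (it reads from an in-memory table instead of the network), FAKE_DB holds made-up record text, and FetchDictionary is not the actual ExPASyDictionary code.

```python
from io import StringIO

# Made-up record text keyed by accession; a stand-in for the online database.
FAKE_DB = {"O12345": "ID   EXAMPLE\n//\n"}


def fetch_raw(accession):
    """Hypothetical fetch function: return a handle to the raw record text."""
    return StringIO(FAKE_DB[accession])


class FetchDictionary:
    """Hypothetical dictionary-style wrapper around fetch_raw.

    With no parser, lookups return the plain text of the record.
    With a parser, lookups return whatever parser(handle) returns.
    """

    def __init__(self, parser=None):
        self.parser = parser

    def __getitem__(self, accession):
        handle = fetch_raw(accession)
        if self.parser is None:
            return handle.read()
        return self.parser(handle)


def first_id(handle):
    """Toy parser: return the second field of the first line."""
    return handle.readline().split()[1]


# Function style: fetch a handle, then parse it in a separate step.
by_function = first_id(fetch_raw("O12345"))

# Dictionary style: the parser is supplied once, lookups return parsed records.
by_dictionary = FetchDictionary(parser=first_id)["O12345"]
```

Both styles reach the same record; the wrapper only changes where the parser is supplied, which is why the choice is described above as a coding style issue.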

Peter

From mdehoon at c2b2.columbia.edu Wed Dec 5 10:29:38 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Wed, 5 Dec 2007 05:29:38 -0500
Subject: [Biopython-dev] Accessing ExPASy through Bio.SwissProt / Bio.SeqIO
References: <6243BAA9F5E0D24DA41B27997D1FD14402B66F@mail2.exch.c2b2.columbia.edu><320fb6e00712040226o7ecda7e2g9fb124b3a52de026@mail.gmail.com><6243BAA9F5E0D24DA41B27997D1FD14402B670@mail2.exch.c2b2.columbia.edu> <320fb6e00712050203p17aa38b0q15d2edd65542021d@mail.gmail.com>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B671@mail2.exch.c2b2.columbia.edu>

> Or do we want to encourage Bio.SeqIO (which happens to call
> Bio.SwissProt.SProt internally)?
>
> >>> from Bio import SeqIO
> >>> from Bio import ExPASy
> >>> handle = ExPASy.get_sprot_raw("O12345")
> >>> record = SeqIO.parse(handle, "swiss")
>
> This is the style I prefer (and is very similar to the related
> examples I added to the tutorial). It separates fetching the data (as
> a handle) and parsing it (via SeqIO).

SeqIO.parse returns a SeqRecord; a SwissProt.parse returns a SwissProt.SProt.Record.
Does the SeqRecord contain the same information as a SwissProt.SProt.Record?
Or is some information lost? If they contain the same information, then I am
in favor of encouraging Bio.SeqIO.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

From mdehoon at c2b2.columbia.edu Fri Dec 7 10:11:33 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Fri, 7 Dec 2007 05:11:33 -0500
Subject: [Biopython-dev] Accessing ExPASy through Bio.SwissProt /Bio.SeqIO
References: <6243BAA9F5E0D24DA41B27997D1FD14402B66F@mail2.exch.c2b2.columbia.edu> <320fb6e00712040226o7ecda7e2g9fb124b3a52de026@mail.gmail.com> <6243BAA9F5E0D24DA41B27997D1FD14402B670@mail2.exch.c2b2.columbia.edu> <475691C1.3020705@maubp.freeserve.co.uk>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B673@mail2.exch.c2b2.columbia.edu>

Hi everybody,

To summarize, I rewrote the chapter on SwissProt/Prosite/Prodoc/ExPASy and
put it here:

http://biopython.org/DIST/docs/tutorial/Tutorial-proposal.html#htoc51
(chapter 6 in the tutorial)

This is merely a proposal on how this should work; none of this is in CVS
yet. Please let us know if you have any objections.

If there are no objections, I can upload the new code to CVS.
That would conclude my work on Bio.WWW.ExPASy; the final (and biggest) part
of my work on Bio.WWW will be to look at the various Biopython modules to
interact with NCBI (Genbank, EUtils).

Two comments:
1) In this proposal, I am using SwissProt.parse instead of SeqIO.parse since
the latter does not (yet) store all information contained in a SwissProt
file. I'd be happy though to move to SeqIO.parse for SwissProt also once it
does.
2) It may be nice to have a SwissProt.read and SeqIO.read to read and return
exactly one record from the handle, in addition to parse() to create an
iterator to read multiple records.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

From biopython at maubp.freeserve.co.uk Fri Dec 7 10:46:32 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 7 Dec 2007 10:46:32 +0000
Subject: [Biopython-dev] Accessing ExPASy through Bio.SwissProt /Bio.SeqIO
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B673@mail2.exch.c2b2.columbia.edu>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B66F@mail2.exch.c2b2.columbia.edu> <320fb6e00712040226o7ecda7e2g9fb124b3a52de026@mail.gmail.com> <6243BAA9F5E0D24DA41B27997D1FD14402B670@mail2.exch.c2b2.columbia.edu> <475691C1.3020705@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B673@mail2.exch.c2b2.columbia.edu>
Message-ID: <320fb6e00712070246g53e8096ew156f4502791bce9b@mail.gmail.com>

> To summarize, I rewrote the chapter on SwissProt/Prosite/Prodoc/ExPASy and
> put it here:
>
> http://biopython.org/DIST/docs/tutorial/Tutorial-proposal.html#htoc51
> (chapter 6 in the tutorial)
>
> This is merely a proposal on how this should work; none of this is in CVS
> yet. Please let us know if you have any objections.

I would add a note saying doing it this way gives Bio.SwissProt.SProt.Record
objects, while you could alternatively get SeqRecord objects as described in
the SeqIO chapter (use a reference).

> If there are no objections, I can upload the new code to CVS. That would
> conclude my work on Bio.WWW.ExPASy; the final (and biggest) part of my work
> on Bio.WWW will be to look at the various Biopython modules to interact with
> NCBI (Genbank, EUtils).

That will be "fun"!

> Two comments:
> 1) In this proposal, I am using SwissProt.parse instead of SeqIO.parse since
> the latter does not (yet) store all information contained in a SwissProt
> file. I'd be happy though to move to SeqIO.parse for SwissProt also once it
> does.
> 2) It may be nice to have a SwissProt.read and SeqIO.read to read and return
> exactly one record from the handle, in addition to parse() to create an
> iterator to read multiple records.

I'd suggested a Bio.SeqIO function, with a name like parse1() or parse_sole()
etc which would return a single SeqRecord - and raise an error if the handle
didn't contain one and only one record. We could call this function read() if
you prefer.

Peter

From mdehoon at c2b2.columbia.edu Sat Dec 8 03:18:09 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sat, 08 Dec 2007 12:18:09 +0900
Subject: [Biopython-dev] Accessing ExPASy through Bio.SwissProt /Bio.SeqIO
In-Reply-To: <320fb6e00712070246g53e8096ew156f4502791bce9b@mail.gmail.com>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B66F@mail2.exch.c2b2.columbia.edu> <320fb6e00712040226o7ecda7e2g9fb124b3a52de026@mail.gmail.com> <6243BAA9F5E0D24DA41B27997D1FD14402B670@mail2.exch.c2b2.columbia.edu> <475691C1.3020705@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B673@mail2.exch.c2b2.columbia.edu> <320fb6e00712070246g53e8096ew156f4502791bce9b@mail.gmail.com>
Message-ID: <475A0CF1.1080802@c2b2.columbia.edu>

Peter wrote:
> I would add a note saying doing it this way gives
> Bio.SwissProt.SProt.Record objects, while you could alternatively get
> SeqRecord objects as described in the SeqIO chapter (use a reference).

OK I will add that.

> I'd suggested a Bio.SeqIO function, with a name like parse1() or
> parse_sole() etc which would return a single SeqRecord - and raise an
> error if the handle didn't contain one and only one record. We could
> call this function read() if you prefer.

I'd prefer read() instead of parse1(), parse_sole() etc. for the following
reasons:

1) Having two names that are clearly different emphasizes the fact that they
return different things (parse() returns an iterator, read() a record).

2) Some modules deal with data that always consist of one record (for
example, gene expression data in case of Bio.Cluster). Such modules can have
a read() function but not a parse(). It would feel strange if a module has a
parse1() function but not a parse().

--Michiel.
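
The behaviour discussed for read() (return exactly one record, and raise an error if there are none or more than one) can be sketched as below, in modern Python. This is a sketch under stated assumptions, not the code in Bio/SeqIO/__init__.py: read_single is a hypothetical name, and any iterable of records stands in for the iterator that parse() returns.

```python
# Sentinel that cannot be confused with a real record (even None).
_MISSING = object()


def read_single(records):
    """Return the one and only record from an iterable of records.

    Hypothetical helper for illustration. Raises ValueError if the
    iterable yields no records, or if it yields more than one.
    """
    iterator = iter(records)

    first = next(iterator, _MISSING)
    if first is _MISSING:
        raise ValueError("No records found in handle")

    # Look one step ahead: a second record is an error, not something
    # to drop quietly.
    if next(iterator, _MISSING) is not _MISSING:
        raise ValueError("More than one record found in handle")

    return first
```

The look-ahead step is the design point: taking only the first item from the iterator would accept a multi-record input without complaint, whereas this wrapper reports it.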

From bugzilla-daemon at portal.open-bio.org Sat Dec 8 13:09:00 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 8 Dec 2007 08:09:00 -0500
Subject: [Biopython-dev] [Bug 2417] New: Bio.SeqIO single SeqRecord read/parse function
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2417

           Summary: Bio.SeqIO single SeqRecord read/parse function
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk

Most sequence file formats can contain a single record, and in this situation
having to use an iterator returned by Bio.SeqIO.parse() can be clumsy.

For example, dealing with GenBank files for bacterial genomes or chromosomes.
Or, from the tutorial as of Biopython 1.44,

from Bio.WWW import ExPASy
from Bio import SeqIO
seq_record = SeqIO.parse(ExPASy.get_sprot_raw("O23729"), "swiss").next()
print seq_record.id
print seq_record.seq
print len(seq_record.seq)

Using the iterator.next() method as above works fine, it will however
silently ignore any unexpected subsequent records if present. Checking your
file only has one record would require an additional check to confirm a
second .next() call fails, or another such workaround.

I am proposing a new function for use with a handle containing one and only
one record. This would raise an error if the handle contained no records, or
if it contained more than one record. It would be defined in
Bio/SeqIO/__init__.py as a simple wrapper for Bio.SeqIO.parse()

Note - My proposed "read single record" function would NOT work for cases
where the handle contains multiple records and you only want the first one
(because I would raise an exception). I would regard this as a corner case,
and catering to this risks silently ignoring unexpected second and subsequent
records in other use cases.
In such situations using Bio.SeqIO.parse(...).next() is advised. I had previously suggested "parse1", "parse_sole", "parse_only" - none of which are very appealing. On the dev mailing list today, Michiel has proposed "read": Michiel de Hoon wrote: > > Peter wrote: > > I'd suggested a Bio.SeqIO function, with a name like parse1() or > > parse_sole() etc which would return a single SeqRecord - and raise > > an error if the handle didn't contain one and only one record. We > > could call this function read() if you prefer. > > > I'd prefer read() instead of parse1(), parse_sole() etc. for the > following reasons: > > 1) Having two names that are clearly different emphasizes the fact that > they return different things (parse() returns an iterator, read() a record). > > 2) Some modules deal with data that always consist of one record (for > example, gene expression data in case of Bio.Cluster). Such modules can > have a read() function but not a parse(). It would feel strange if a > module has a parse1() function but not a parse(). I plan to add this functionality to Bio/SeqIO/__init__.py as a "read" function, and update the tutorial accordingly shortly. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From p.j.a.cock at googlemail.com Sat Dec 8 13:10:33 2007 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 8 Dec 2007 13:10:33 +0000 Subject: [Biopython-dev] Bio.SeqIO function to read a single record Message-ID: <320fb6e00712080510k3d4e5148gb0ec332a0d745452@mail.gmail.com> Michiel de Hoon wrote: > > > > I'd suggested a Bio.SeqIO function, with a name like parse1() or > > parse_sole() etc which would return a single SeqRecord - and raise > > an error if the handle didn't contain one and only one record. We > > could call this function read() if you prefer. > > > I'd prefer read() instead of parse1(), parse_sole() etc. 
for the > following reasons: > > 1) Having two names that are clearly different emphasizes the fact that > they return different things (parse() returns an iterator, read() a record). > > 2) Some modules deal with data that always consist of one record (for > example, gene expression data in case of Bio.Cluster). Such modules can > have a read() function but not a parse(). It would feel strange if a > module has a parse1() function but not a parse(). OK. I've filed an enhancement bug, which I'll mention on the main mailing list, http://bugzilla.open-bio.org/show_bug.cgi?id=2417 Unless there is some negative feedback, I'll add that functionality shortly. Peter From bugzilla-daemon at portal.open-bio.org Sun Dec 9 16:24:19 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 9 Dec 2007 11:24:19 -0500 Subject: [Biopython-dev] [Bug 2417] Bio.SeqIO single SeqRecord read/parse function In-Reply-To: Message-ID: <200712091624.lB9GOJCe025680@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2417 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-09 11:24 EST ------- Updated Bio/SeqIO/__init__.py to include the new "read" function in CVS revision 1.21. I'll do the documentation and unit tests next, before marking this as fixed. [It's not yet too late to change the name from "read" if anyone can come up with a nice clear alternative, or a strong argument against this choice] -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
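The behaviour proposed for Bug 2417 (return the only record, raise an error for zero or for more than one) can be sketched as below. This is a minimal illustration written for current Python 3, not the code committed in revision 1.21; the name `read_single` and the error messages are assumptions made for this example, and it works on any iterable rather than on a file handle.

```python
def read_single(records):
    """Return the one and only item from an iterable of records.

    Hypothetical helper: raises ValueError if there are no records
    or if there is more than one.
    """
    iterator = iter(records)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("No records found")
    try:
        next(iterator)
    except StopIteration:
        # The second fetch failed, so there was exactly one record.
        return first
    raise ValueError("More than one record found")
```

With Biopython installed, such a helper could be applied to the iterator returned by Bio.SeqIO.parse(handle, format), which is the kind of wrapper the bug report describes.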
From bugzilla-daemon at portal.open-bio.org Sun Dec 9 18:50:06 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 9 Dec 2007 13:50:06 -0500 Subject: [Biopython-dev] [Bug 2417] Bio.SeqIO single SeqRecord read/parse function In-Reply-To: Message-ID: <200712091850.lB9Io6tj013469@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2417 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-09 13:50 EST ------- I've updated the tutorial, wiki and unit test. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Dec 9 19:03:28 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 9 Dec 2007 14:03:28 -0500 Subject: [Biopython-dev] [Bug 2412] NCBIXML. fails parsing with blast 2.2.15 in special cases (Karlin-Altschul) In-Reply-To: Message-ID: <200712091903.lB9J3SkM014338@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2412 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |WORKSFORME ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-09 14:03 EST ------- As per my comment 4, I think that in Biopython 1.44 we look for the special case of an empty XML output file and raise a ValueError. On Biopython 1.43 the error was very unhelpful. I'm marking this as "works for me". Bjoern, please reopen this bug if there is still a problem using Biopython 1.44 Thanks, Peter. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 10 01:18:50 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 9 Dec 2007 20:18:50 -0500 Subject: [Biopython-dev] [Bug 2418] New: SyntaxError should be ValueError Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2418 Summary: SyntaxError should be ValueError Product: Biopython Version: 1.44 Platform: PC OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: mdehoon at ims.u-tokyo.ac.jp Biopython now has SyntaxErrors all over the place. Most if not all of these should be ValueErrors. SyntaxErrors are appropriate if there is a syntax problem in the code itself, not (as it's used in Biopython) if there is a syntax problem in an input data file. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 10 10:01:49 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Dec 2007 05:01:49 -0500 Subject: [Biopython-dev] [Bug 2418] SyntaxError should be ValueError In-Reply-To: Message-ID: <200712101001.lBAA1nxL011529@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2418 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-10 05:01 EST ------- That would be my fault. Should we introduce a Biopython "FormatSyntaxError" exception (as a subclass of ValueError defined in Bio/__init__.py), or just switch these to ValueError exceptions instead? 
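The two options raised in this comment can be compared with a small sketch. It assumes nothing about Biopython's real parsers: `FormatSyntaxError` is only the name floated above, and the two `parse_header_*` functions are hypothetical stand-ins for a parser that rejects a malformed FASTA header.

```python
class FormatSyntaxError(ValueError):
    """Option 1 (hypothetical): a dedicated subclass of ValueError."""


def parse_header_subclass(line):
    # Raise the dedicated subclass when the input data is malformed.
    if not line.startswith(">"):
        raise FormatSyntaxError("Expected '>' at start of header: %r" % line)
    return line[1:].strip()


def parse_header_plain(line):
    # Option 2: raise a plain ValueError with a clear message.
    if not line.startswith(">"):
        raise ValueError("Expected '>' at start of header: %r" % line)
    return line[1:].strip()
```

Either way, calling code that uses `except ValueError` catches the problem; the subclass only matters to callers who want to tell format errors apart from other ValueErrors.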
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 10 12:13:16 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Dec 2007 07:13:16 -0500 Subject: [Biopython-dev] [Bug 2418] SyntaxError should be ValueError In-Reply-To: Message-ID: <200712101213.lBACDGLG022397@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2418 ------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2007-12-10 07:13 EST ------- > Should we introduce a Biopython "FormatSyntaxError" exception (as a subclass of > ValueError defined in Bio/__init__.py), or just switch these to ValueError > exceptions instead? I would stick to ValueError. The error message should be clear enough for the user to understand what the problem is. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Dec 11 11:44:33 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 11 Dec 2007 06:44:33 -0500 Subject: [Biopython-dev] [Bug 2418] SyntaxError should be ValueError In-Reply-To: Message-ID: <200712111144.lBBBiXrZ014612@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2418 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-11 06:44 EST ------- I've just fixed the Bio.SeqIO, Bio.GenBank, Bio.SwissProt and Bio.SCOP cases and their test cases. I see you've found and fixed a whole lot more - it's clearly not just me that used the SyntaxError exception in this way. We should probably also change Bio.Medline, Bio.Prosite and Bio.Blast. I think the cases in Bio.config are a little different...
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Dec 12 02:54:47 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 11 Dec 2007 21:54:47 -0500 Subject: [Biopython-dev] [Bug 2418] SyntaxError should be ValueError In-Reply-To: Message-ID: <200712120254.lBC2slIL022573@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2418 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp 2007-12-11 21:54 EST ------- I have replaced the SyntaxErrors by ValueErrors where appropriate. The remaining SyntaxErrors, as far as I can tell, are being used correctly. Closing this bug. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Dec 12 15:07:12 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 12 Dec 2007 10:07:12 -0500 Subject: [Biopython-dev] [Bug 2419] New: SeqUtils __init__.py missing complement function (v1.43 and v1.44) Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2419 Summary: SeqUtils __init__.py missing complement function (v1.43 and v1.44) Product: Biopython Version: 1.44 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: justin.t.riley at gmail.com This issue exists in both 1.43 and 1.44. You won't notice this bug on an import of SeqUtils. 
However, when you try to use the six_frame_translations function like so:

    from Bio import SeqUtils
    SeqUtils.six_frame_translations('GTCA....AAT')

you get:

    NameError: global name 'complement' is not defined

at line 285 (for version 1.43 anyhow). At first I searched all the Biopython modules for a "def complement" string and found one in Seq but it was for the complement of an actual Seq object. Looking around the web I found:

    def complement(seq):
        " returns the complementary sequence (NOT antiparallel) "
        return ''.join([IUPACData.ambiguous_dna_complement[x] for x in seq])

Pasting the above in Bio/SeqUtils/__init__.py solved the issue for me. Thanks. ~jtriley -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Dec 12 20:33:43 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 12 Dec 2007 15:33:43 -0500 Subject: [Biopython-dev] [Bug 2417] Bio.SeqIO single SeqRecord read/parse function In-Reply-To: Message-ID: <200712122033.lBCKXhxd020792@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2417 mmokrejs at ribosome.natur.cuni.cz changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |mmokrejs at ribosome.natur.cuni | |.cz -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
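For the missing complement function reported in Bug 2419 above, a self-contained version of the workaround might look like the sketch below. It does not import Biopython: `DNA_COMPLEMENT` is a small hand-written table standing in for IUPACData.ambiguous_dna_complement and covers only upper-case A, C, G, T and N, which is an assumption made to keep the example short.

```python
# Hypothetical stand-in for IUPACData.ambiguous_dna_complement;
# only a handful of upper-case letters are included here.
DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def complement(seq):
    """Return the complementary sequence (same orientation, NOT reversed)."""
    return "".join(DNA_COMPLEMENT[base] for base in seq)


def reverse_complement(seq):
    """Return the complement read in the opposite direction."""
    return complement(seq)[::-1]
```

Reversing the reverse complement gives back the plain complement, so `reverse_complement(seq)[::-1] == complement(seq)` holds for any sequence the table covers.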
From bugzilla-daemon at portal.open-bio.org Wed Dec 12 21:48:03 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 12 Dec 2007 16:48:03 -0500 Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL In-Reply-To: Message-ID: <200712122148.lBCLm3iH025664@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2390 ------- Comment #19 from Biosql at hotmail.com 2007-12-12 16:48 EST ------- Hi Peter, I know it's been a very long time (more than a month), but I had this huge exam to prepare. Anyway, I've tried the latest version and everything is working fine. Many many thanks to you ! Since any Swiss Prot cross-references ain't uploaded in the Biosql DB, I've tried to parse the flat file with the RecordParser method from SProt instead of the SequenceParser or the SeqIO Parser, but I'm getting an error. I've seen in the bug list that you seem to work on this issue. Am I right ? If not, is there a way to upload the Swiss Prot cross-references ? Again, thank you ! -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Dec 12 22:01:47 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 12 Dec 2007 17:01:47 -0500 Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL In-Reply-To: Message-ID: <200712122201.lBCM1lGR026457@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2390 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |REOPENED Resolution|FIXED | ------- Comment #20 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-12 17:01 EST ------- Hi Jonathan, I'm glad we've fixed the error for you. 
Could you be a little more precise about what isn't working with getting Swiss Prot cross-references into BioSQL? e.g. Pick a specific SwissProt record, and quote the lines from the file containing the cross-references. That should be enough for me to try and track down what's going on. By the way - if you want to work with BioSQL, you have to use SeqRecord objects (e.g. from the Bio.SeqIO parser), and not the Bio.SwissProt.SProt.Record objects. This probably explains the error you mentioned using the RecordParser parser instead. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Dec 12 22:17:36 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 12 Dec 2007 17:17:36 -0500 Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL In-Reply-To: Message-ID: <200712122217.lBCMHaBK027220@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2390 ------- Comment #21 from Biosql at hotmail.com 2007-12-12 17:17 EST ------- (In reply to comment #20) > Hi Jonathan, > > I'm glad we've fixed the error for you. Could you be a little more precise > about what isn't working with getting Swiss Prot cross-references into BioSQL? > > e.g. Pick a specific SwissProt record, and quote the lines from the file > containing the cross-references. > > That should be enough for me to try and track down what's going on. > > By the way - if you want to work with BioSQL, you have to use SeqRecord objects > (e.g. from the Bio.SeqIO parser), and not the Bio.SwissProt.SProt.Record > objects. This probably explains the error you mentioned using the RecordParser > parser instead. 
> > Peter > Sorry for the lack of information. Here's an example: http://ca.expasy.org/uniprot/Q9CQD1.txt All the sequences, ID line, AC lines and comments (CC lines) are being uploaded to the database, but not the DR lines (which I consider the most interesting cross-references), the PubMed references (R_ lines) or the taxon of the protein. I don't suppose the FT lines can be uploaded too, can they? If so, it would be awesome! Just to be clear, this upload pattern applies not only to this protein (Rab5a) but to all the Swiss Prot proteins. Do you need anything else? Jonathan -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Dec 13 00:42:28 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 12 Dec 2007 19:42:28 -0500 Subject: [Biopython-dev] [Bug 2419] SeqUtils __init__.py missing complement function (v1.43 and v1.44) In-Reply-To: Message-ID: <200712130042.lBD0gSdm001952@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2419 ------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2007-12-12 19:42 EST ------- The "complement" and similar functions were removed from Bio.SeqUtils in Biopython 1.43 because similar functionality existed in several places in Biopython. Apparently, we missed this call to complement in the six_frame_translations function. I would like to avoid adding this function back to SeqUtils. Instead, we can use the reverse_complement function in Bio.Seq, and take its reverse. Could you double-check if the revised version of Bio.SeqUtils.__init__.py works for you?
You can pick it up from here: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/*checkout*/biopython/Bio/SeqUtils/__init__.py?rev=1.14&cvsroot=biopython&content-type=text/plain -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Dec 13 16:09:27 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 13 Dec 2007 11:09:27 -0500 Subject: [Biopython-dev] [Bug 2419] SeqUtils __init__.py missing complement function (v1.43 and v1.44) In-Reply-To: Message-ID: <200712131609.lBDG9R7u027690@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2419 ------- Comment #2 from justin.t.riley at gmail.com 2007-12-13 11:09 EST ------- (In reply to comment #1) > The "complement" and similar functions were removed from Bio.SeqUtils in > Biopython 1.43 because similar functionality existed in several places in > Biopython. Apparently, we missed this call to complement in the > six_frame_translations function. I would like to avoid adding this function > back to SeqUtils. Instead, we can use the reverse_complement function in > Bio.Seq, and take its reverse. > > Could you double-check if the revised version of Bio.SeqUtils.__init__.py works > for you? You can pick it up from here: > > http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/*checkout*/biopython/Bio/SeqUtils/__init__.py?rev=1.14&cvsroot=biopython&content-type=text/plain > Michiel, I figured the "solution" I mentioned wasn't the ideal but hey it worked :D The revised __init__.py you linked to works great for me. Thanks for getting back to me so quickly with a proper fix. I'm thinking of submitting a patch to Gentoo Linux for this in their Biopython ebuild until your next release. Thanks again! 
~Justin -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 14 00:01:54 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 13 Dec 2007 19:01:54 -0500 Subject: [Biopython-dev] [Bug 2419] SeqUtils __init__.py missing complement function (v1.43 and v1.44) In-Reply-To: Message-ID: <200712140001.lBE01sIR023423@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2419 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #3 from mdehoon at ims.u-tokyo.ac.jp 2007-12-13 19:01 EST ------- OK, thanks. Closing this bug. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 14 15:17:21 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Dec 2007 10:17:21 -0500 Subject: [Biopython-dev] [Bug 2390] Error importing Swiss Prot in BioSQL In-Reply-To: Message-ID: <200712141517.lBEFHLcj018666@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2390 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |RESOLVED Resolution| |FIXED ------- Comment #22 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-14 10:17 EST ------- Thanks for the details. Those fields are not being recorded in the SeqRecord object, so there is no way for BioSQL to put them into the database. This is bug 2235, which is on my mental to do list. 
Additionally, even if the parser did record the Taxon in the SeqRecord, BioSQL currently doesn't record this in the database. That seems to have been a short-term fix for Bug 1921, which we should probably revisit. Note I'm re-marking THIS bug as fixed. Peter. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 14 17:56:11 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Dec 2007 12:56:11 -0500 Subject: [Biopython-dev] [Bug 2421] New: BioSQL should store and retrieve a SeqRecord's dbxrefs Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2421 Summary: BioSQL should store and retrieve a SeqRecord's dbxrefs Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: BioSQL AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk Looking over the code, BioSQL doesn't seem to even try and store database cross references in a SeqRecord's dbxrefs list. It will however store other cross references, e.g. in references and in features. See also: Bug 2390 comment 21 - Error importing Swiss Prot in BioSQL It was pointed out that SwissProt DR lines don't get into the database. The first problem was they didn't even make it to the SeqRecord... Bug 2235 - SeqRecord from Bio.SwissProt.SProt lacks annotation information The latest parser in CVS will now load DR lines into the dbxrefs list. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Fri Dec 14 18:08:01 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Dec 2007 13:08:01 -0500 Subject: [Biopython-dev] [Bug 2422] New: BioSQL shouldn't just ignore the taxon_id Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2422 Summary: BioSQL shouldn't just ignore the taxon_id Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: BioSQL AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk In Bug 1921 biopython/BioSQL/Loader.py was changed to ignore the taxon_id, in order to avoid a foreign key constraint when the taxon id was not already defined (e.g. from loading an up to date NCBI taxonomy). We should see how BioPerl and BioJava handle this situation... One crude option (which would still be an improvement on the current situation) is to check if the taxon_id is defined, and if it is, then store the record with this included, and if not, issue a warning and store the sequence but omitting the taxon id. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 14 18:09:33 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Dec 2007 13:09:33 -0500 Subject: [Biopython-dev] [Bug 1921] BioSeqDatabase.load() method fails In-Reply-To: Message-ID: <200712141809.lBEI9Xl9001415@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1921 ------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-14 13:09 EST ------- In resolving this issue (bug 1921), Biopython's BioSQL is simply ignoring the taxon_id, so it is never recorded in the database. 
I've just filed a new bug on this: Bug 2422 - BioSQL shouldn't just ignore the taxon_id -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 14 18:21:40 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Dec 2007 13:21:40 -0500 Subject: [Biopython-dev] [Bug 2422] BioSQL shouldn't just ignore the taxon_id In-Reply-To: Message-ID: <200712141821.lBEILelL002298@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2422 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-14 13:21 EST ------- Some of Marc Colosimo's changes proposed on Bug 1816 may be relevant here, in particular his patch "Various fixes and possible improvements" (attachment 594). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
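The "crude option" described in Bug 2422 (keep the taxon_id when it is already defined, otherwise warn and store the record without it) could be sketched roughly as follows. This is not BioSQL code: `bioentry_row`, the dictionary it returns and the `known_taxon_ids` set are hypothetical stand-ins for the loader and for a lookup against the taxon table.

```python
import warnings


def bioentry_row(accession, taxon_id, known_taxon_ids):
    """Build the values to store, dropping an undefined taxon_id.

    known_taxon_ids is assumed to hold the ids already present in
    the database's taxon table.
    """
    if taxon_id is not None and taxon_id not in known_taxon_ids:
        warnings.warn(
            "Taxon %s is not defined; storing %s without a taxon id"
            % (taxon_id, accession)
        )
        taxon_id = None  # avoids violating the foreign key constraint
    return {"accession": accession, "taxon_id": taxon_id}
```

The point of the design is that a missing taxon no longer blocks loading the sequence, while a defined one is no longer silently thrown away.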
From bugzilla-daemon at portal.open-bio.org Fri Dec 14 18:34:42 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Dec 2007 13:34:42 -0500 Subject: [Biopython-dev] [Bug 1816] Error when importing GenBank file into BioSQL database In-Reply-To: Message-ID: <200712141834.lBEIYgsN004015@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=1816 ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-14 13:34 EST ------- I'd like to close this bug as the original problem seems to be fixed: Using CVS, I can load and retrieve AY243312 into BioSQL using the GenBank file downloaded from here: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=29692106 Regarding the taxon id, I've filed a separate bug: Bug 2422 - BioSQL shouldn't just ignore the taxon_id One of Marc's changes in the patch was caching term and ontology ids. Does this make a big difference? If so, could you file a new bug just for that enhancement and rescue those specific changes from the old patch? Similarly for the last_id method - could you file a new bug explaining what problem it's solving? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Fri Dec 14 18:36:34 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Dec 2007 13:36:34 -0500 Subject: [Biopython-dev] [Bug 2414] run_tests.py fails with a single test on a test suite In-Reply-To: Message-ID: <200712141836.lBEIaYKo004243@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2414 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED Summary|run_tests,py fails with a |run_tests.py fails with a |single test on a test suite |single test on a test suite ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-14 13:36 EST ------- Tiago made this change in biopython/Tests/run_tests.py revision 1.12, marking this bug as fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 14 22:40:39 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Dec 2007 17:40:39 -0500 Subject: [Biopython-dev] [Bug 2421] BioSQL should store and retrieve a SeqRecord's dbxrefs In-Reply-To: Message-ID: <200712142240.lBEMedjA021336@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2421 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-14 17:40 EST ------- This seems to be working in CVS now... -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
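For the run_tests.py change recorded under Bug 2414 above, the check has to accept both "Ran 1 test in ..." and "Ran N tests in ...". The patch proposed in the original report did this with two string.find() calls; the sketch below shows an alternative using a regular expression. It is only an illustration and not the code in revision 1.12; `is_ran_summary` is a made-up name.

```python
import re

# Matches the unittest summary line with either "test" or "tests".
_RAN_LINE = re.compile(r"^Ran \d+ tests? in ")


def is_ran_summary(line):
    """Return True if the line looks like a unittest 'Ran N test(s)' summary."""
    return _RAN_LINE.match(line) is not None
```

A regular expression keeps the singular and plural cases in one pattern, at the cost of being a larger change than the two-line patch in the report.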
From bugzilla-daemon at portal.open-bio.org Fri Dec 14 23:08:55 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Dec 2007 18:08:55 -0500 Subject: [Biopython-dev] [Bug 2410] DBSeq & DBSeqRecord should subclass Seq & SeqRecord In-Reply-To: Message-ID: <200712142308.lBEN8tWc023431@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2410 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-14 18:08 EST ------- Fixed in biopython/BioSQL/BioSeq.py revision 1.20 The BioSQL unit tests still pass. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 14 23:37:55 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Dec 2007 18:37:55 -0500 Subject: [Biopython-dev] [Bug 2421] BioSQL should store and retrieve a SeqRecord's dbxrefs In-Reply-To: Message-ID: <200712142337.lBENbtiR025242@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2421 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-14 18:37 EST ------- Fixed in CVS, and test_BioSQL_SeqIO.py updated to verify this explicitly. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sat Dec 15 13:47:48 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 15 Dec 2007 08:47:48 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200712151347.lBFDlmh9019619@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #795 is|0 |1 obsolete| | ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2007-12-15 08:47 EST ------- Created an attachment (id=836) --> (http://bugzilla.open-bio.org/attachment.cgi?id=836&action=view) Patch to Bio/Seq.py [Note this does not update the test suite or the documentation, which would be needed if this is committed]

Adds new methods to the MutableSeq object:
- transcribe (in place)
- back_transcribe (in place)

Adds new methods to the Seq object:
- transcribe
- back_transcribe
- translate (like the python string method)
- translate_all (Biological translation)
- translate_to_stop (Biological translation up to and excluding first stop codon)
- translate_cds (Biological translation with an initial start codon as M, up to and excluding the first stop codon)

I think this would be enough to deprecate Bio.Translate and Bio.Transcribe (after the next release). Comments welcome - for example are these method names sensible? Also, should the MutableSeq methods all act "in situ"? What about translation methods for MutableSeq objects? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Fri Dec 28 16:18:54 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 28 Dec 2007 11:18:54 -0500 Subject: [Biopython-dev] [Bug 2425] New: Fasta ID parsing error Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2425 Summary: Fasta ID parsing error Product: Biopython Version: 1.44 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: BioSQL AssignedTo: biopython-dev at biopython.org ReportedBy: dtomso at athenixcorp.com Loader.py will give an error as follows when presented with an unusual FASTA header line:

>region1.fasta.screen.Contig1
ACAGGATAGGCGGGAGCCATTGAAACCGGAGCGCTAGCTTCGGTGGAGGC
GCTGGTGGGATACCGCCCTGACTGTATTGAAATTCTAACCTACGGGTCTT

Traceback (most recent call last):
  File "biosql_driver.py", line 28, in <module>
    db.load(SeqIO.parse(sfile, 'fasta'))
  File "/home/dtomso/repository/biopython/build/lib.linux-i686-2.5/BioSQL/BioSeqDatabase.py", line 412, in load
    db_loader.load_seqrecord(cur_record)
  File "/usr/lib/python2.5/site-packages/BioSQL/Loader.py", line 30, in load_seqrecord
    bioentry_id = self._load_bioentry_table(record)
  File "/usr/lib/python2.5/site-packages/BioSQL/Loader.py", line 214, in _load_bioentry_table
    accession, version = record.id.split('.')
ValueError: too many values to unpack

It appears to be looking for any '.' in the record ID, assuming that is a version number, and splitting to obtain that number. However, this only works on NCBI-type header lines. Files that deviate from this (e.g. those produced by phrap, which produced the file above) cause this issue. I bolted on an inelegant fix by having the code check for multiple '.' characters, in which case the version defaults to zero. Other solutions may be preferable. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
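One of the "other solutions" for the ID splitting in Bug 2425 could be to split only on the last '.' and to treat the suffix as a version only when it is numeric. The sketch below is a variant of the reporter's fix rather than that fix itself, and it is not the code in BioSQL/Loader.py; `split_accession_version` is a hypothetical helper name.

```python
def split_accession_version(record_id):
    """Split 'ACCESSION.VERSION' on the last dot.

    If there is no dot, or the part after the last dot is not a
    number, the whole ID is kept as the accession and the version
    defaults to 0.
    """
    if "." in record_id:
        accession, version = record_id.rsplit(".", 1)
        if version.isdigit():
            return accession, int(version)
    return record_id, 0
```

With this approach an NCBI-style ID such as "AY243312.1" still yields a separate version, while "region1.fasta.screen.Contig1" is kept whole with version 0 instead of raising ValueError.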