From mjldehoon at yahoo.com Mon May 5 10:55:42 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 5 May 2008 07:55:42 -0700 (PDT)
Subject: [Biopython-dev] BOSC 2008 announcement and call for submissions
Message-ID: <698765.93604.qm@web62401.mail.re1.yahoo.com>

BOSC 2008 Call for Abstracts Reminder

The 9th annual Bioinformatics Open Source Conference (BOSC 2008) will take place in Toronto, Ontario, Canada, as one of several Special Interest Group (SIG) meetings occurring in conjunction with the 16th annual Intelligent Systems for Molecular Biology Conference (ISMB 2008).

This is the final reminder to submit your proposals for talks to the BOSC submission system before May 11.

Submission Process:

All abstracts must be submitted through our Open Conference Systems site (http://events.open-bio.org/BOSC2008/openconf.php). The form will ask for a small Abstract Text to be pasted into it, and a full paper. The small Abstract Text should be a summary, while the longer abstract should provide more details, including the open-source license requirement details. Full-length abstracts are limited to one page with one inch (2.5 cm) margins on the top, sides, and bottom. The full-length abstract should include the title, authors, and affiliations. We prefer your abstract to be in PDF format, although plain t

Important Dates:

May 11: Abstract submission deadline.
June 2: Notification of accepted talks.
June 4: Early registration discount cut-off.
July 18-19: BOSC 2008!

We hope to see you at BOSC 2008!

Kam Dahlquist and Darin London
BOSC 2008 Co-organizers

From bugzilla-daemon at portal.open-bio.org Wed May 7 11:36:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 7 May 2008 11:36:43 -0400
Subject: [Biopython-dev] [Bug 2494] _retrieve_taxon in BioSQL.py needs urgent optimization
In-Reply-To:
Message-ID: <200805071536.m47FahTU028186@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2494

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-07 11:36 EST -------
Created an attachment (id=917)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=917&action=view)
Patch to BioSQL/BioSeq.py

Hi Eric.

I've tried your script with MySQL 5.0 under Linux, and see similar example timings, e.g.:

getTaxonSQLsimplex took 458.646 ms
getTaxonSQL took 8152.112 ms
getTaxonSQLall took 8565.304 ms
getTaxonLoop took 18.612 ms

However, your loop function doesn't return exactly the same list as the original code. In particular you do not exclude taxonomy lineage entries with a rank of "no rank". Also I didn't like the hard coded assumption about taxon_id 1 as a top node. What do you think of this version:

def getTaxonLoopPeter(adaptor, taxon_id):
    # climbing up the hierarchy: bottom-up approach based on the
    # child/parent link with parent_taxon_id
    taxonomy = []
    while taxon_id:
        name, rank, parent_taxon_id = adaptor.execute_one(
            "SELECT taxon_name.name, taxon.node_rank, taxon.parent_taxon_id" \
            " FROM taxon, taxon_name" \
            " WHERE taxon.taxon_id=taxon_name.taxon_id" \
            " AND taxon_name.name_class='scientific name'" \
            " AND taxon.taxon_id = %s", (taxon_id,))
        if taxon_id == parent_taxon_id:
            # If the taxon table has been populated by the BioSQL script
            # load_ncbi_taxonomy.pl this is how top parent nodes are stored.
            # Personally, I would have used a NULL parent_taxon_id here.
            break
        if rank != "no rank":
            # For consistency with older versions of Biopython, we are only
            # interested in taxonomy entries with a stated rank.
            # Add this to the start of the lineage list.
            taxonomy.insert(0, name)
        taxon_id = parent_taxon_id
    return taxonomy

I'm attaching a patch to BioSQL/BioSeq.py that uses this code in place of the current left/right dependent version. While this does seem to be much faster in your test script, I'm not sure how much difference this will make in normal usage.

Peter

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Thu May 8 07:56:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 8 May 2008 07:56:24 -0400
Subject: [Biopython-dev] [Bug 2496] New: Bio.Blast.NCBIWWW.qblast() does not support RUN_PSIBLAST option
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2496

           Summary: Bio.Blast.NCBIWWW.qblast() does not support RUN_PSIBLAST option
           Product: Biopython
           Version: 1.45
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
                CC: betainverse at gmail.com

Problem reported on the mailing list by Katie Edmonds.

We need to add the CGI option RUN_PSIBLAST to the Blast URL in order to support PSI-BLAST. However, the current Biopython code can't parse the RID from the resulting HTML, which needs another fix.

Patch to follow.

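The bottom-up lineage walk proposed in Bug 2494 comment #3 can be tried without a BioSQL installation. The following is only a sketch under stated assumptions: it uses Python's sqlite3 module with "?" placeholders instead of the Biopython adaptor, the two tables are cut down to the columns named in the query above, and the rows are made-up toy data rather than a real NCBI taxonomy dump. The names build_toy_db and lineage are hypothetical.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database holding a tiny, made-up taxonomy."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE taxon (
            taxon_id INTEGER PRIMARY KEY,
            parent_taxon_id INTEGER,
            node_rank TEXT
        );
        CREATE TABLE taxon_name (
            taxon_id INTEGER,
            name TEXT,
            name_class TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO taxon VALUES (?, ?, ?)",
        [
            (1, 1, "no rank"),  # top node stored as its own parent
            (131567, 1, "no rank"),
            (2759, 131567, "superkingdom"),
            (9605, 2759, "genus"),  # simplified: intermediate nodes omitted
            (9606, 9605, "species"),
        ],
    )
    conn.executemany(
        "INSERT INTO taxon_name VALUES (?, ?, ?)",
        [
            (1, "root", "scientific name"),
            (131567, "cellular organisms", "scientific name"),
            (2759, "Eukaryota", "scientific name"),
            (9605, "Homo", "scientific name"),
            (9606, "Homo sapiens", "scientific name"),
            (9606, "human", "genbank common name"),
        ],
    )
    return conn


def lineage(conn, taxon_id):
    """Walk child -> parent links and return ranked names, top node first."""
    names = []
    while taxon_id:
        row = conn.execute(
            "SELECT taxon_name.name, taxon.node_rank, taxon.parent_taxon_id"
            " FROM taxon, taxon_name"
            " WHERE taxon.taxon_id = taxon_name.taxon_id"
            " AND taxon_name.name_class = 'scientific name'"
            " AND taxon.taxon_id = ?",
            (taxon_id,),
        ).fetchone()
        if row is None:
            break  # unknown taxon_id: stop rather than loop forever
        name, rank, parent_taxon_id = row
        if taxon_id == parent_taxon_id:
            break  # self-parented top node
        if rank != "no rank":
            names.insert(0, name)
        taxon_id = parent_taxon_id
    return names


conn = build_toy_db()
print(lineage(conn, 9606))
```

Each step costs one indexed lookup, so the number of queries grows with the depth of the lineage rather than with the size of the taxon table, which is consistent with the timings quoted in the comment.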
From bugzilla-daemon at portal.open-bio.org Thu May 8 07:58:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 8 May 2008 07:58:46 -0400
Subject: [Biopython-dev] [Bug 2496] Bio.Blast.NCBIWWW.qblast() does not support RUN_PSIBLAST option
In-Reply-To:
Message-ID: <200805081158.m48Bwkxq028674@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2496

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-08 07:58 EST -------
Created an attachment (id=918)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=918&action=view)
Patch to Bio/Blast/NCBIWWW.py

This seems to work, however there is another problem in the XML parser.

e.g.

from Bio.Blast.NCBIWWW import qblast

#gi|160837788|ref|NP_075631.2| actin related protein 2/3 complex, subunit 1B
sequence = \
  "MAYHSFLVEPISCHAWNKDRTQIAICPNNHEVHIYEKSGAKWNKVHELKEHNGQVTGIDWAPESNRIVTC" \
+ "GTDRNAYVWTLKGRTWKPTLVILRINRAARCVRWAPNENKFAVGSGSRVISICYFEQENDWWVCKHIKKP" \
+ "IRSTVLSLDWHPNNVLLAAGSCDFKCRIFSAYIKEVEERPAPTPWGSKMPFGELMFESSSSCGWVHGVCF" \
+ "SASGSRVAWVSHDSTVCLVDADKKMAVATLASETLPLLAVTFITENSLVAAGHDCFPVLFTYDNAAVTLS" \
+ "FGGRLDVPKQSSQRGMTARERFQNLDKKASSEGGAATGAGLDSLHKNSVSQISVLSGGKAKCSQFCTTGM" \
+ "DGGMSIWDVKSLESALKDLKIK"

result_handle1 = qblast('blastp', 'nr', sequence, expect=0.001)

result_handle2 = qblast('blastp', 'nr', sequence, i_thresh=0.05, expect=10, run_psiblast="on")

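As a rough illustration of what "adding a CGI option to the Blast URL" amounts to, the sketch below turns keyword arguments like the ones in the qblast() calls above into a URL-encoded query string, leaving out any option that was not set. This is not the code from the attached patch and not Biopython's implementation: build_put_query is a hypothetical helper, and the fixed parameter names (PROGRAM, DATABASE, QUERY, CMD) are assumptions made for the example. Only RUN_PSIBLAST is taken from the bug report.

```python
from urllib.parse import urlencode


def build_put_query(program, database, sequence, **options):
    """Return a URL-encoded query string for a BLAST submission.

    Hypothetical helper for illustration only. Keyword options are
    upper-cased to form the CGI parameter name; options set to None
    are skipped so they do not appear in the URL at all.
    """
    params = [
        ("PROGRAM", program),
        ("DATABASE", database),
        ("QUERY", sequence),
        ("CMD", "Put"),
    ]
    for key, value in sorted(options.items()):
        if value is not None:
            params.append((key.upper(), value))
    return urlencode(params)


query = build_put_query(
    "blastp", "nr", "MAYHSFLVEPISCHAWNK",
    expect=None, i_thresh=0.05, run_psiblast="on",
)
print(query)
```

Dropping unset options rather than sending empty values keeps the request identical to the pre-patch one whenever run_psiblast is not supplied.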
From bugzilla-daemon at portal.open-bio.org Thu May 8 10:28:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 8 May 2008 10:28:21 -0400
Subject: [Biopython-dev] [Bug 2496] Bio.Blast.NCBIWWW.qblast() does not support RUN_PSIBLAST option
In-Reply-To:
Message-ID: <200805081428.m48ESLbe006861@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2496

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-08 10:28 EST -------
This patch seems to be working - note that you will also need to update Bio/Blast/NCBIXML.py to CVS revision 1.18 in order to parse the results. This is due to a small change in the formatting of the version number in the latest XML output.

I would like someone familiar with PSI-Blast to confirm this is OK before I commit this change to CVS.

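The formatting change mentioned in comment #2 is assumed here to be a version string that no longer carries a trailing bracketed date, as in "BLASTP 2.2.18+". A tolerant parser for that field might look like the sketch below. The function name is hypothetical, and the older-style example string with a date is an assumed illustration of the earlier format, not output copied from NCBI.

```python
def parse_blast_version(value):
    """Split a version field into (version, date).

    The date is returned as None when the field has no third token,
    instead of raising IndexError.
    """
    fields = value.split()
    version = fields[1]
    # A date, when present, is wrapped in one bracket on each side.
    date = fields[2][1:-1] if len(fields) > 2 else None
    return version, date


print(parse_blast_version("BLASTP 2.2.18+"))
print(parse_blast_version("BLASTP 2.2.17 [Aug-26-2007]"))
```

Splitting once and checking the token count avoids indexing past the end of the list, which is the failure mode a missing date would trigger in a parser that assumes three tokens.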
From bugzilla-daemon at portal.open-bio.org Fri May 9 05:01:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 May 2008 05:01:46 -0400
Subject: [Biopython-dev] [Bug 2496] Bio.Blast.NCBIWWW.qblast() does not support RUN_PSIBLAST option
In-Reply-To:
Message-ID: <200805090901.m4991kut017980@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2496

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-09 05:01 EST -------
Katie has reported back via the mailing list that there are still issues with multiple PSI-Blast iterations, see:
http://lists.open-bio.org/pipermail/biopython/2008-May/004220.html

See also the original thread:
http://lists.open-bio.org/pipermail/biopython/2008-May/004213.html

From bugzilla-daemon at portal.open-bio.org Fri May 9 07:21:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 May 2008 07:21:33 -0400
Subject: [Biopython-dev] [Bug 2497] New: Unit tests do not cover Bio.Blast.NCBIWWW.qblast()
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2497

           Summary: Unit tests do not cover Bio.Blast.NCBIWWW.qblast()
           Product: Biopython
           Version: 1.45
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk

Recent NCBI changes to use BLAST 2.2.18+ with their online API broke our XML parser. This was actually reported via the mailing list and fixed quickly.

Adding an online unit test to explicitly run a few queries with Bio.Blast.NCBIWWW.qblast() and parse the XML output could have caught this earlier.
I'm going to attach a proposed additional unit test to do this, test_NCBIWWW_online.py

From bugzilla-daemon at portal.open-bio.org Fri May 9 07:24:48 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 May 2008 07:24:48 -0400
Subject: [Biopython-dev] [Bug 2497] Unit tests do not cover Bio.Blast.NCBIWWW.qblast()
In-Reply-To:
Message-ID: <200805091124.m49BOmUD023507@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2497

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-09 07:24 EST -------
Created an attachment (id=919)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=919&action=view)
Addition unit test

This is a simple unit test which calls qblast() twice, once using blastp and once using blastn. The XML results are then parsed, and it checks that a few pre-defined expected matches are found.

There is minimal output to the console/output file as I do not want minor details like the precise number of hits to be reported (anticipating these to fluctuate as the databases grow).

From quantrum75 at yahoo.com Fri May 9 09:37:05 2008
From: quantrum75 at yahoo.com (quantrum75)
Date: Fri, 9 May 2008 06:37:05 -0700 (PDT)
Subject: [Biopython-dev] Anyone needs help?
In-Reply-To:
Message-ID: <686395.82650.qm@web31404.mail.mud.yahoo.com>

Hi there,
I am a newbie who is interested in contributing. I was wondering if anyone needed any help with a project?
I have tried contributing at a few places before, and the problem I ran into was that the requirements were too long and unfocused, and nothing came of it in the end.

What I am looking for is,
1) Something small to start off with.
2) Something I can complete within a short period of time (focused work of a day or two) and reach a definite conclusion.
3) No work is too small for me.
4) I'd be willing to do any kind of grunt work and would be glad to help with documentation etc.
5) Ideally, it would be something like reviewing some documentation and correcting it, or writing some documentation for a function or whatever for someone who needs to do it but just does not have the time to do it.
6) The kind of work I like to do is work that can be completed.

If anyone has such a job in mind, let me know.
Thanks for your time.
Sincerely
Regards
Rama

From biopython at maubp.freeserve.co.uk Fri May 9 11:33:08 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 9 May 2008 16:33:08 +0100
Subject: [Biopython-dev] Anyone needs help?
In-Reply-To: <686395.82650.qm@web31404.mail.mud.yahoo.com>
References: <686395.82650.qm@web31404.mail.mud.yahoo.com>
Message-ID: <320fb6e00805090833w6977bb3fr6ca32d70cb2887ea@mail.gmail.com>

On Fri, May 9, 2008 at 2:37 PM, Rama wrote:
> Hi there,
> I am newbie who is interested in contributing. I was wondering if anyone needed any help with a project?

Hello Rama.

What is your background? Do you know anything about bioinformatics for example? Also how experienced are you with python, and have you ever worked with the tools diff, patch and CVS?

> I have tried contributing at a few places before and the problem I ran into was that it was too long
> and unfocused requirements and nothing came of it in the end.
> What I am looking for is,
> ...
> If anyone has such a job in mind, let me know.

I would suggest you have a go at Bug 2446, which is small and shouldn't be too complicated. The bug reporter Dave Thompson has been kind enough to provide a few test cases and example code to demonstrate the problem.
http://bugzilla.open-bio.org/show_bug.cgi?id=2446

Could you try modifying the Ace parser to just ignore these comment sections? The file you need to look at is Bio/Sequencing/Ace.py
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Sequencing/Ace.py?cvsroot=biopython

As you can see from the CVS history, this code hasn't changed since our latest release of Biopython 1.45, so you could work from that if it's easier than learning about CVS too.

If you can get this to work, then prepare a patch file against the CVS code (or Biopython 1.45) and attach it to the bug.

Let me know what you think about trying this.

Regards,

Peter
(one of the Biopython developers)

From bugzilla-daemon at portal.open-bio.org Fri May 9 14:20:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 May 2008 14:20:12 -0400
Subject: [Biopython-dev] [Bug 2475] BioSQL.Loader should reuse existing taxon entries in lineage
In-Reply-To:
Message-ID: <200805091820.m49IKCMh009431@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2475

------- Comment #31 from mmokrejs at ribosome.natur.cuni.cz 2008-05-09 14:20 EST -------
Hi,
I wanted to test what you have but lack some more user friendly documentation. Specifically, I lack documentation for the class BioSeqDatabase in BioSeqDatabase.py (attachment 915). In the method load which Eric has modified it is not clear to me what would be fetched from NCBI Taxonomy DB. I guess the full lineage, but still I do not know whether as a string or a list of strings or similarly just taxids?
The Loader.py (attachment 914) has a scary function called remove() and I would like to see a more elaborate explanation of what it really does. Imagine I have two subspecies of the same species in the database and want to delete the first one. Will it zap the parents common to both of them? I wish not. ;-)

Also, I am a bit surprised that _get_taxon_id() would actually modify a local database. Could there be another name, or could it be split into two functions, one doing the search over the local db and optionally fetching data via internet, and a second modifying the local db? And, shouldn't the 'if self.fetch_NCBI_taxonomy' have a corresponding elif for the second attempt and the third one? It is a bit too long to read. ;-)

From bugzilla-daemon at portal.open-bio.org Mon May 12 14:40:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 12 May 2008 14:40:34 -0400
Subject: [Biopython-dev] [Bug 2499] New: Bio.Blast.NCBIXML cannot handle XML without date in BlastOutput_version
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2499

           Summary: Bio.Blast.NCBIXML cannot handle XML without date in BlastOutput_version
           Product: Biopython
           Version: 1.44
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: n.j.loman at bham.ac.uk

I got the following XML file directly from the NCBI website.

blastp
BLASTP 2.2.18+
Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.
env_nr
...
This output raises an exception when put through NCBIXML.parse() due to the absence of a date after the string BLASTP 2.2.18+

The following diff sorts it out:

--- /home/nick/biopython/biopython-1.44/Bio/Blast/NCBIXML.py 2007-07-27 21:34:07.000000000 +0100
+++ NCBIXML.py 2008-05-12 18:01:36.000000000 +0100
@@ -212,8 +212,10 @@
 
         Save this to put on each blast record object
         """
-        self._header.version = self._value.split()[1]
-        self._header.date = self._value.split()[2][1:-1]
+        s = self._value.split()
+        self._header.version = s[1]
+        if len(s) > 2:
+            self._header.date = s[2][1:-1]
 
     def _end_BlastOutput_reference(self):
         """a reference to the article describing the algorithm

I'm sorry, I haven't checked to see if this is fixed in 1.45.

From bugzilla-daemon at portal.open-bio.org Tue May 13 04:09:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 13 May 2008 04:09:53 -0400
Subject: [Biopython-dev] [Bug 2499] Bio.Blast.NCBIXML cannot handle XML without date in BlastOutput_version
In-Reply-To:
Message-ID: <200805130809.m4D89ro7003140@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2499

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-13 04:09 EST -------
Hi Nick,

This was reported earlier on the mailing list, and fixed in Bio/Blast/NCBIXML.py revision 1.18 (at the time I didn't bother to file a bug, maybe I should have):
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIXML.py?cvsroot=biopython

If you need the fix urgently, you can either get the whole of Biopython from CVS and
install from source, or just replace that one file, which can simply be downloaded from ViewCVS (link above). Your exception error will tell you where exactly your local copy of Bio/Blast/NCBIXML.py is.

Peter

From bugzilla-daemon at portal.open-bio.org Tue May 13 05:16:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 13 May 2008 05:16:18 -0400
Subject: [Biopython-dev] [Bug 2499] Bio.Blast.NCBIXML cannot handle XML without date in BlastOutput_version
In-Reply-To:
Message-ID: <200805130916.m4D9GIMV006160@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2499

------- Comment #2 from n.j.loman at bham.ac.uk 2008-05-13 05:16 EST -------
Many thanks!

From mjldehoon at yahoo.com Tue May 13 08:07:15 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 13 May 2008 05:07:15 -0700 (PDT)
Subject: [Biopython-dev] Reportlab requirement
Message-ID: <305778.65303.qm@web62415.mail.re1.yahoo.com>

Hi everybody,

Currently, setup.py checks if Reportlab is installed or not. If not, you get the following message:

*** Reportlab *** is either not installed or out of date.

This package is optional, which means it is only used in a few
specialized modules in Biopython. You probably don't need this if you
are unsure. You can ignore this requirement, and install it later if
you see ImportErrors.
You can find Reportlab at http://www.reportlab.org/downloads.html.

Do you want to continue this installation? (Y/n)

Reportlab is only used in Bio.Graphics. Unlike e.g.
Numeric, Reportlab can be installed later if needed without having to rebuild Biopython. The Biopython unit tests already skip Bio.Graphics if Reportlab is not found.

So I think it is sufficient to check for Reportlab presence only if a user tries to use Bio.Graphics. This will save us the "Do you want to continue this installation? (Y/n) " question above, which may scare off users (and I am quite tired of it myself, too)...

Any objections?

--Michiel.

From sdavis2 at mail.nih.gov Tue May 13 08:34:20 2008
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 13 May 2008 08:34:20 -0400
Subject: [Biopython-dev] Reportlab requirement
In-Reply-To: <305778.65303.qm@web62415.mail.re1.yahoo.com>
References: <305778.65303.qm@web62415.mail.re1.yahoo.com>
Message-ID: <264855a00805130534q6451e40fj427a51e4aa729b18@mail.gmail.com>

On Tue, May 13, 2008 at 8:07 AM, Michiel de Hoon wrote:
> Hi everybody,
>
> Currently, setup.py checks if Reportlab is installed or not. If not, you get the following message:
>
> *** Reportlab *** is either not installed or out of date.
>
> This package is optional, which means it is only used in a few
> specialized modules in Biopython. You probably don't need this if you
> are unsure. You can ignore this requirement, and install it later if
> you see ImportErrors.
> You can find Reportlab at http://www.reportlab.org/downloads.html.
>
> Do you want to continue this installation? (Y/n)
>
>
> Reportlab is only used in Bio.Graphics. Unlike e.g. Numeric, Reportlab can be installed later if needed without having to rebuild Biopython. The Biopython unit tests already skip Bio.Graphics if Reportlab is not found.
>
> So I think it is sufficient to check for Reportlab presence only if a user tries to use Bio.Graphics. This will save us the "Do you want to continue this installation?
> (Y/n) " question above, which may scare off users (and I am quite tired of it myself, too)...
>
> Any objections?

I personally think it is a good idea to remove the question, yes.

Sean

From bugzilla-daemon at portal.open-bio.org Tue May 13 12:25:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 13 May 2008 12:25:49 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To:
Message-ID: <200805131625.m4DGPn3W028364@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1944

------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-13 12:25 EST -------
I see some interesting parallels between the __getitem__ options for a sequence alignment and the recent and ongoing discussions on the numpy discussion list about the __getitem__ behaviour of matrices versus arrays. In particular, some participants favour return of row/column vector objects in some situations. Also methods to allow iteration over rows or columns have been suggested.

Here with the sequence Alignment class, we could have SeqRecords for the rows, but Seq or strings for the columns.

Perhaps we should wait and see how the numpy discussion turns out? However, some of the other options discussed here on this bug are probably worth committing soon (e.g. the __str__ and __repr__ methods)

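To make the row/column distinction in comment #13 concrete, here is a small sketch of one possible __getitem__ behaviour. It is not the Bio.Align.Generic class and not a proposal from the bug itself: ToyAlignment is a hypothetical stand-in that stores rows as (id, sequence) tuples in place of SeqRecord objects and returns columns as plain strings.

```python
class ToyAlignment:
    """Hypothetical alignment container: rows are records, columns are strings."""

    def __init__(self, rows):
        rows = list(rows)
        if len({len(seq) for _, seq in rows}) > 1:
            raise ValueError("all sequences must have the same length")
        self._rows = rows

    def __len__(self):
        return len(self._rows)

    def __iter__(self):
        # Iterating over the alignment yields rows, like a list of records.
        return iter(self._rows)

    def columns(self):
        """Yield each column as a string, left to right."""
        width = len(self._rows[0][1]) if self._rows else 0
        for index in range(width):
            yield "".join(seq[index] for _, seq in self._rows)

    def __getitem__(self, key):
        if isinstance(key, int):
            return self._rows[key]  # aln[i] -> one row (a record)
        if isinstance(key, tuple) and len(key) == 2:
            row, col = key
            if isinstance(row, int) and isinstance(col, int):
                return self._rows[row][1][col]  # aln[i, j] -> one letter
            if isinstance(row, slice) and isinstance(col, int):
                # aln[:, j] -> column j (of the selected rows) as a string
                return "".join(seq[col] for _, seq in self._rows[row])
        raise TypeError("unsupported index: %r" % (key,))


aln = ToyAlignment([("a", "ACGT"), ("b", "AC-T"), ("c", "TCGT")])
print(aln[0])
print(aln[:, 2])
print(list(aln.columns()))
```

The asymmetry is deliberate in this sketch: a row keeps its identifier, while a column is just letters, so a richer record type would carry no extra information for it.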
From bugzilla-daemon at portal.open-bio.org Wed May 14 16:49:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 14 May 2008 16:49:08 -0400
Subject: [Biopython-dev] [Bug 2500] New: should use python-numpy instead of python-num{eric, array}
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2500

           Summary: should use python-numpy instead of python-num{eric,array}
           Product: Biopython
           Version: 1.45
          Platform: All
               URL: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=478457
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: mail at philipp-benner.de

Both python-numeric and python-numarray do not see new upstream releases anymore; the currently maintained project is python-numpy. Please convert the package to use python-numpy instead.

From bugzilla-daemon at portal.open-bio.org Wed May 14 20:58:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 14 May 2008 20:58:11 -0400
Subject: [Biopython-dev] [Bug 2500] should use python-numpy instead of python-num{eric, array}
In-Reply-To:
Message-ID: <200805150058.m4F0wBCO023044@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2500

mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE

------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2008-05-14 20:58 EST -------
*** This bug has been marked as a duplicate of bug 2251 ***

From bugzilla-daemon at portal.open-bio.org Wed May 14 20:58:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 14 May 2008 20:58:13 -0400
Subject: [Biopython-dev] [Bug 2251] [PATCH] NumPy support for BioPython
In-Reply-To:
Message-ID: <200805150058.m4F0wDfd023057@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2251

mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mail at philipp-benner.de

------- Comment #13 from mdehoon at ims.u-tokyo.ac.jp 2008-05-14 20:58 EST -------
*** Bug 2500 has been marked as a duplicate of this bug. ***

From tiagoantao at gmail.com Thu May 15 09:04:22 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 15 May 2008 14:04:22 +0100
Subject: [Biopython-dev] Fwd: Abstract
In-Reply-To: <320fb6e00805150558l3116d4dfhec89367eb7143081@mail.gmail.com>
References: <6d941f120805150248j11d0c02cq39626c304c9c7e29@mail.gmail.com>
	<320fb6e00805150540s50912664r462a194261c5c8c2@mail.gmail.com>
	<320fb6e00805150558l3116d4dfhec89367eb7143081@mail.gmail.com>
Message-ID: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com>

Hi all,

We are trying to submit an abstract for BOSC 2008 regarding Biopython. Below is the current version. Comments would be very appreciated (we are already after the deadline, so they should come in fast ;) ).
Michiel, do you want to add anything to the "future" section?

---------------------------------------

Biopython Project Update

Tiago Antao[1], Peter Cock[2]

In this talk we present the current status of the Biopython project, we focus on features developed since BOSC 2007, future plans for the project and present example usages of the new population genetics module.

The latest Biopython release is 1.45 made available on 22 March 2008. Some of the new features are:

1. A new population genetics module including support for coalescent simulation, selection detection and the GenePop file format. The new module relies on existing open source external software (e.g., the open source Simcoal2 for coalescent simulation, which can take advantage of multiple core CPUs for computationally intensive tasks).
2. Improved documentation.
3. Deprecation of many modules which were either obsolete or had been superseded by other code.
4. Plus many bugs were fixed, included updates for evolving file formats.

Since the Biopython 1.45 release, further work is planned to extend the Population Genetics module (e.g., with a statistics component).
A new sequence alignment module is also being implemented with a uniform API for reading and writing various alignment files, based on the approach of the Bio.SeqIO module added last year for working with sequences. Work to improve Biopython's BioSQL support is also ongoing.

Time permitting, the talk will also show usage examples of the new population genetics module. The focus will be put not only on the population genetics side, but also on strategies to easily use all available computational power on new multiple core computers. This is useful for users of most scripting languages, as most language interpreter implementations impose stern limits on multi-threaded programming efficiency, which is important when using computational biology code which is CPU intensive. We will take this opportunity to discuss strategies to overcome those language limitations.

Any feedback would really be much appreciated, thanks!

From biopython at maubp.freeserve.co.uk Thu May 15 09:48:26 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 15 May 2008 14:48:26 +0100
Subject: [Biopython-dev] Bio.AlignIO for sequence alignment input/output
Message-ID: <320fb6e00805150648y42e91765oa99eab7e5e1cf8fa@mail.gmail.com>

Those of you subscribed to the CVS update feed (see http://biopython.org/wiki/Tracking_CVS_commits and the RSS link) will have noticed some activity in Bio.AlignIO which I originally proposed adding a year ago. See also enhancement Bug 2285,
http://bugzilla.open-bio.org/show_bug.cgi?id=2285

I've been using this code on and off in my own work, and have put together a reasonable unit test. I've finished a first draft of a new chapter in the tutorial describing the module (you'll need to run pdflatex or hevea on biopython/Doc/Tutorial.tex to read this), and started a wiki page too:
http://www.biopython.org/wiki/AlignIO

The API is deliberately very close to that of Bio.SeqIO, but deals with Alignment objects rather than SeqRecord objects.
I'm hoping for some feedback now, even if it is as little as pointing out any typos in the documentation. Also additional example input files would be good - and checking the Biopython output is understood by third party tools.

One particular issue with the API is handling ambiguous FASTA files which have been used to store more than one alignment (discussed in the updated tutorial). There is an optional argument to the Bio.AlignIO.parse() function to specify the number of sequences expected per alignment, which covers the most typical scenarios. I am open to the idea of simply removing this option, which means if the user really wants to parse one of the ambiguous files, they would have to read in the individual sequences using Bio.SeqIO, batch them as needed, and then create the alignments.

Peter

From p.j.a.cock at googlemail.com Thu May 15 09:51:59 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 15 May 2008 14:51:59 +0100
Subject: [Biopython-dev] Fwd: Abstract
In-Reply-To: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com>
References: <6d941f120805150248j11d0c02cq39626c304c9c7e29@mail.gmail.com>
	<320fb6e00805150540s50912664r462a194261c5c8c2@mail.gmail.com>
	<320fb6e00805150558l3116d4dfhec89367eb7143081@mail.gmail.com>
	<6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com>
Message-ID: <320fb6e00805150651md383437w2233bc1419589d40@mail.gmail.com>

One little typo I should have spotted earlier:

4. Plus many bugs were fixed, included updates for evolving file formats.

Should be:

4. Plus many bugs were fixed, including updates for evolving file formats.

Also I didn't insert our addresses for the [1] and [2] implied footnotes.
Peter From mjldehoon at yahoo.com Fri May 16 23:04:54 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 16 May 2008 20:04:54 -0700 (PDT) Subject: [Biopython-dev] Fwd: Abstract In-Reply-To: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com> Message-ID: <89450.67823.qm@web62411.mail.re1.yahoo.com> Dear Tiago, Thank you for representing Biopython at BOSC! If there's still time, I would suggest to aim the abstract (and also the talk itself) more at the general audience, who may know very little about Biopython or Python. So perhaps an overview of the main modules (no details, just to give an idea of what is covered by Biopython), the Population Genetics module, number of developers, number of users, and perhaps just mention the existence of some other big packages (numerical python, matplotlib, MMTK, ...) that are relevant to science & biology with Python. The point is that most people in the audience are not Biopython users (yet), so for them a general introduction is more suitable. --Michiel. Tiago Antão wrote: Hi all, We are trying to submit an abstract for BOSC 2008 regarding Biopython. Below is the current version. Comments would be very appreciated (we are already after the deadline, so they should come in fast ;) ). Michiel, do you want to add anything to the "future" section? --------------------------------------- Biopython Project Update Tiago Antao[1], Peter Cock[2] In this talk we present the current status of the Biopython project, we focus on features developed since BOSC 2007, future plans for the project and present example usages of the new population genetics module. The latest Biopython release is 1.45 made available on 22 March 2008. Some of the new features are: 1. A new population genetics module including support for coalescent simulation, selection detection and the GenePop file format.
The new module relies on existing open source external software (e.g., the open source Simcoal2 for coalescent simulation which can take advantage of multiple core CPUs for computationally intensive tasks). 2. Improved documentation. 3. Deprecation of many modules which were either obsolete or had been superseded by other code. 4. Plus many bugs were fixed, included updates for evolving file formats. Since the Biopython 1.45 release, further work is planned to extend the Population Genetics module (e.g., with a statistics component). A new sequence alignment module is also being implemented with a uniform API for reading and writing various alignment files, based on the approach of the Bio.SeqIO module added last year for working with sequences. Work to improve Biopython's BioSQL support is also ongoing. Time permitting, the talk will also show usage examples of the new population genetics module. The focus will be put not only on the population genetics side, but also on strategies to easily use all available computational power on new multiple core computers. This is useful for users of most scripting languages as most language interpreter implementations impose stern limits on multi-threaded programming efficiency, which is important when using computational biology code which is CPU intensive. We will take this opportunity to discuss strategies to overcome those language limitations. Any feedback would really be much appreciated, thanks!
_______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev From bugzilla-daemon at portal.open-bio.org Sat May 17 02:13:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 17 May 2008 02:13:33 -0400 Subject: [Biopython-dev] [Bug 2475] BioSQL.Loader should reuse existing taxon entries in lineage In-Reply-To: Message-ID: <200805170613.m4H6DXDZ016145@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2475 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #914 is|0 |1 obsolete| | ------- Comment #32 from mdehoon at ims.u-tokyo.ac.jp 2008-05-17 02:13 EST ------- Created an attachment (id=920) --> (http://bugzilla.open-bio.org/attachment.cgi?id=920&action=view) Replacement for "Usage ... to load a SeqRecord's taxonomy" Recently I made some changes to the Taxonomy parser in Bio.Entrez, specifically to make it more consistent with the other parsers in Bio.Entrez. Some fields in the XML are now accessed slightly differently. I updated Loader.py accordingly. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Sun May 18 10:33:25 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sun, 18 May 2008 07:33:25 -0700 (PDT) Subject: [Biopython-dev] Bio.PopGen, Bio.EUtils data files Message-ID: <157512.3075.qm@web62408.mail.re1.yahoo.com> Hi everybody, In setup.py, data files needed by Bio.EUtils and Bio.PopGen.SimCoal are now installed using a specialized install_data_biopython class. For Bio.Entrez, I am using the package_data argument to the setup function instead. Does anybody know why the install_data_biopython class was used? 
If there's no specific reason, I'd prefer to use the package_data argument instead. --Michiel. From bugzilla-daemon at portal.open-bio.org Mon May 19 05:30:59 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 19 May 2008 05:30:59 -0400 Subject: [Biopython-dev] [Bug 2475] BioSQL.Loader should reuse existing taxon entries in lineage In-Reply-To: Message-ID: <200805190930.m4J9UxLu016813@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2475 ------- Comment #33 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-19 05:30 EST ------- This bug is also linked to Bug 2494 (currently titled "_retrieve_taxon in BioSQL.py needs urgent optimization") which is about not using the left/right values when retrieving data from the database. This is important because changes made in this bug (i.e. Bug 2475) may leave the left/right values NULL when writing new lineages. Also, in reply to comment 31, all of the other _get_...() methods of the Loader class can also add things to the database (e.g. qualifier keys). Once you know this, the fact that _get_taxon_id() does this too isn't a shock. Also, yes, the _get_taxon_id() function is getting far too long, and should probably be restructured as part of this bug. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
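To illustrate the point made in comment #33, a lineage lookup that follows only the parent_taxon_id links keeps working when the left/right values are NULL. What follows is only a sketch under assumptions: it uses an in-memory SQLite database with cut-down taxon and taxon_name tables (column names taken from the query posted on Bug 2494, with left_value and right_value assumed from the BioSQL schema), the rows and IDs are invented, and lineage_by_parent is a hypothetical helper rather than the code in BioSQL.

```python
import sqlite3


def lineage_by_parent(conn, taxon_id):
    """Return ranked scientific names from the top of the tree down to taxon_id.

    Hypothetical helper: climbs parent_taxon_id links only, so NULL
    left_value/right_value columns do not matter.
    """
    names = []
    seen = set()  # guards against accidental cycles in the parent links
    while taxon_id is not None and taxon_id not in seen:
        seen.add(taxon_id)
        row = conn.execute(
            "SELECT taxon_name.name, taxon.node_rank, taxon.parent_taxon_id"
            " FROM taxon JOIN taxon_name"
            " ON taxon.taxon_id = taxon_name.taxon_id"
            " WHERE taxon_name.name_class = 'scientific name'"
            " AND taxon.taxon_id = ?",
            (taxon_id,),
        ).fetchone()
        if row is None:
            break
        name, rank, parent_taxon_id = row
        if parent_taxon_id == taxon_id:
            # A top node stored as its own parent: stop without adding it.
            break
        if rank != "no rank":
            # Only entries with a stated rank go into the lineage.
            names.insert(0, name)
        taxon_id = parent_taxon_id
    return names


# Toy data (invented IDs); left_value/right_value are deliberately left NULL.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE taxon (
        taxon_id INTEGER PRIMARY KEY,
        parent_taxon_id INTEGER,
        node_rank TEXT,
        left_value INTEGER,
        right_value INTEGER
    );
    CREATE TABLE taxon_name (taxon_id INTEGER, name TEXT, name_class TEXT);
    """
)
conn.executemany(
    "INSERT INTO taxon (taxon_id, parent_taxon_id, node_rank) VALUES (?, ?, ?)",
    [
        (1, 1, "no rank"),
        (2, 1, "superkingdom"),
        (3, 2, "no rank"),
        (4, 3, "genus"),
        (5, 4, "species"),
    ],
)
conn.executemany(
    "INSERT INTO taxon_name VALUES (?, ?, 'scientific name')",
    [
        (1, "root"),
        (2, "Eukaryota"),
        (3, "unranked clade"),
        (4, "Homo"),
        (5, "Homo sapiens"),
    ],
)

lineage = lineage_by_parent(conn, 5)
```

The cost is one query per level of the tree, which is why the left/right (nested set) values are attractive when they are populated; the loop is the fallback that does not depend on them.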
From tiagoantao at gmail.com Mon May 19 08:09:57 2008 From: tiagoantao at gmail.com (Tiago Antão) Date: Mon, 19 May 2008 13:09:57 +0100 Subject: [Biopython-dev] Fwd: Abstract In-Reply-To: <89450.67823.qm@web62411.mail.re1.yahoo.com> References: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com> <89450.67823.qm@web62411.mail.re1.yahoo.com> Message-ID: <6d941f120805190509x1a8e3cf6k7e382f21108abe71@mail.gmail.com> On Sat, May 17, 2008 at 4:04 AM, Michiel de Hoon wrote: > The point is that most people in the audience are not Biopython users (yet), > so for them a general introduction is more suitable. Actually this issue is of major concern to me... Do you (or anybody) have a feel for what audience will be there? I think it is important to tune the message to the audience. I actually was speculating that people would know about biopython. But if that is not the case, as you imply, then maybe something that makes biopython more competitive for people who might be deciding which system (language and libraries) might be the best approach... From p.j.a.cock at googlemail.com Mon May 19 08:26:31 2008 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 19 May 2008 13:26:31 +0100 Subject: [Biopython-dev] Fwd: Abstract In-Reply-To: <6d941f120805190509x1a8e3cf6k7e382f21108abe71@mail.gmail.com> References: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com> <89450.67823.qm@web62411.mail.re1.yahoo.com> <6d941f120805190509x1a8e3cf6k7e382f21108abe71@mail.gmail.com> Message-ID: <320fb6e00805190526w339c275elaa1d781d02cb256c@mail.gmail.com> >> The point is that most people in the audience are not Biopython users (yet), >> so for them a general introduction is more suitable. > > Actually this issue is of a major concern to me... Do you (or anybody) > has a feel of what audience will be there? I think it is important to > tune the message to the audience. I actually was speculating that > people would know about biopython.
But if that is not the case, as you > imply, then maybe a something that makes biopython more competitive > for people which might be deciding which system (language and > libraries) might be the best approach... Perhaps I should have given you a broader introduction to BOSC itself. There will probably be talks from BioPerl, BioJava and BioRuby in the same session, and I would expect almost all the audience to be familiar with at least one of these projects. However, they may or may not use python, and I would expect that the majority will not be Biopython users. At least, that was my impression last year at BOSC 2007. Reading over the talk titles/abstracts from last year should give you a feel for the sort of people there presenting work outside the Bio* projects. In terms of general impressions, I felt most of the attendees actually did some hands on coding. So yes, as Michiel says, perhaps the current text isn't general enough. This is a regular opportunity to try raise awareness of the project, although I personally wouldn't give a "hard sell", we should try to give a general overview of Biopython's capabilities. Peter From sbassi at gmail.com Mon May 19 08:36:15 2008 From: sbassi at gmail.com (Sebastian Bassi) Date: Mon, 19 May 2008 09:36:15 -0300 Subject: [Biopython-dev] Fwd: Abstract In-Reply-To: <320fb6e00805190526w339c275elaa1d781d02cb256c@mail.gmail.com> References: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com> <89450.67823.qm@web62411.mail.re1.yahoo.com> <6d941f120805190509x1a8e3cf6k7e382f21108abe71@mail.gmail.com> <320fb6e00805190526w339c275elaa1d781d02cb256c@mail.gmail.com> Message-ID: On Mon, May 19, 2008 at 9:26 AM, Peter Cock wrote: .... > project, although I personally wouldn't give a "hard sell", we should > try to give a general overview of Biopython's capabilities. 
This work may give some ideas about introducing Biopython: http://openwetware.org/wiki/Julius_B._Lucks/Projects/Python_All_A_Scientist_Needs From tiagoantao at gmail.com Mon May 19 08:38:34 2008 From: tiagoantao at gmail.com (Tiago Antão) Date: Mon, 19 May 2008 13:38:34 +0100 Subject: [Biopython-dev] Fwd: Abstract In-Reply-To: <320fb6e00805190526w339c275elaa1d781d02cb256c@mail.gmail.com> References: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com> <89450.67823.qm@web62411.mail.re1.yahoo.com> <6d941f120805190509x1a8e3cf6k7e382f21108abe71@mail.gmail.com> <320fb6e00805190526w339c275elaa1d781d02cb256c@mail.gmail.com> Message-ID: <6d941f120805190538p718127ccr76d86b0d0ab40348@mail.gmail.com> In order to address this I am thinking of changing the starting paragraph of the "paper" along the following lines: In this talk we present the current status of the Biopython project. We start by giving a short overview of Biopython - presenting existing functionality - and useful software libraries for computational biology in the Python development 'ecology' (from plotting libraries capable of producing publication quality figures to numerical libraries, among others). We then focus on features developed since BOSC 2007, future plans for the project and present example usages of the new population genetics module. I think changing the abstract along these lines might also be good. I think I will target most of the presentation to the idea that the Python ecology of software development is really good (e.g. putting one slide on matplotlib with code and result, to show how concise and simple code can produce nice results). "Selling" Biopython in the whole python context. On Mon, May 19, 2008 at 1:26 PM, Peter Cock wrote: >>> The point is that most people in the audience are not Biopython users (yet), >>> so for them a general introduction is more suitable. >> >> Actually this issue is of a major concern to me...
Do you (or anybody) >> has a feel of what audience will be there? I think it is important to >> tune the message to the audience. I actually was speculating that >> people would know about biopython. But if that is not the case, as you >> imply, then maybe a something that makes biopython more competitive >> for people which might be deciding which system (language and >> libraries) might be the best approach... > > Perhaps I should have given you a broader introduction to BOSC itself. > There will probably be talks from BioPerl, BioJava and BioRuby in the > same session, and I would expect almost all the audience to be > familiar with at least one of these projects. However, they may or > may not use python, and I would expect that the majority will not be > Biopython users. At least, that was my impression last year at BOSC > 2007. Reading over the talk titles/abstracts from last year should > give you a feel for the sort of people there presenting work outside > the Bio* projects. In terms of general impressions, I felt most of > the attendees actually did some hands on coding. > > So yes, as Michiel says, perhaps the current text isn't general > enough. This is a regular opportunity to try raise awareness of the > project, although I personally wouldn't give a "hard sell", we should > try to give a general overview of Biopython's capabilities. 
> > Peter > -- http://www.tiago.org From tiagoantao at gmail.com Mon May 19 08:49:29 2008 From: tiagoantao at gmail.com (Tiago Antão) Date: Mon, 19 May 2008 13:49:29 +0100 Subject: [Biopython-dev] Fwd: Abstract In-Reply-To: <6d941f120805190538p718127ccr76d86b0d0ab40348@mail.gmail.com> References: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com> <89450.67823.qm@web62411.mail.re1.yahoo.com> <6d941f120805190509x1a8e3cf6k7e382f21108abe71@mail.gmail.com> <320fb6e00805190526w339c275elaa1d781d02cb256c@mail.gmail.com> <6d941f120805190538p718127ccr76d86b0d0ab40348@mail.gmail.com> Message-ID: <6d941f120805190549u773310aj5df318952eca5e52@mail.gmail.com> By the way, the suggested abstract proposal: Introduction to and news from the Biopython project presenting both existing modules and current developments including a new Population Genetics module and XML parsers for the NCBI's Entrez web interface. An overview of the existing python software ecology will also be presented in relationship with computational biology. Libraries to do, among others, plotting, numerical analysis and molecular modeling will be presented in connection with Biopython and from the point of view of having a complete platform to do research in computational biology. Biopython is freely available on http://www.biopython.org under a liberal "MIT style" open source license, http://www.biopython.org/DIST/LICENSE On Mon, May 19, 2008 at 1:38 PM, Tiago Antão wrote: > In order to address this I am thinking in changing the starting > paragraph of the "paper" along the following lines: > > In this talk we present the current status of the Biopython project. > We start by giving a short overview of Biopython - presenting existing > functionality - and useful software libraries for computational > biology in the Python development 'ecology' (from plotting libraries > capable of producing publication quality figures to numerical > libraries, among others).
We then focus on features developed since > BOSC 2007, future plans for the project and present example usages of > the new population genetics module. > > > I think changing the abstract along these lines might also be good. > > I think I will target most of the presentation to the idea that the > Python ecology of software development is really good (e.g. putting > one slide on matplot lib with code and result, to show how concise and > simple code can produce nice results). "Selling" Biopython in the > whole python context. > > On Mon, May 19, 2008 at 1:26 PM, Peter Cock wrote: >>>> The point is that most people in the audience are not Biopython users (yet), >>>> so for them a general introduction is more suitable. >>> >>> Actually this issue is of a major concern to me... Do you (or anybody) >>> has a feel of what audience will be there? I think it is important to >>> tune the message to the audience. I actually was speculating that >>> people would know about biopython. But if that is not the case, as you >>> imply, then maybe a something that makes biopython more competitive >>> for people which might be deciding which system (language and >>> libraries) might be the best approach... >> >> Perhaps I should have given you a broader introduction to BOSC itself. >> There will probably be talks from BioPerl, BioJava and BioRuby in the >> same session, and I would expect almost all the audience to be >> familiar with at least one of these projects. However, they may or >> may not use python, and I would expect that the majority will not be >> Biopython users. At least, that was my impression last year at BOSC >> 2007. Reading over the talk titles/abstracts from last year should >> give you a feel for the sort of people there presenting work outside >> the Bio* projects. In terms of general impressions, I felt most of >> the attendees actually did some hands on coding. >> >> So yes, as Michiel says, perhaps the current text isn't general >> enough. 
This is a regular opportunity to try raise awareness of the >> project, although I personally wouldn't give a "hard sell", we should >> try to give a general overview of Biopython's capabilities. >> >> Peter >> > > > > -- > http://www.tiago.org > -- http://www.tiago.org From bugzilla-daemon at portal.open-bio.org Mon May 19 09:46:22 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 19 May 2008 09:46:22 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200805191346.m4JDkMMf028474@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Severity|blocker |major ------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-19 09:46 EST ------- I see from comment 11 that some nasty quote escaping may be needed (which could be an NCBI bug). Have you been able to try using relative paths at the command line (avoiding spaces ideally)? Unfortunately my Windows machine is currently without internet access, which is one reason why I haven't made time to sit down and explore this issue. P.S. I don't think this is a critical bug in Biopython, although I do take your point that in your setup this is a big issue. Downgrading this to severity "major". -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon May 19 17:03:44 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 19 May 2008 17:03:44 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200805192103.m4JL3iSk021133@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 ------- Comment #13 from drpatnaik at yahoo.com 2008-05-19 17:03 EST ------- To get BioPython call BLAST, this works: 1. my_blast_exe = r'"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe"' Variations like these do not work: 2. "C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe" 3. r"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe" The error being: 'C:/Documents' is not recognized as an internal or external command, operable program or batch file. With my_blast_exe set to the 1st value constant, and trying different my_blast_db values, BLAST reports: [NULL_Caption] ERROR: Arguments must start with '-' (the offending argument #5 was: 'and') /* or 'and\' or 'and\\' */ The values tried for my_blast_db are: 4. 'C:/Documents and Settings/patnaik/My Documents/blast/bin/mine' 5. 'C:/Documents\ and\ Settings/patnaik/My\ Documents/blast/bin/mine' 6. 'C:/Documents\\ and\ Settings/patnaik/My\\ Documents/blast/bin/mine' 7. "C:/Documents and Settings/patnaik/My Documents/blast/bin/mine" 8. "C:/Documents\ and\ Settings/patnaik/My\ Documents/blast/bin/mine" 9. "C:/Documents\\ and\\ Settings/patnaik/My\\ Documents/blast/bin/mine" 10. r'C:/Documents and Settings/patnaik/My Documents/blast/bin/mine' 11. r'C:/Documents\ and\ Settings/patnaik/My\ Documents/blast/bin/mine' 12. r'C:/Documents\\ and\\ Settings/patnaik/My\\ Documents/blast/bin/mine' 13. r"C:/Documents and Settings/patnaik/My Documents/blast/bin/mine" 14. r"C:/Documents\ and\ Settings/patnaik/My\ Documents/blast/bin/mine" 15. 
r"C:/Documents\\ and\\ Settings/patnaik/My\\ Documents/blast/bin/mine" But a different error ...: 'C:/Documents' is not recognized as an internal or external command, operable program or batch file. ... is shown with these values: 16. r'"C:/Documents and Settings/patnaik/My Documents/blast/bin/mine"' 17. r'"C:/Documents\ and\ Settings/patnaik/My\ Documents/blast/bin/mine"' 18. r'"C:/Documents\\ and\\ Settings/patnaik/My\\ Documents/blast/bin/mine"' That same error is also seen when I try these variations of the value that works in command-line BLAST (comment #10 above): 19. r'\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\"' 20. r'"\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\""' 20. "\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\"" 21. r"\"\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\"\"" 22. r"'\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\"'" Doesn't this suggest that Biopython is not passing the my_blast_db value properly? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon May 19 17:36:42 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 19 May 2008 17:36:42 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200805192136.m4JLag7h022387@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 ------- Comment #14 from drpatnaik at yahoo.com 2008-05-19 17:36 EST ------- (continuing comment #13) 23. r'"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine"' 24. '"C:\\Documents and Settings\\patnaik\\My Documents\\blast\\bin\\mine"' 25. '\\"C:\\Documents and Settings\\patnaik\\My Documents\\blast\\bin\\mine\\"' 26. 
r"\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\"" 27. r"'\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\"'" -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon May 19 17:47:00 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 19 May 2008 17:47:00 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200805192147.m4JLl0HQ022723@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 ------- Comment #15 from drpatnaik at yahoo.com 2008-05-19 17:47 EST ------- (In reply to comment #13) > To get BioPython call BLAST, this works: > 1. my_blast_exe = r'"C:/Documents and Settings/patnaik/My > Documents/blast/bin/blastall.exe"' I forgot to add that I had to comment-out the os.path.exists in NCBIStandalone.py to get to that step. Equivalently, with this script I get the 'does not exist' message:
    import os
    my_blast_exe = r'"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe"'
    if not os.path.exists(my_blast_exe):
        print 'cannot find my_blast_exe'
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
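The 'does not exist' result in comment #15 is what one would expect, because the double quotes are part of the string handed to os.path.exists and no file name on disk contains them. One possible way around both the existence check and the quoting trouble, sketched here under assumptions, is to keep the path unquoted inside Python and pass an argument list to the subprocess module, so that no shell has to parse the spaces. The helper names unquote and run_tool are hypothetical and this is not how NCBIStandalone.py works; the demo runs the Python interpreter itself rather than blastall, and it needs Python 3.7 or later.

```python
import os
import subprocess
import sys


def unquote(path):
    """Strip one pair of surrounding double quotes, if present."""
    if len(path) >= 2 and path[0] == '"' and path[-1] == '"':
        return path[1:-1]
    return path


def run_tool(exe, args):
    """Run exe with args, returning (stdout, stderr) as text.

    Hypothetical helper: the executable path is checked and used without
    embedded quotes, and the arguments go to subprocess as a list, so
    spaces in paths need no manual escaping.
    """
    exe = unquote(exe)
    if not os.path.exists(exe):
        raise FileNotFoundError("cannot find %s" % exe)
    result = subprocess.run(
        [exe] + list(args),
        capture_output=True,
        text=True,
    )
    return result.stdout, result.stderr


if __name__ == "__main__":
    # Stand-in for blastall: call the current interpreter, with the path
    # wrapped in quotes the way my_blast_exe was in the bug report.
    quoted_exe = '"%s"' % sys.executable
    out, err = run_tool(quoted_exe, ["-c", "print('hello from the child process')"])
    print(out)
```

With this approach a database path containing spaces would be given as one plain list element, for example ["-d", "C:/Documents and Settings/.../mine"], although whether blastall itself then accepts the space in the -d value is a separate question that this sketch does not answer.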
From bugzilla-daemon at portal.open-bio.org Tue May 20 12:31:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 20 May 2008 12:31:41 -0400 Subject: [Biopython-dev] [Bug 2496] Bio.Blast.NCBIWWW.qblast() does not support RUN_PSIBLAST option In-Reply-To: Message-ID: <200805201631.m4KGVfF8016867@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2496 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |INVALID ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-20 12:31 EST ------- Follow up discussion on the mailing list: http://lists.open-bio.org/pipermail/biopython/2008-May/004231.html Katie wrote: > I asked NCBI about this, and they (eventually) replied that it's "not > officially supported." I have been unable to figure out how to get it to > return iterations after the first one. I'm going to close this bug as "invalid" unless the NCBI do make a public API for PSI-BLAST. It looks like the only solution for now would be to install the standalone blast tools... -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue May 20 22:45:58 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 20 May 2008 22:45:58 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200805210245.m4L2jwgM013784@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 ------- Comment #16 from drpatnaik at yahoo.com 2008-05-20 22:45 EST ------- Similar to what I mentioned in comment #10 this BLAST command-line code works: (1) "C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe" -p blastn -d "\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\"" -i "C:\Documents and Settings\patnaik\My Documents\blast\bin\hairpin" -m 7 Now I've been trying to see the system call popen3 makes in line 1662 of NCBIStandalone.py by putting this line of code before the os.popen3(" ".join([blastcmd] + params): print " ".join([blastcmd] + params) (as reported in comment #15, I do have to first disable the os.path.exists) Using these values in my test script: my_blast_db =r'"\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\""' my_blast_file =r'"C:\Documents and Settings\patnaik\My Documents\blast\bin\hairpin"' my_blast_exe =r'"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe"' I get a print command result that is identical to the working BLAST command-line code (1). "C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe" -p blastn -d "C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\" -i "C:\Documents and Settings\patnaik\My Documents\blast\bin\hairpin" -m 7 But BLAST doesn't get called and the error reported is: 'C:/Documents' is not recognized as ... 
Finally I tried replacing the code inside the os.popen3 of NCBIStandalone.py with the string (1): w, r, e = os.popen3(r'"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe" -p blastn -d "C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\" -i "C:\Documents and Settings\patnaik\My Documents\blast\bin\hairpin" -m 7') And I get the same error: 'C:/Documents' is not recognized as ... With a non-Biopython-dependent script, I get the same error (irrespective of the quote combinations I tried): import os w, r, e = os.popen3(r'"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe" -p blastn -d "C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\" -i "C:\Documents and Settings\patnaik\My Documents\blast\bin\hairpin" -m 7') print e.read() ------------------------------------------------------------------- FINAL THOUGHTS I think I've to give up on this. There seem to be two incurable issues, unlikely Biopython-specific: os.path.exists (see comment #15) and os.popen3 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed May 21 04:34:52 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 21 May 2008 04:34:52 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200805210834.m4L8YqVL004607@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 ------- Comment #17 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-21 04:34 EST ------- The os.path.exists(...) check in Biopython should be easy to fix, probably by the user specifying the exe name without quotes and biopython adding the quotes when building the command line. 
For specifying the NCBI database locations, have you set the database folder using NCBI.ini yet? I'm not sure if it will work if the INI file is in the BLAST directory as the NCBI documentation says it should go in the Windows directory (which you don't have write access to). Perhaps anywhere on the path will work. See ftp://ftp.ncbi.nlm.nih.gov/blast/documents/blast.html There is also the option of using relative paths... You might get more success talking to the machine administrator and asking them to install BLAST for you? The good news is my home internet connection is up and running, so I may be able to do a little investigation on this issue now (time permitting). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Wed May 21 05:21:51 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 21 May 2008 10:21:51 +0100 Subject: [Biopython-dev] Next release? Message-ID: <320fb6e00805210221s93d411cpe7480b01c99540a8@mail.gmail.com> From the discussion list, quite a few people have suffered from the NCBI tweaking the online Blast XML format with 2.2.18+ (bug 2499), so it would be nice to get a new release out soon to address this. See http://bugzilla.open-bio.org/show_bug.cgi?id=2499 How do the other modules stand at the moment? Bio.PopGen (Tiago). Is this currently stable, or are you in the middle of adding more features? Bio.Entrez (Michiel). I see you've been very busy with the new simplified XML parsers (see bug 2488). This looks like a big improvement on the rather repetitive coding needed in the first draft. Are you still actively making further refinements? How many Entrez XML file formats are still needed? http://bugzilla.open-bio.org/show_bug.cgi?id=2488 Bio.AlignIO - this is new, but has a reasonable amount of documentation and a small unit test (see bug 2285).
If we did do a release soon, it could be announced as "in beta", and subject to change, but feedback welcomed. http://bugzilla.open-bio.org/show_bug.cgi?id=2285 In terms of the unit tests, I haven't run them on Windows recently (internet access issues, hopefully resolved now), but on Linux things look fine. Peter

From mjldehoon at yahoo.com Wed May 21 05:40:25 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 21 May 2008 02:40:25 -0700 (PDT) Subject: [Biopython-dev] Next release? In-Reply-To: <320fb6e00805210221s93d411cpe7480b01c99540a8@mail.gmail.com> Message-ID: <928585.24226.qm@web62401.mail.re1.yahoo.com>

Peter wrote: > Bio.Entrez (Michiel). I see you've been very busy with the new simplified XML parsers (see bug 2488). This looks like a big improvement on the rather repetitive coding needed in the first draft. Are you still actively making further refinements? How many Entrez XML file formats are still needed? > http://bugzilla.open-bio.org/show_bug.cgi?id=2488

I am still making refinements. I am using this module a lot for my own work, and I have a lot of changes that are not in CVS yet. The final result should be much simpler than what is in CVS now. In particular, we won't have to write a Python module for each DTD, but let Python figure out the DTD for itself. Once this is finished (hopefully soon), I'd be happy to make a new release. --Michiel.
From bugzilla-daemon at portal.open-bio.org Wed May 21 06:48:40 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 21 May 2008 06:48:40 -0400 Subject: [Biopython-dev] [Bug 2501] New: Minor erratas in module Bio.SeqRecord Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2501 Summary: Minor erratas in module Bio.SeqRecord Product: Biopython Version: Not Applicable Platform: All OS/Version: Linux Status: NEW Severity: trivial Priority: P5 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: xbello at gmail.com line 32: description - Seqeuence description, optional (string) line 63: if self.description : lines.append("Desription: %s" % self.description) Seqeuence instead of Sequence Desription instead of Description -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed May 21 07:28:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 21 May 2008 07:28:33 -0400 Subject: [Biopython-dev] [Bug 2501] Minor erratas in module Bio.SeqRecord In-Reply-To: Message-ID: <200805211128.m4LBSX99014512@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2501 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-21 07:28 EST ------- Thanks for point those out - fixed in CVS revision 1.16 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From tiagoantao at gmail.com Wed May 21 07:41:15 2008 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 21 May 2008 12:41:15 +0100 Subject: [Biopython-dev] Next release? In-Reply-To: <320fb6e00805210221s93d411cpe7480b01c99540a8@mail.gmail.com> References: <320fb6e00805210221s93d411cpe7480b01c99540a8@mail.gmail.com> Message-ID: <6d941f120805210441w4f3fc3d7m42ee5531dca127df@mail.gmail.com> On Wed, May 21, 2008 at 10:21 AM, Peter wrote: > Bio.PopGen (Tiago). Is this currently stable, or are you in the middle > of adding more features? Long story: I will just add after moving to SVN. Actually the most important part is going to be added next, but I am waiting for SVN (any news on that front?). The statistics part that I will be commiting is the core of the module... Short story: Don't worry with me if you are doing a release in the next couple of weeks... From bugzilla-daemon at portal.open-bio.org Wed May 21 07:51:17 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 21 May 2008 07:51:17 -0400 Subject: [Biopython-dev] [Bug 2502] New: PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2502 Summary: PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 Product: Biopython Version: 1.45 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: ibdeno at gmail.com When parsing a PSI-Blast result from blastpgp version 2.2.18 I get this error: Traceback (most recent call last): File "./lpbl.py", line 23, in b_record = b_parser.parse(blast_out) File "/home/mortiz/Progs//lib/python/Bio/Blast/NCBIStandalone.py", line 760, in parse self._scanner.feed(handle, self._consumer) File "/home/mortiz/Progs//lib/python/Bio/Blast/NCBIStandalone.py", line 98, in feed self._scan_header(uhandle, consumer) File 
"/home/mortiz/Progs//lib/python/Bio/Blast/NCBIStandalone.py", line 208, in _scan_header raise ValueError("Invalid header?") ValueError: Invalid header? The same script and same input just works with blastpgp 2.2.15 I will attach script and input file later. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed May 21 07:56:53 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 21 May 2008 07:56:53 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805211156.m4LBurSt016108@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502 ------- Comment #1 from ibdeno at gmail.com 2008-05-21 07:56 EST ------- Created an attachment (id=921) --> (http://bugzilla.open-bio.org/attachment.cgi?id=921&action=view) Contains a script and an example sequence to reproduce the bug Change in the script the location of the blast command and of the database to be used. Run it as: ./lpbl.py hsTXN.prot.fasta 3 The second argument is the number of iterations for blastpgp -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed May 21 09:05:13 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 21 May 2008 09:05:13 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805211305.m4LD5DhV020562@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-21 09:05 EST ------- Miguel - could you also attach the XML output from blastpgp 2.2.15 and 2.2.18 please? e.g. Something like this if you want to do it via Biopython: blast_out, error_info = NCBIStandalone.blastpgp( blastcmd='/opt/Bio/blast-2.2.15/bin/blastpgp', database='/opt/databases/BlastDB/nrdb100ncbi', infile=file, npasses=passes, align_view='0', matrix_outfile=file + '.pssm') handle = open("blastpgp_2.2.15.xml","w") handle.write(blast_out.read()) handle.close() Thanks, Peter. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed May 21 10:44:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 21 May 2008 10:44:41 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805211444.m4LEifII025392@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502 ------- Comment #3 from ibdeno at gmail.com 2008-05-21 10:44 EST ------- Created an attachment (id=922) --> (http://bugzilla.open-bio.org/attachment.cgi?id=922&action=view) Plain text and XML outputs from blastgpg The names should be self-explanatory. 
The log files were produced with the appropriate blastpgp version using the command line:

    blastpgp -i hsTXN.prot.fasta -d /drives/databases/BlastDB/nrdb100ncbi -j 1 -m [0,7]

m = 0 is plain text (as in the original submitted bug)
m = 7 is XML

From bugzilla-daemon at portal.open-bio.org Wed May 21 13:21:27 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 21 May 2008 13:21:27 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200805211721.m4LHLRX1003810@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480

------- Comment #18 from drpatnaik at yahoo.com 2008-05-21 13:21 EST -------
The BLAST database folder being inside blast/bin seems to be fine as command-line BLAST does work. I haven't tried relative paths. It should work, as should using an external drive that can provide for space-less paths. But the issue of spaces in paths on Windows remains. I thank you for your suggestions and efforts looking into it.

From jblanca at btc.upv.es Thu May 22 03:30:52 2008 From: jblanca at btc.upv.es (Jose Blanca) Date: Thu, 22 May 2008 09:30:52 +0200 Subject: [Biopython-dev] sequence class proposal Message-ID: <200805220930.53004.jblanca@btc.upv.es>

Dear Biopython developers, I've been using python and Biopython for some time now and I would like to talk with you about the sequence classes in Biopython.
I have had some issues using the SeqRecord and Alignment classes and I have been discussing and implementing with two students (Victor Sanchez and Pablo Martinez) a proposal of a new sequence class. We would like to present this implementation as a tip in the discussion about the design of the sequence classes in Biopython and we're eager to receive your comments. The first problem that I found with the SeqRecord is the lack of support for qualities. And it is also difficult to implement this quality support in a SeqRecord derived class. There's a problem with the current SeqRecord API that makes this difficult. Let me explain it. Currently SeqRecord has a seq property and if you want a slice or if you need to reverse or complement you would do something like:

    my_seq = SeqRecord()
    my_seq.seq = Seq('ACTG')
    my_seq.seq[0:2]
    my_seq.seq = my_seq.reverse()

If I derive a class from SeqRecord with a qual property I don't know how to reverse both the sequence and the quality at the same time, because now the Seq methods are called directly without SeqRecord being aware of that. In order to support that we have discussed a new class with a slightly different API and we have done a preliminary implementation. We have named this new class RichSeq, and we think that this could solve the quality problem. With this new class it would work like this:

    myseq = RichSeq(seq='ACTG', qual=[50,50,50,50])
    subseq = myseq[0:2]
    myseq.reverse()
    myseq.complement()

RichSeq is equivalent to SeqRecord and it has the same properties as SeqRecord, but it adds the methods __getitem__, reverse, complement and reverse_complement. We have also implemented a new type of feature, which we have called RichFeature. They are similar to the SeqFeature. The main difference is that instead of a location and a location operator, they have a BioRange (another new class). This BioRange is inspired/copied from the Bioperl library.
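To make the quality problem described above concrete, here is a minimal sketch of a class that slices and reverses letters and scores together. It is a hypothetical stand-in named QualSeq, not the RichSeq code attached to this mail; it assumes a plain DNA alphabet (A, C, G, T), supports only slices, and returns new objects instead of changing the sequence in place.

```python
class QualSeq(object):
    """Toy sequence that keeps one quality score per letter.

    Hypothetical stand-in for illustration; this is NOT the RichSeq
    implementation attached to the original mail.
    """

    _COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

    def __init__(self, seq, qual):
        if len(seq) != len(qual):
            raise ValueError("need exactly one quality score per letter")
        self.seq = seq
        self.qual = list(qual)

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        # The same slice object cuts both letters and scores,
        # so the two cannot drift apart.
        if not isinstance(index, slice):
            raise TypeError("only slices are supported in this sketch")
        return QualSeq(self.seq[index], self.qual[index])

    def reverse(self):
        # Returns a new object (the proposed class is mutable instead).
        return self[::-1]

    def complement(self):
        # Scores stay at their positions; only the letters change.
        letters = "".join(self._COMPLEMENT[base] for base in self.seq)
        return QualSeq(letters, self.qual)

    def reverse_complement(self):
        return self.reverse().complement()


read = QualSeq("ACTG", [10, 20, 30, 40])
start = read[0:2]                    # letters "AC" with scores [10, 20]
flipped = read.reverse_complement()  # letters "CAGT" with scores [40, 30, 20, 10]
```

Because every operation goes through the object itself rather than through a bare seq attribute, the caller never has a chance to reverse the letters without the scores.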
The BioRange is optional, so some RichFeature uses would be:

    RichFeature(id='a_feature', type='annotation', feature='this is an annotation')
    RichFeature(id='a_feature', type='subsequence', feature=Seq('ACTG'))
    range = BioRange(start=3,end=6)
    feat = RichFeature(type='annotation', range=range, feature='some_annotation, e.g. an exon')
    seq = RichSeq(seq='ACTGACTG', features=[feat])

With this implementation you can define a sequence with seq, qual and annotations associated with a range in an easy way, and after that you can reverse and complement them in a trivial way.

    range = BioRange(start=3,end=6)
    feat = RichFeature(type='annotation', range=range, feature='some_annotation')
    seq = RichSeq(seq='ACTGACTG', qual=[60,60,60,60,60,60,60,60], features=[feat])
    seq.reverse()

By the way, this is a mutable class, although that could be easily changed. You can even use Seqs and RichSeq as subsequences and ask for slices or complements.

    range = BioRange(start=1,end=2)
    feat = RichFeature(type='subsequence', feature=RichSeq(seq='CT'), range=range)
    seq = RichSeq(seq='ACTG', features=[feat])
    seq2 = seq[1:2]
    seq.reverse()

This capability makes this RichSeq an excellent candidate for a base class for an Alignment implementation, but we have not implemented this yet. Attached to this mail you can find the implementation of these new classes. They have some tests that provide an idea about their intended use. We would like to know about your opinions and suggestions. Do you think that this kind of functionality is desirable? Please let us know about any flaw, especially in the API. I think that my work would be easier using a sequence class similar to RichSeq, but maybe there's an easier way. Do you think that is a good idea to attach these classes to bugzilla? Do we open a new bug or there's one for this sequence class debate already open? Best regards, -- Jose M.
Blanca Postigo Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV) Universidad Politecnica de Valencia (UPV) Edificio CPI (Ciudad Politecnica de la Innovacion), 8E 46022 Valencia (SPAIN) Tlf.:+34-96-3877000 (ext 88473) -------------- next part -------------- A non-text attachment was scrubbed... Name: richseq.0.0.1.tar.gz Type: application/x-tgz Size: 7075 bytes Desc: not available URL:

From biopython at maubp.freeserve.co.uk Thu May 22 11:47:58 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 22 May 2008 16:47:58 +0100 Subject: [Biopython-dev] sequence class proposal In-Reply-To: <200805220930.53004.jblanca@btc.upv.es> References: <200805220930.53004.jblanca@btc.upv.es> Message-ID: <320fb6e00805220847s29cdd37yb0472f4fe5e81818@mail.gmail.com>

On Thu, May 22, 2008 at 8:30 AM, Jose Blanca wrote: > Dear Biopython developers, > I've been using python and Biopython for some time now and I would like to > talk with you about the sequence classes in Biopython. I have had some issues > using the SeqRecord and Alignment classes and I have been discussing and > implementing with two students (Victor Sanchez and Pablo Martinez) a proposal > of a new sequence class. We would like to present this implementation as a > tip in the discussion about the design of the sequence classes in Biopython > and we're eager to receive your comments.

If I understood your terminology correctly, "qualities" is a list of scores, one for each letter in the sequence. I see this is a special case of a more general situation where you have per-letter-annotation information. Examples include secondary structure or residue coordinates of a protein sequence. Very often for example, secondary structures are stored in files as a simple string whose length matches the length of the sequence. Also related are sub-features like domains or promoter sites which span a range of residues.
So I would agree with you that an enhanced class would be useful, where the per letter annotations were respected in slicing, reversing etc. Handling sub-features when slicing is less straightforward. The current SeqRecord and Seq classes separate the sequence annotation from the sequence letters themselves, making this sort of integration difficult. Making the SeqRecord a direct subclass of the Seq object has previously been suggested and would pave the way for this sort of operation. See Bug 2351 where some of these ideas have been floated... http://bugzilla.open-bio.org/show_bug.cgi?id=2351 There are a lot of things that would need to be discussed - for example how would you handle the per-sequence annotation (e.g. record identifiers) when adding two "rich" sequences? I've been content with making small steps for now, with backwards compatibility always in mind. On another note, I'm also thinking about the need for an annotated sequence alignment object, where there are similar concerns. Also, have you discussed the alphabet objects? > Do you think that is a good idea to attach these classes to bugzilla? Do we > open a new bug or there's one for this sequence class debate already open? Your proposals do seem very broad, so have a look at Bug 2351 first, but perhaps start a new enhancement bug, and then attach the code. Peter

From bugzilla-daemon at portal.open-bio.org Fri May 23 06:06:44 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 May 2008 06:06:44 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805231006.m4NA6itj022486@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502

biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |biopython-bugzilla at maubp.freeserve.co.uk

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-23 06:06 EST -------
I've worked out that the original problem was due to trying to parse XML output with the Bio.Blast.NCBIStandalone.PSIBlastParser (which expects the plain text output only). Perhaps the error message could be more helpful in this situation? I'm using Biopython from CVS, but it seems to parse the plain text output from both 2.2.15 and 2.2.18 fine. Here is a modified version of your code which reads from the example plain text files provided:

    #!/usr/bin/env python
    #
    import os, re, string, operator
    from Bio.Blast import NCBIStandalone
    from sys import *

    E_VALUE_THRESH = 0.005

    nolf = re.compile('\n')
    nogaps = re.compile('-')

    blast_out = open("blastpgp.2.2.18.txt")
    b_parser = NCBIStandalone.PSIBlastParser()
    b_record = b_parser.parse(blast_out)

    if b_record.converged == 1:
        print '*** Converged!!! ***'

    fastaout = open('test_psiblast.fasta','w')
    summout = open('test_psiblast.txt','w')

    for alignment in b_record.rounds[-1].alignments:
        for hsp in alignment.hsps:
            if hsp.expect < E_VALUE_THRESH:
                ident = 100.0*hsp.identities[0]/hsp.identities[1]
                simil = 100.0*hsp.positives[0]/hsp.positives[1]
                mytitle = nolf.sub(' ',alignment.title)
                mysbjct = nogaps.sub('',hsp.sbjct)
                summout.write('****Alignment****\n')
                summout.write('sequence: %s\n' % mytitle[0:70])
                summout.write('e value: %e\n' % hsp.expect)
                summout.write('alignment length: %i\n' % hsp.positives[1])
                summout.write('identity: %(ident)5.2f\n' % {'ident': ident} )
                summout.write('similarity: %(simil)5.2f\n' % {'simil': simil} )
                summout.write('query: from %i to %i\n' % (hsp.query_start, hsp.query_end))
                summout.write('subject: from %i to %i\n' % (hsp.sbjct_start, hsp.sbjct_end))
                summout.write('%s ...\n' % hsp.query[0:75])
                summout.write('%s ...\n' % hsp.match[0:75])
                summout.write('%s ...\n' % hsp.sbjct[0:75])
                fastaout.write('%s\n%s\n' % (mytitle,mysbjct))

    summout.close()
    fastaout.close()
    print "Done"
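On the question of a more helpful error message, one possible pre-check is sketched here. It is only an illustration: ensure_plain_text is a hypothetical name and not part of Bio.Blast, the test assumes that XML output begins with an XML declaration, the code targets a current Python, and it reads the whole output into memory, which may be unsuitable for very large result files.

```python
import io


def ensure_plain_text(handle):
    """Return a fresh handle on the output, or raise if it looks like XML.

    Hypothetical pre-check for illustration; not part of Bio.Blast.
    """
    data = handle.read()
    # Assumption: XML output begins with an XML declaration.
    if data.lstrip().startswith("<?xml"):
        raise ValueError("This looks like XML output; the plain text "
                         "parser expects plain text (e.g. -m 0)")
    # The original handle is now exhausted, so hand back a new one.
    return io.StringIO(data)
```

The returned handle could then be passed to the plain text parser in place of the original one.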
----------------------------------------------------------------------

So, as far as I can tell, the plain text PSI Blast parser is fine. As I do not have the relevant databases installed, I have not tried using Biopython to call blastpgp to run PSI-Blast. It could be there is a problem here with specifying the output format... As to the XML output, you can sort of parse this with Bio.Blast.NCBIXML and I think you get back an iterator yielding a record for each iteration. However, as the example you provided had only one query and one iteration, this should be tested further. The record is not showing all the information extracted by the PSI-Blast text parser, which should be in the XML file. Perhaps you would like to investigate this? Example code:

    from Bio.Blast import NCBIStandalone, NCBIXML

    for filename in ["blastpgp.2.2.15.txt", "blastpgp.2.2.18.txt"] :
        print
        print filename
        print "="*len(filename)
        handle = open(filename)
        record = NCBIStandalone.PSIBlastParser().parse(handle)
        print record.query
        if record.converged : print '*** Converged!!! ***'
        for iter_round in record.rounds :
            print "Iteration with %i alignments" \
                  % (len(iter_round.alignments))
            print "%i new sequences, %i reused" \
                  %(len(iter_round.new_seqs), len(iter_round.reused_seqs))
        print "End of plain text output"

    for filename in ["blastpgp.2.2.15.xml", "blastpgp.2.2.18.xml"] :
        print
        print filename
        print "="*len(filename)
        handle = open(filename)
        for iter_round in NCBIXML.parse(handle) :
            print iter_round.query
            print "Iteration with %i alignments" \
                  % (len(iter_round.alignments))
        print "End of XML output"

The output:

    blastpgp.2.2.15.txt
    ===================
    gi|50592994|ref|NP_003320.2| thioredoxin [Homo sapiens]
    Iteration with 250 alignments
    500 new sequences, 0 reused
    End of plain text output

    blastpgp.2.2.18.txt
    ===================
    gi|50592994|ref|NP_003320.2| thioredoxin [Homo sapiens]
    Iteration with 250 alignments
    500 new sequences, 0 reused
    End of plain text output

    blastpgp.2.2.15.xml
    ===================
    gi|50592994|ref|NP_003320.2| thioredoxin [Homo sapiens]
    Iteration with 500 alignments
    End of XML output

    blastpgp.2.2.18.xml
    ===================
    gi|50592994|ref|NP_003320.2| thioredoxin [Homo sapiens]
    Iteration with 250 alignments
    End of XML output

Notice that NCBI must have changed the XML format in some way (500 versus 250 alignments between versions 2.2.15 and 2.2.18). I have not explored this in any detail.
From bugzilla-daemon at portal.open-bio.org Fri May 23 06:45:58 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 May 2008 06:45:58 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805231045.m4NAjwr4023917@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502

------- Comment #5 from ibdeno at gmail.com 2008-05-23 06:45 EST -------
Hi Peter, Thank you. The problem must be then with the blastpgp call from Biopython, since my code was trying to obtain plain text via the align_view='0' option:

    blast_out, error_info = NCBIStandalone.blastpgp(
        blastcmd='/home/mortiz/Progs/blast-2.2.15/bin/blastpgp',
        database='/drives/databases/BlastDB/nrdb100ncbi',
        infile=file, npasses=passes, align_view='0',
        matrix_outfile=file + '.nrdb100ncbi.pssm')

However, when I print the result of this call with the handler you proposed:

    handle = open("blastpgp_2.2.18.txt","w")
    handle.write(blast_out.read())
    handle.close()

I actually get plain text! The same blastpgp call (same binary, same database, same input file sequence, same number of PSI-Blast iterations) still gives the error reported in the bug with version 2.2.18, but works all right with 2.2.15. Because the error appears within seconds, I'm wondering if the parser is not trying to read the results before blastpgp has actually finished the iterations (about 3 minutes in my test) I'm without a clue... Miguel (In reply to comment #4)

From bugzilla-daemon at portal.open-bio.org Fri May 23 07:02:44 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 May 2008 07:02:44 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805231102.m4NB2iPS024763@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502

------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-23 07:02 EST -------
That's an interesting theory - reading directly from standard out is causing the problem (comment 5). One thing you could try is writing the blastpgp output to a file, and then opening the file for reading. I'm not sure if blastpgp has a file output option. You could just try this:

    blast_out, error_info = NCBIStandalone.blastpgp(...)
    handle = open("blastpgp_2.2.18.txt","w")
    handle.write(blast_out.read())
    handle.close()
    blast_out = open("blastpgp_2.2.18.txt")
    b_parser = NCBIStandalone.PSIBlastParser()
    b_record = b_parser.parse(blast_out)
    ...

Or, for a very crude workaround:

    from time import sleep
    blast_out, error_info = NCBIStandalone.blastpgp(...)
    sleep(5*60) #Five minutes
    b_parser = NCBIStandalone.PSIBlastParser()
    b_record = b_parser.parse(blast_out)
    ...

If those work, it would be good evidence that your theory is right.
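The first test suggested in comment #6 could also be written so that the parser only ever sees a finished file. The following is a minimal sketch under stated assumptions: run_to_file is a hypothetical helper, the code targets a current Python with the subprocess module rather than the interpreter used in this thread, and the real blastpgp arguments are deliberately not shown.

```python
import subprocess


def run_to_file(command, out_filename):
    """Run command, wait for it to exit, and save its stdout to a file.

    Hypothetical helper for illustration; `command` is an argument list.
    Returns whatever the program wrote to standard error (as bytes).
    """
    with open(out_filename, "wb") as out_handle:
        process = subprocess.Popen(command, stdout=out_handle,
                                   stderr=subprocess.PIPE)
        # communicate() blocks until the child process has finished.
        _, error_data = process.communicate()
    if process.returncode != 0:
        raise RuntimeError("command failed with exit code %i"
                           % process.returncode)
    return error_data
```

If the parser still fails on a file written this way, the timing theory can be ruled out and the file contents themselves become the thing to inspect.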
From jblanca at btc.upv.es Fri May 23 07:10:13 2008 From: jblanca at btc.upv.es (Jose Blanca) Date: Fri, 23 May 2008 13:10:13 +0200 Subject: [Biopython-dev] sequence class proposal In-Reply-To: <320fb6e00805220847s29cdd37yb0472f4fe5e81818@mail.gmail.com> References: <200805220930.53004.jblanca@btc.upv.es> <320fb6e00805220847s29cdd37yb0472f4fe5e81818@mail.gmail.com> Message-ID: <200805231310.13408.jblanca@btc.upv.es>

Hi: After reading the suggestions in Bug 2351 I've coded a MutableSeq class that inherits from UserString.MutableString instead of using an array stored in self.data. It's quite easy to make it work like the MutableSeq present in Biopython 1.45, but there are some problems to solve. I don't know if this class would be faster or easier to maintain than the MutableSeq that uses array.array. I've just done that as an experiment to learn something about Biopython. Now the compatibility problems that I have found... self.data is not an array but a str. That's not easy to solve because MutableString uses self.data internally. I tried to define a property class, but MutableString is an old style class. Maybe I don't know enough python, but I don't know how to solve this type mismatch. append() and extend() could be coded using __add__(). insert() and remove() are not supported by MutableSeq and would have to be coded. But I don't see the point of these methods in a sequence class. I think that the Seq and the MutableSeq API should be as similar as possible and since Seq uses __add__() I don't understand why MutableSeq should use append() and extend(). I also have problems with

    del seq[2:4:-1]

and

    seq[2::3] = "N" * len(seq[2::3])

All the other tests for MutableSeq just work. -- Jose M.
Blanca Postigo Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV) Universidad Politecnica de Valencia (UPV) Edificio CPI (Ciudad Politecnica de la Innovacion), 8E 46022 Valencia (SPAIN) Tlf.:+34-96-3877000 (ext 88473) From bugzilla-daemon at portal.open-bio.org Fri May 23 08:38:28 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 May 2008 08:38:28 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805231238.m4NCcS0S028452@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502 ------- Comment #7 from ibdeno at gmail.com 2008-05-23 08:38 EST ------- Unfortunately the hypothesis was not correct. If I create an intermediate file, the parser works well if the file comes from blastpgp 2.2.15 but chokes on 2.2.18. There is a new reference in 2.2.18 header: Reference for compositional score matrix adjustment: Altschul, Stephen F., John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis, Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109. which falls between the two ones existing in the 2.2.15 version and makes the header longer in terms of number of lines... Might be this? Miguel (In reply to comment #6) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
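[Editorial sketch] The suspicion in comment #7 is that a parser expecting a fixed number of "Reference" blocks breaks when a new one is inserted. A minimal illustration of a count-independent approach follows. It is not the Biopython parser or its eventual fix: the function name read_references, the "Query=" stop marker, and the abbreviated sample header are all assumptions made for this example.

```python
def read_references(lines):
    """Collect every 'Reference' block that appears before the 'Query=' line.

    Returns (references, remaining_lines). The number of reference blocks
    is not assumed in advance, so an extra citation does not break parsing.
    """
    references = []
    current = None  # lines of the reference block being collected, if any
    for index, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("Query="):
            if current is not None:
                references.append(" ".join(current))
            return references, lines[index:]
        if stripped.startswith("Reference"):
            # A new block starts; store the previous one first.
            if current is not None:
                references.append(" ".join(current))
            current = [stripped]
        elif stripped and current is not None:
            # Continuation line of the current reference block.
            current.append(stripped)
    raise ValueError("no 'Query=' line found after the header")


# Abbreviated, made-up header for illustration only.
header = [
    "BLASTP 2.2.18 [Mar-02-2008]",
    "",
    "Reference: first citation (abbreviated)",
    "",
    "Reference for compositional score matrix adjustment: Altschul, Stephen F.,",
    "John C. Wootton, E. Michael Gertz, ... (abbreviated)",
    "",
    "Reference: another citation (abbreviated)",
    "",
    "Query= example",
]
references, rest = read_references(header)
```

With this shape, a header carrying two, three, or more references is consumed the same way, and parsing resumes at the "Query=" line.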
From bugzilla-daemon at portal.open-bio.org Fri May 23 10:30:42 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 May 2008 10:30:42 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805231430.m4NEUgVL001388@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502 ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-23 10:30 EST ------- I'm using the CVS version of Biopython under Linux. The file main NCBIStandalone.py hasn't changed since Biopython 1.45, although Record.py has. I am a little puzzled about why I can parse both the 2.2.15 and the 2.2.18 plain text examples you provided without problems, but something fails for you. Could you double check what happens on your machine using these two example files from attachment 922 comment 3, and this code I gave in comment 4: from Bio.Blast import NCBIStandalone for filename in ["blastpgp.2.2.15.txt", "blastpgp.2.2.18.txt"] : print print filename print "="*len(filename) handle = open(filename) record = NCBIStandalone.PSIBlastParser().parse(handle) print record.query if record.converged : print '*** Converged!!! ***' for iter_round in record.rounds : print "Iteration with %i alignments" \ % (len(iter_round.alignments)) print "%i new sequences, %i reused" \ %(len(iter_round.new_seqs), len(iter_round.reused_seqs)) print "End of plain text output" If this doesn't work, please give the full stack trace - "chokes" is a little vague. Looking at the example files you provided in attachment 922 comment 3, they seem to have replaced one reference with another. This is the start of the diff output comparing the two files: 1c1 < BLASTP 2.2.15 [Oct-15-2006] --- > BLASTP 2.2.18 [Mar-02-2008] 10,15c10,13 < Reference for composition-based statistics: < Schaffer, Alejandro A., L. Aravind, Thomas L. Madden, < Sergei Shavirin, John L. 
Spouge, Yuri I. Wolf, < Eugene V. Koonin, and Stephen F. Altschul (2001), < "Improving the accuracy of PSI-BLAST protein database searches with < composition-based statistics and other refinements", Nucleic Acids Res. 29:2994-3005. --- > Reference for compositional score matrix adjustment: Altschul, Stephen F., > John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis, > Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches > using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109. This reference change doesn't seem to cause a problem on my machine. I didn't notice anything else worth commenting about. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri May 23 11:02:10 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 May 2008 11:02:10 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805231502.m4NF2AZm003440@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502 ------- Comment #9 from ibdeno at gmail.com 2008-05-23 11:02 EST ------- Hi Peter, Thank you for your patience and sorry not to be clear. 1. By 'choke' I meant that it produced the same error mentioned in the original but report. 2. I see now that my attachments (#922) were not appropriate: to gain some time I had requested no iterations to blastpgp, that is: I used '-j 1'. I can actually parse the plain text from 2.2.18 that I had submitted in those attachments both with your and my code. This also explains the differences in the headers... I will now submit two plain text outputs from blastpgp with 2 iterations ('-j 3') Your code and mine can parse 2.2.15 but both fail (with the "Incorrect header ?" 
error) with 2.2.18 Sorry again... Miguel (In reply to comment #8) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri May 23 11:05:16 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 May 2008 11:05:16 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805231505.m4NF5G1k003638@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502 ------- Comment #10 from ibdeno at gmail.com 2008-05-23 11:05 EST ------- Created an attachment (id=923) --> (http://bugzilla.open-bio.org/attachment.cgi?id=923&action=view) Plain text outputs from blastpgp versions 2.2.15 and 2.2.18 with 2 iterations These files are the result of calling blastpgp with the -j 3 option. The files sent with attachment #922 were actually no problematic, only when at least one iteration is carried out the parsing problem appears with blastpgp version 2.2.18. Perhaps due to the insertion of a new Reference in the header of the blastpgp output? Cheers, Miguel -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri May 23 11:16:14 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 May 2008 11:16:14 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805231516.m4NFGExh004121@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502 ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-23 11:16 EST ------- Great - I now get the same error as you :) I'll try and have a look at this over the weekend. Would you be able to make matching XML files as well? While I'm playing with blastpgp output it would be worth checking exactly what the XML files do... P.S. Would you object to me using any of your examples as test cases for the Biopython unit tests? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri May 23 11:25:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 May 2008 11:25:19 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805231525.m4NFPJVY004581@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502 ------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-23 11:25 EST ------- You are right - it is the extra reference which was causing the failure. I've checked in a fix to Bio/Blast/NCBIStandalone.py with CVS revision 1.72 Could you update your Biopython installation to CVS and retest? 
Or just replace /home/mortiz/Progs//lib/python/Bio/Blast/NCBIStandalone.py with revision 1.72 from the ViewCVS website once its updated: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIStandalone.py?cvsroot=biopython (I haven't closed this bug yet - I'd like your confirmation that this fixes things, adding a new test case would probably be wise.) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri May 23 11:39:49 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 May 2008 11:39:49 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805231539.m4NFdn83005197@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502 ------- Comment #13 from ibdeno at gmail.com 2008-05-23 11:39 EST ------- Created an attachment (id=924) --> (http://bugzilla.open-bio.org/attachment.cgi?id=924&action=view) XML equivalent of the files in the previous attachment (#923) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri May 23 11:41:17 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 May 2008 11:41:17 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805231541.m4NFfHv7005278@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502 ------- Comment #14 from ibdeno at gmail.com 2008-05-23 11:41 EST ------- I have now submitted the XML equivalent files. 
Sure, please use the examples and code if you find them useful. Cheers, Miguel (In reply to comment #11) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri May 23 11:42:50 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 May 2008 11:42:50 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805231542.m4NFgolb005350@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502 ------- Comment #15 from ibdeno at gmail.com 2008-05-23 11:42 EST ------- I will try as soon as revision 1.72 is available through the link you provided. So far, the latest is 1.71 Thank you! Miguel (In reply to comment #12) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri May 23 11:56:13 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 May 2008 11:56:13 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805231556.m4NFuDpd005873@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502 ------- Comment #16 from ibdeno at gmail.com 2008-05-23 11:56 EST ------- Sorry, I won't be able to try your fix until next week: I don't have access to the computer due to maintenance. I'll let you know as soon as possible. 
Miguel (In reply to comment #15) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat May 24 03:16:44 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 24 May 2008 03:16:44 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805240716.m4O7GiqV007275@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502 ------- Comment #17 from ibdeno at gmail.com 2008-05-24 03:16 EST ------- I have managed to access to a different computer and tested your revised (1.72) version of NCBIStandalone.py I'm glad I can confirm it does work. I guess the best way to avoid such problems in future would be to have an appropriate XML parser for PSI-Blast. Thank you very much for your assistance. (In reply to comment #12) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From peter at maubp.freeserve.co.uk Sat May 24 07:02:51 2008 From: peter at maubp.freeserve.co.uk (Peter) Date: Sat, 24 May 2008 12:02:51 +0100 Subject: [Biopython-dev] sequence class proposal In-Reply-To: <200805231310.13408.jblanca@btc.upv.es> References: <200805220930.53004.jblanca@btc.upv.es> <320fb6e00805220847s29cdd37yb0472f4fe5e81818@mail.gmail.com> <200805231310.13408.jblanca@btc.upv.es> Message-ID: <320fb6e00805240402t68094be2v4cbad1414f3e21b9@mail.gmail.com> Hi Jose, Your ideas are interesting for switching the MutableSeq class from an array of char internally to a python mutable string. However, are you talking about the UserString.MutableString object? 
The documentation suggests its not going to be as fast as a list or a character array: http://pydoc.org/2.5.1/UserString.html#MutableString Note that at some point we will be moving from Numeric to numpy, so the exact internals of the current array based MutableSeq will change slightly then. I will be away most of next week, so don't worry if I seem to be ignoring you ;) Peter From bugzilla-daemon at portal.open-bio.org Sat May 24 08:10:24 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 24 May 2008 08:10:24 -0400 Subject: [Biopython-dev] [Bug 2382] Generic Roche or GSFlex "FASTA" parser In-Reply-To: Message-ID: <200805241210.m4OCAOol018283@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2382 ------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-24 08:10 EST ------- See also http://www.bioperl.org/wiki/Qual_sequence_format where there is a similar looking file format which they call "qual" described as also being used by PHRAP and CAP3. e.g. >HSMETOO 134bp 10 20 30 40 50 50 50 50 50 20 25 25 30 30 20 15 20 35 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 20 30 20 10 10 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
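[Editorial sketch] The "qual" layout quoted in comment #13 (a FASTA-style ">" title line followed by whitespace-separated integer scores) can be read with a few lines of plain Python. This is only an illustration under that assumption about the layout; parse_qual is a hypothetical name, not an existing Biopython function, and the sample is a shortened excerpt of the record quoted above.

```python
def parse_qual(lines):
    """Yield (title, scores) pairs from FASTA-like quality text."""
    title = None
    scores = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if title is not None:
                yield title, scores
            title = line[1:]
            scores = []
        else:
            if title is None:
                raise ValueError("quality scores found before any '>' title line")
            scores.extend(int(token) for token in line.split())
    if title is not None:
        yield title, scores


# Shortened excerpt of the example record (only the first 20 scores).
example = """>HSMETOO 134bp
10 20 30 40 50 50 50 50 50 20
25 25 30 30 20 15 20 35 50 50
"""
records = list(parse_qual(example.splitlines()))
```

Each record comes back as a title string plus a list of integers, one score per letter, which is the "per letter" annotation discussed elsewhere in this thread.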
From bugzilla-daemon at portal.open-bio.org Sat May 24 08:15:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 24 May 2008 08:15:23 -0400 Subject: [Biopython-dev] [Bug 2503] New: An error when parsing NCBIWWW Blast output Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2503 Summary: An error when parsing NCBIWWW Blast output Product: Biopython Version: Not Applicable Platform: PC OS/Version: Windows XP Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: hebbar.prashanth at gmail.com Hi All, I get the following error when I start parsing NCBIWWW BLAST output. Traceback (most recent call last): File "", line 1, in -toplevel- b_record = b_parser.parse(blast_results) File "C:\Python24\Lib\site-packages\Bio\Blast\NCBIWWW.py", line 43, in parse self._scanner.feed(handle, self._consumer) File "C:\Python24\Lib\site-packages\Bio\Blast\NCBIWWW.py", line 94, in feed has_re=re.compile(r'.?BLAST')) File "C:\Python24\Lib\site-packages\Bio\ParserSupport.py", line 335, in read_and_call_until line = safe_readline(uhandle) File "C:\Python24\Lib\site-packages\Bio\ParserSupport.py", line 411, in safe_readline raise SyntaxError, "Unexpected end of stream." SyntaxError: Unexpected end of stream. Can anyone please help me to solve this? I am using Biopython version 1.44 (I tried with 1.45 too, and the same error occurs) on a Windows system. Thank you in anticipation, Prashanth -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
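[Editorial sketch] The traceback above ends in a helper that reads one line and raises when the stream has nothing left, called from a loop that is scanning for a line containing "BLAST". The following is a rough reconstruction of that pattern for illustration only, not the Biopython source: the bodies are assumptions, read_until is a hypothetical stand-in for read_and_call_until, and ValueError is used here where the traceback shows SyntaxError.

```python
import io


def safe_readline(handle):
    """Return the next line, or raise if the stream is already exhausted."""
    line = handle.readline()
    if not line:
        raise ValueError("Unexpected end of stream.")
    return line


def read_until(handle, marker):
    """Consume lines until one contains `marker`, then return that line."""
    while True:
        line = safe_readline(handle)
        if marker in line:
            return line


# A handle that does contain the marker is scanned successfully.
found = read_until(io.StringIO("preamble\nBLASTP 2.2.18\n"), "BLAST")

# An empty (or already fully read) handle fails in the way the report shows.
try:
    read_until(io.StringIO(""), "BLAST")
    failed = False
except ValueError:
    failed = True
```

Under these assumptions, the error means the marker was never seen before the input ran out, for example because the handle was empty, had already been read once, or held output in a different format than the parser expected.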
From bugzilla-daemon at portal.open-bio.org Sat May 24 08:25:59 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 24 May 2008 08:25:59 -0400 Subject: [Biopython-dev] [Bug 2503] An error when parsing NCBIWWW Blast output In-Reply-To: Message-ID: <200805241225.m4OCPxTc018893@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2503 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-24 08:25 EST ------- We need more information. Could you show us the example code that causes this problem? If you are trying to parse a file (e.g. from standalone blast), could you attach it to this bug? From the look of the stack trace, you are trying to parse the HTML output from blast (?). We do recommend parsing the XML output if possible (not the plain text or HTML output). Thank you, Peter. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Sat May 24 10:26:27 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 24 May 2008 07:26:27 -0700 (PDT) Subject: [Biopython-dev] Bio.Entrez & Bio.EUtils In-Reply-To: <320fb6e00805210221s93d411cpe7480b01c99540a8@mail.gmail.com> Message-ID: <893127.27535.qm@web62412.mail.re1.yahoo.com> Dear all, I have essentially completed the parser in Bio.Entrez. AFAICT, it works with all kinds of XML files returned by NCBI's Entrez Utilities, except for the Pubmed Central database (Pubmed itself is fine). I am using this module a lot for my own work, so it has received quite a lot of testing. As a case in point, there are 40 unit tests for the Bio.Entrez parser. These, by the way, can show you some examples of how to use this module. The documentation is now also updated. 
This module may at some point replace Bio.EUtils, so if you are using this module you might want to try Bio.Entrez to see if it covers everything Bio.EUtils covers. --Michiel Peter wrote:Bio.Entrez (Michiel). I see you've been very busy with the new simplified XML parsers (see bug 2488). This looks like a big improvement on the rather repetitive coding needed in the first draft. Are you still actively making further refinements? How many Entrez XML file formats are still needed? http://bugzilla.open-bio.org/show_bug.cgi?id=2488 From mjldehoon at yahoo.com Sat May 24 10:16:15 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 24 May 2008 07:16:15 -0700 (PDT) Subject: [Biopython-dev] sequence class proposal In-Reply-To: <320fb6e00805240402t68094be2v4cbad1414f3e21b9@mail.gmail.com> Message-ID: <135625.21242.qm@web62412.mail.re1.yahoo.com> Peter wrote: > Note that at some point we will be moving from Numeric to numpy, so > the exact internals of the current array based MutableSeq will change > slightly then. MutableSeq uses Python's array, not Numeric's array, so it should not be affected by moving from Numeric to numpy. --Michiel. From biopython at maubp.freeserve.co.uk Sun May 25 06:36:14 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 25 May 2008 11:36:14 +0100 Subject: [Biopython-dev] sequence class proposal In-Reply-To: <1211479809.4835b70111c71@webmail.upv.es> References: <200805220930.53004.jblanca@btc.upv.es> <320fb6e00805220847s29cdd37yb0472f4fe5e81818@mail.gmail.com> <1211479809.4835b70111c71@webmail.upv.es> Message-ID: <320fb6e00805250336u251dd2buae72397aa10374b0@mail.gmail.com> On May 22, 2008, Blanca Postigo Jose Miguel wrote: >> If I understood your terminology correctly, "qualities" is a list of >> scores, one for each letter in the sequence. > You're right. 
I'm sorry, I used them a lot and a reserved them a special place > in the API, my fault, I will remove it, only the sequence should have a > relevant place in the API, the rest should be stored as features. I've asked on the BioSQL mailing list about this sort of "per letter" annotation. Currently there is no mechanism to store this sort of thing in the schema. http://lists.open-bio.org/pipermail/biosql-l/2008-May/001269.html However, Hilmar did point out some relevant bits of BioPerl to have a look at: Hilmar Lapp wrote: > In BioPerl we have Bio::Seq::SeqWithQuality and the more generic > Bio::Seq::MetaI. The BioPerl SeqWithQuality sounds like what you were most interested in Jose, although the Meta-Interface may be of relevance too. Peter From biopython at maubp.freeserve.co.uk Sun May 25 08:06:50 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 25 May 2008 13:06:50 +0100 Subject: [Biopython-dev] Bio.PopGen, Bio.EUtils data files In-Reply-To: <157512.3075.qm@web62408.mail.re1.yahoo.com> References: <157512.3075.qm@web62408.mail.re1.yahoo.com> Message-ID: <320fb6e00805250506w1d6fd1bbgf1d364d2ad949376@mail.gmail.com> On Sun, May 18, 2008, Michiel de Hoon wrote: > Hi everybody, > > In setup.py, data files needed by Bio.EUtils and Bio.PopGen.SimCoal are now > installed using a specialized install_data_biopython class. For Bio.Entrez, I am > using the package_data argument to the setup function instead. Does anybody > know why the install_data_biopython class was used? If there's no specific > reason, I'd prefer to use the package_data argument instead. I think I've found one reason not to - it doesn't seem to be supported in Python 2.3 as shown here: C:\TEMP\biopython_cvs\biopython_all\biopython>c:\Python23\python.exe setup.py install c:\Python23\lib\distutils\dist.py:227: UserWarning: Unknown distribution option: 'package_data' warnings.warn(msg) running install ... If I'd known this earlier, I would of course have said something. 
On the other hand, I may be the only person still using Biopython with Python 2.3. Peter From tiagoantao at gmail.com Sun May 25 08:48:35 2008 From: tiagoantao at gmail.com (Tiago Antão) Date: Sun, 25 May 2008 13:48:35 +0100 Subject: [Biopython-dev] Bio.PopGen, Bio.EUtils data files In-Reply-To: <320fb6e00805250506w1d6fd1bbgf1d364d2ad949376@mail.gmail.com> References: <157512.3075.qm@web62408.mail.re1.yahoo.com> <320fb6e00805250506w1d6fd1bbgf1d364d2ad949376@mail.gmail.com> Message-ID: <6d941f120805250548t357d6d0fwe36d5d1b39eaaa77@mail.gmail.com> > If I'd known this earlier, I would of course have said something. On the other hand, I may be the only person still using Biopython with python 2.3. What about running a survey on the main list (or a web poll on the site) to find out which Python versions people are using? That would give a sense of what should be supported or deprecated...
From jblanca at btc.upv.es Mon May 26 01:24:30 2008 From: jblanca at btc.upv.es (Blanca Postigo Jose Miguel) Date: Mon, 26 May 2008 07:24:30 +0200 Subject: [Biopython-dev] sequence class proposal In-Reply-To: <320fb6e00805251437n34362f0bm2a323cd1194afaa@mail.gmail.com> References: <320fb6e00805251437n34362f0bm2a323cd1194afaa@mail.gmail.com> Message-ID: <1211779470.483a498e18e3e@webmail.upv.es> > One of your points seemed to be that the SeqRecord couldn't have a __getitem__ and methods like reverse, complement, etc. I don't see why it couldn't have these. Perhaps rather than introducing a whole new class, enhancing the SeqRecord would be a better avenue. My main concern with SeqRecord is that it has a Seq; if we want a slice or a reverse we would do: my_seq = SeqRecord(Seq('ACTGTGAC')) my_seq.seq[1:5] my_seq.seq.reverse() If we add per-residue annotations (like qualities) to SeqRecord, how could they be reversed if we call .seq.reverse() directly? I don't know how this could work. my_seq = SeqRecord(Seq('ACTG'), Qual([10,20,30,40])) my_seq.seq.reverse() It would create an invalid sequence: str(my_seq.seq) -> 'GTCA' str(my_seq.qual) -> [10,20,30,40] One possibility is to have methods like __getitem__ in SeqRecord as well as in Seq; it would look like: my_seq.seq[1:3] my_seq[1:3] Just for testing I have written a RichSeq that is compatible with Seq and SeqRecord, but that's very confusing. Does this SeqRecord HAVE a sequence or IS it a sequence? It could work, but I feel that it is wrong, and it is easier to explain to the users that a new improved SeqRecord has been created (RichSeq) and that they should migrate to that.
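[Editorial sketch] The synchronization problem described above (reversing the letters without reversing the per-letter scores) goes away if slicing and reversing are methods of the object that owns both. The class below is a hypothetical illustration only: AnnotatedSeq is not SeqRecord, Seq, or the proposed RichSeq, and it stores a plain str and list rather than Biopython objects.

```python
class AnnotatedSeq:
    """Hypothetical record that keeps letters and per-letter scores in step."""

    def __init__(self, seq, qual):
        if len(seq) != len(qual):
            raise ValueError("sequence and qualities differ in length")
        self.seq = str(seq)
        self.qual = list(qual)

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        # Always return a record, even for a single position, so the
        # scores can never be separated from their letters.
        if isinstance(index, int):
            index = slice(index, index + 1 or None)
        return AnnotatedSeq(self.seq[index], self.qual[index])

    def reverse(self):
        """Return a new record with letters and scores both reversed."""
        return self[::-1]


record = AnnotatedSeq("ACTG", [10, 20, 30, 40])
flipped = record.reverse()   # letters and scores are reversed together
middle = record[1:3]         # slicing keeps the matching scores
```

Because every operation goes through the owning object, there is no way to end up with 'GTCA' paired with [10, 20, 30, 40]. Whether that owner should be an enhanced SeqRecord or a new class is the open question in this thread.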
Another problem that is difficult to solve: if RichSeq is compatible with Seq, as Michiel wants and I agree, how could it also be compatible with SeqRecord? The parameters of their constructors are not compatible: SeqRecord(seq, ...) Seq(data, alphabet...) I would happily improve on RichSeq, but I don't know how to do it in a sane way. What do you think?
> Also, I do think we should bear in mind the BioSQL sequence representation, which we currently expose in a SeqRecord/Seq like way. I wouldn't want to lose this / have to completely re-write the Biopython BioSQL code.
I will look into that. Best regards, Jose Blanca
> Peter
>
> On Sun, May 25, 2008 at 9:12 PM, Blanca Postigo Jose Miguel wrote:
> > Dear biopythonistas:
> > First of all, my apologies for the MutableSeq reimplementation. I did it just for the sake of learning more about Python and Biopython, not to achieve a speedier implementation. It has been a good learning exercise for me, but now let's get to the meat...
> >
> > Everything that follows is just my opinion on the sequence classes. Mine is not a well-informed opinion, and I would just like to show you my ideas to get some feedback and to learn from you.
> >
> > Since this remodelling of the sequence classes is a complex topic, I would like to explain my ideas in some order. I won't go into implementation details; I will just discuss the API of the classes.
> > I think that Seq and MutableSeq are pretty OK, although MutableSeq has some extra methods that depend on the implementation and are not relevant for a sequence class (append, insert, pop, remove). In general Seq and MutableSeq should have the same API; that would make them simpler to use.
> >
> > I think that the main problem is SeqRecord. SeqRecord IS NOT a sequence, it HAS a sequence; that's its main flaw. A more capable Seq class should be a Seq. My proposal is to create a RichSeq that inherits from Seq and a MutableRichSeq that inherits from MutableSeq. I've been doing some coding and some thinking about that. I'm discussing this with you because I would like to improve the design of the API of such a sequence, and I could implement it. Its main design guidelines would be:
> > - Compatible with Seq or with MutableSeq. Every time that you can use a Seq class you can also use a more capable RichSeq without changing anything in your program.
> > - RichSeq IS a Seq, it inherits from Seq.
> > - RichSeq is similar to SeqRecord, but they aren't compatible.
> > The SeqRecord constructor is:
> > def __init__(self, seq, id = "", name = "",
> >              description = "", dbxrefs = None,
> >              features = None):
> > and the RichSeq one may be:
> > def __init__(self, seq=None, alphabet = None,
> >              id = "", name = "",
> >              description = "", dbxrefs = None,
> >              features = None):
> > RichSeq has a seq (or it could be data) and an alphabet (like the Seq class) while SeqRecord has a Seq object.
> > RichSeq would not have a .seq property.
> > - RichSeq has a __getitem__ method capable of things like RichSeq[1:2]. It would also have the methods reverse, complement, etc. That's not possible with SeqRecord.
> > - RichSeq should be a new-style class; what about Seq and MutableSeq?
> > - From a comment by Michiel:
> > 1) A Seq object is basically a string, so it should behave as if it were subclassed from string.
> > 2) As a result, functions that have a sequence as an argument, but don't need the added features of a Seq object, should work with strings as well as Seq objects.
> > 4) Currently, Seq objects have an associated alphabet; SeqRecord objects have annotations, dbxrefs, a description, features, id, and name. I think a new Seq object should have both, so that we can avoid having both a Seq and a SeqRecord class. Of course, some or all of these fields can remain None. (I would add that even the seq could be None.)
> > If Biopython had a class like RichSeq I wouldn't use SeqRecord. Also, the transition from using SeqRecord to RichSeq would be very easy and both classes could coexist as long as you would like.
> > Also, using the features, the per-residue annotation is very easy to implement. In fact I have done it already using a RichFeature class, but I will discuss that in another mail.
> > RichSeq is easier to extend than SeqRecord; that's its main advantage. I have pretty wild plans for a class like RichSeq. A class like SeqWithQuality or the Bio::Seq::MetaI from BioPerl would be very easy to derive from RichSeq. They would be just easier interfaces to the more capable and general RichSeq. Even Alignment would be derived from RichSeq. An Alignment IS a sequence with subsequences in it. I have also implemented a prototype of that and it works quite OK with very little coding.
> > These are the more general remarks about RichSeq. What do you think? Is it a good idea to go beyond SeqRecord for Biopython? Could something like RichSeq be a possible way to do it?
> >
> > Now I would like to list the open discussion points regarding the sequence class APIs.
> > - annotations is not in the constructor of SeqRecord. There are two options: add it to the RichSeq constructor or remove it altogether. In my implementation a feature can span the whole sequence length or can have a range attached. In this way annotations are just a special case of features. We would have to decide between dict and list for the API.
> >
> > - __getitem__ should always return a RichSeq. It's more consistent to return the same type for a_seq[1:2] and a_seq[1]. If someone wants a character they can do str(seq)[1].
> >
> > - no seq property in RichSeq.
> >
> > - __str__ is enough, so tostring() is not necessary; for more complex representations we have __repr__. tostring() could be kept for compatibility with the Seq and MutableSeq API.
> >
> > - What to do with id, name and the str annotations when a slice is requested? If seq.name is 'a_sequence' should seq[1:10].name be 'a_sequence' or 'a_sequence [1:10]' or ''? Same problem with __add__ and __radd__. This is a problem, but one of the three alternatives should be chosen and explained in the documentation. A better solution is in my RichFeature class, but I won't discuss it now.
> >
> > - __iter__ iterates over the sequence as a character string.
> >
> > - __add__ and __radd__
> >
> > - .upper(), .count(), .lower()
> >
> > - .data property. I think that this is an implementation detail and it should be deprecated from Seq and MutableSeq.
> >
> > Well, that's all; sorry for the long mail. I'm enjoying working on this problem and learning from you.
> > Best regards,
> >
> > Jose Blanca
> >
> >
>
--

From bugzilla-daemon at portal.open-bio.org Wed May 28 08:17:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 28 May 2008 08:17:25 -0400
Subject: [Biopython-dev] [Bug 2506] New: SELECT problems on _get_seqfeature_dbxref in Loader.py with postgresql
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2506

           Summary: SELECT problems on _get_seqfeature_dbxref in Loader.py
                    with postgresql
           Product: Biopython
           Version: 1.45
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: blocker
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: andrea at biodec.com
                CC: andrea at biodec.com

Using:
- postgres 8.3 or less # the version is not important
- BioSQL 1.0.0 installed on a postgresql database (on Linux) # the version is not important
- python-psycopg 1.1.21-14 or less
- python-psycopg2 2.0.5.1-6 or less
- python 2.4.4-2 # not important
- Biopython CVS version 28/05/08,
- Loader.py version 1.30
- "psycopg" or "psycopg2" as BioSeqDatabase.open_database drivers

During insertion into the BioSQL database of a seq_record object derived from a
GenBank Iterator, the procedure _get_seqfeature_dbxref fails with the error:

Traceback (most recent call last):
  File "", line 1, in ?
  File "/var/lib/python-support/python2.4/BioSQL/BioSeqDatabase.py", line 420, in load
    db_loader.load_seqrecord(cur_record)
  File "/var/lib/python-support/python2.4/BioSQL/Loader.py", line 50, in load_seqrecord
    self._load_seqfeature(seq_feature, seq_feature_num, bioentry_id)
  File "/var/lib/python-support/python2.4/BioSQL/Loader.py", line 542, in _load_seqfeature
    self._load_seqfeature_qualifiers(feature.qualifiers, seqfeature_id)
  File "/var/lib/python-support/python2.4/BioSQL/Loader.py", line 641, in _load_seqfeature_qualifiers
    seqfeature_id)
  File "/var/lib/python-support/python2.4/BioSQL/Loader.py", line 679, in _load_seqfeature_dbxref
    self._get_seqfeature_dbxref(seqfeature_id, dbxref_id, rank+1)
  File "/var/lib/python-support/python2.4/BioSQL/Loader.py", line 712, in _get_seqfeature_dbxref
    result = self.adaptor.execute_and_fetch_col0(sql, (seqfeature_id,
  File "/var/lib/python-support/python2.4/BioSQL/BioSeqDatabase.py", line 295, in execute_and_fetch_col0
    self.cursor.execute(sql, args or ())
psycopg.ProgrammingError: ERROR: column "195" does not exist

SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref WHERE seqfeature_id = "195" AND dbxref_id = "207739"

The problem is an error in the query format at rows 710 and 711 of Loader.py
in Biopython/BioSQL:

709         # Check for an existing record
710         sql = r'SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref ' \
711               r'WHERE seqfeature_id = "%s" AND dbxref_id = "%s"'

The query has double quotes (") around the values, and postgres interprets
double-quoted tokens as column names, not values. Using single quotes in the
query corrects the error:

709         # Check for an existing record
710         sql = r"SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref " \
711               r"WHERE seqfeature_id = '%s' AND dbxref_id = '%s'"

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
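A minimal sketch of the quoting point in the report above, not the code that went into Loader.py: instead of writing either kind of quote around the placeholders in the SQL text, the values can be left for the DB-API driver to bind. The example uses the standard-library sqlite3 module as a stand-in for PostgreSQL, so the placeholder is `?` (sqlite3's parameter style) where psycopg would use `%s`. The helper name `get_seqfeature_dbxref` and the single table row are assumptions made for illustration.

```python
import sqlite3

# In-memory stand-in for the BioSQL table named in the bug report.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE seqfeature_dbxref (seqfeature_id INTEGER, dbxref_id INTEGER)"
)
cur.execute("INSERT INTO seqfeature_dbxref VALUES (?, ?)", (195, 207739))


def get_seqfeature_dbxref(cursor, seqfeature_id, dbxref_id):
    """Return matching rows; the values are bound by the driver, not quoted by hand."""
    sql = (
        "SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref "
        "WHERE seqfeature_id = ? AND dbxref_id = ?"
    )
    cursor.execute(sql, (seqfeature_id, dbxref_id))
    return cursor.fetchall()


rows = get_seqfeature_dbxref(cur, 195, 207739)   # the existing record
missing = get_seqfeature_dbxref(cur, 195, 1)     # no such record
```

With bound parameters there are no quotes in the SQL string for the database to misread as identifiers, which is the failure shown in the traceback.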
From mjldehoon at yahoo.com Wed May 28 08:31:33 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 28 May 2008 05:31:33 -0700 (PDT) Subject: [Biopython-dev] Bio.PopGen, Bio.EUtils data files In-Reply-To: <320fb6e00805250506w1d6fd1bbgf1d364d2ad949376@mail.gmail.com> Message-ID: <499799.68733.qm@web62408.mail.re1.yahoo.com> That's odd ... I had tried with a Python version 2.3, and it worked there. Maybe this feature was added during the Python 2.3 cycle. Then, I guess we need to use the install_data_biopython class for now, and start using package_data once we stop supporting Python 2.3. --Michiel Peter wrote: On Sun, May 18, 2008, Michiel de Hoon wrote: > Hi everybody, > > In setup.py, data files needed by Bio.EUtils and Bio.PopGen.SimCoal are now > installed using a specialized install_data_biopython class. For Bio.Entrez, I am > using the package_data argument to the setup function instead. Does anybody > know why the install_data_biopython class was used? If there's no specific > reason, I'd prefer to use the package_data argument instead. I think I've found one reason not to - it doesn't seem to be supported in Python 2.3 as shown here: C:\TEMP\biopython_cvs\biopython_all\biopython>c:\Python23\python.exe setup.py install c:\Python23\lib\distutils\dist.py:227: UserWarning: Unknown distribution option: 'package_data' warnings.warn(msg) running install ... If I'd known this earlier, I would of course have said something. On the other hand, I may be the only person still using Biopython with python 2.3. Peter From fkauff at biologie.uni-kl.de Thu May 29 05:20:56 2008 From: fkauff at biologie.uni-kl.de (Frank Kauff) Date: Thu, 29 May 2008 11:20:56 +0200 Subject: [Biopython-dev] CVS access and developers web site Message-ID: <483E7578.50402@biologie.uni-kl.de> Hi folks, although I've been quiet for a while, I'm still doing some changes to the Nexus parser of biopython from time to time.... I totally lost my passwords to access the repository. 
Could someone please send me a new password to get write access to cvs? And I would also like to change the information on the biopython developers web site, as they are somewhat outdated. And is this the right place to ask for such things? Thanks! Frank From bugzilla-daemon at portal.open-bio.org Thu May 29 06:47:29 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 29 May 2008 06:47:29 -0400 Subject: [Biopython-dev] [Bug 2506] SELECT problems on _get_seqfeature_dbxref in Loader.py with postgresql In-Reply-To: Message-ID: <200805291047.m4TAlT18002239@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2506 ------- Comment #1 from andrea at biodec.com 2008-05-29 06:47 EST ------- Created an attachment (id=926) --> (http://bugzilla.open-bio.org/attachment.cgi?id=926&action=view) Proposed patch -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From p.j.a.cock at googlemail.com Thu May 29 17:46:46 2008 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 29 May 2008 22:46:46 +0100 Subject: [Biopython-dev] CVS access and developers web site In-Reply-To: <483E7578.50402@biologie.uni-kl.de> References: <483E7578.50402@biologie.uni-kl.de> Message-ID: <320fb6e00805291446x1cebf67bpe3e0818af5b9a7c5@mail.gmail.com> Hi Frank, I would try emailing support at helpdesk.open-bio.org using the email address associated with your CVS username. If you've changed email address, and you run into problems, I expect Michiel or I could vouch for you. For the website, the wiki usernames are entirely separate and you should be able to create a new account if you don't have one already. If you want to update the tutorial new HTML and PDF files are loaded with each release from the version in CVS. 
Peter On Thu, May 29, 2008 at 10:20 AM, Frank Kauff wrote: > Hi folks, > > although I've been quiet for a while, I'm still doing some changes to the > Nexus parser of biopython from time to time.... I totally lost my passwords > to access the repository. Could someone please send me a new password to get > write access to cvs? And I would also like to change the information on the > biopython developers web site, as they are somewhat outdated. > And is this the right place to ask for such things? > > Thanks! > > Frank From bugzilla-daemon at portal.open-bio.org Fri May 30 07:15:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 30 May 2008 07:15:23 -0400 Subject: [Biopython-dev] [Bug 2506] SELECT problems on _get_seqfeature_dbxref in Loader.py with postgresql In-Reply-To: Message-ID: <200805301115.m4UBFNE3011942@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2506 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-30 07:15 EST ------- Thanks for the report. I've fixed this issue (method _get_seqfeature_dbxref at line 710) and a similar one (in _get_bioentry_dbxref at line 761) in CVS BioSQL/Loader.py revision 1.31 Note that I have only tested this with MySQL under Linux. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Fri May 30 10:17:08 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 30 May 2008 15:17:08 +0100
Subject: [Biopython-dev] Bio.Entrez & Bio.EUtil
In-Reply-To: <893127.27535.qm@web62412.mail.re1.yahoo.com>
References: <320fb6e00805210221s93d411cpe7480b01c99540a8@mail.gmail.com> <893127.27535.qm@web62412.mail.re1.yahoo.com>
Message-ID: <320fb6e00805300717v60f0b153i88b5e9a8aee1744c@mail.gmail.com>

On 24 May 2008, Michiel de Hoon wrote:
> Dear all,
>
> I have essentially completed the parser in Bio.Entrez.

The internals of the new design look more complicated to start with, but I can
see how much more general it is than the older versions :)

Should it work starting from an empty DTDs folder - or will we ship Biopython
with most of the current files?

I've had trouble with Biopython trying to fetch missing DTD files from the
internet. I think the problem is the NCBI using relative URLs. The following
quick hack seems to help in Parser.py but only in some cases (because as listed
below, the NCBI have two different base paths):

279,280c279,288
<             warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % filename)
<             handle = urllib.urlopen(systemId)
---
>             warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % path)
>             if "/" in systemId :
>                 #Assume this is a full path, e.g.
>                 #http://www.ncbi.nlm.nih.gov/entrez/query/DTD/nlmmedline_080101.dtd
>                 handle = urllib.urlopen(systemId)
>             else :
>                 #Its a relative path, and I'm not sure how to best get the base path:
>                 handle = urllib.urlopen("http://www.ncbi.nlm.nih.gov/entrez/query/DTD/"+systemId)

(Also note there seem to be some tab/space issues in this file).

From http://www.ncbi.nlm.nih.gov/entrez/query/DTD/ I've downloaded the
following files using wget:

egquery.dtd       eSearch_020511.dtd   nlmcommon_080101.dtd           pubmed_080101.dtd
eInfo_020511.dtd  eSpell.dtd           nlmmedline_080101.dtd          taxon.dtd
eLink_020511.dtd  eSummary_041029.dtd  nlmmedlinecitation_080101.dtd  uilist.dtd
ePost_020511.dtd  nlmsharedcatcit_080101.dtd

Additionally http://www.ncbi.nlm.nih.gov/dtd/ provided some further files
needed for the test_Entrez.py unit test:

NCBI_GBSeq.dtd
NCBI_GBSeq.mod.dtd
NCBI_Entity.mod.dtd
NCBI_Mim.dtd
NCBI_Mim.mod.dtd

With all the above files, the unit test file test_Entrez.py doesn't give any
missing DTD warnings - but still has a couple of failures.

Peter

From bugzilla-daemon at portal.open-bio.org Fri May 30 11:15:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 May 2008 11:15:16 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15
In-Reply-To:
Message-ID: <200805301515.m4UFFGhJ024631@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502

------- Comment #18 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-30 11:15 EST -------
The XML parser seems to be fine on your example output. However, the XML
output does not appear to list/flag any difference between:

"Sequences used in model and found again"
"Sequences not found previously or not previously below threshold"

This means there is no way to populate the .new_seqs and .reused_seqs lists.

If you care about this information, then for now using the plain text output
might be best.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
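On the relative system identifiers discussed in the Bio.Entrez message above: one possible way to avoid the `"/" in systemId` test is to let `urljoin` resolve the identifier against a base URL, since it returns an already-absolute URL unchanged. This is only a sketch in present-day Python 3 (the 2008 code used `urllib.urlopen`), and it performs no download. The helper name `resolve_dtd_url` is hypothetical, the two base URLs are the ones quoted in the message, and choosing which base applies to a given DTD is still left to the caller.

```python
from urllib.parse import urljoin

# The two NCBI locations mentioned in the message above.
EUTILS_DTD_BASE = "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/"
NCBI_DTD_BASE = "http://www.ncbi.nlm.nih.gov/dtd/"


def resolve_dtd_url(system_id, base=EUTILS_DTD_BASE):
    """Return an absolute URL for a DTD system identifier.

    Absolute identifiers pass through untouched; relative ones are
    resolved against the given base (hypothetical helper, no network access).
    """
    return urljoin(base, system_id)


relative = resolve_dtd_url("eSearch_020511.dtd")
absolute = resolve_dtd_url(
    "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/nlmmedline_080101.dtd"
)
other_base = resolve_dtd_url("NCBI_GBSeq.dtd", base=NCBI_DTD_BASE)
```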
From bugzilla-daemon at portal.open-bio.org Wed May 7 15:36:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 7 May 2008 11:36:43 -0400
Subject: [Biopython-dev] [Bug 2494] _retrieve_taxon in BioSQL.py needs urgent optimization
In-Reply-To:
Message-ID: <200805071536.m47FahTU028186@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2494

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-07 11:36 EST -------
Created an attachment (id=917)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=917&action=view)
Patch to BioSQL/BioSeq.py

Hi Eric.

I've tried your script with MySQL 5.0 under Linux, and see similar example
timings, e.g.:

getTaxonSQLsimplex took 458.646 ms
getTaxonSQL took 8152.112 ms
getTaxonSQLall took 8565.304 ms
getTaxonLoop took 18.612 ms

However, your loop function doesn't return exactly the same list as the
original code. In particular you do not exclude taxonomy lineage entries with a
rank of "no rank". Also I didn't like the hard coded assumption about taxon_id
1 as a top node. What do you think of this version:

def getTaxonLoopPeter(adaptor, taxon_id):
    # climbing up the hierarchy: bottom-up approach based on the
    # child/parent link with parent_taxon_id
    taxonomy = []
    while taxon_id :
        name, rank, parent_taxon_id = adaptor.execute_one(
            "SELECT taxon_name.name, taxon.node_rank, taxon.parent_taxon_id" \
            " FROM taxon, taxon_name" \
            " WHERE taxon.taxon_id=taxon_name.taxon_id" \
            " AND taxon_name.name_class='scientific name'" \
            " AND taxon.taxon_id = %s", (taxon_id,))
        if taxon_id == parent_taxon_id :
            # If the taxon table has been populated by the BioSQL script
            # load_ncbi_taxonomy.pl this is how top parent nodes are stored.
            # Personally, I would have used a NULL parent_taxon_id here.
            break
        if rank != "no rank" :
            #For consistency with older versions of Biopython, we are only
            #interested in taxonomy entries with a stated rank.
            #Add this to the start of the lineage list.
            taxonomy.insert(0, name)
        taxon_id = parent_taxon_id
    return taxonomy

I'm attaching a patch to BioSQL/BioSeq.py that uses this code in place of the
current left/right dependent version.

While this does seem to be much faster in your test script, I'm not sure how
much difference this will make in normal usage.

Peter

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Thu May 8 11:56:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 8 May 2008 07:56:24 -0400
Subject: [Biopython-dev] [Bug 2496] New: Bio.Blast.NCBIWWW.qblast() does not support RUN_PSIBLAST option
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2496

           Summary: Bio.Blast.NCBIWWW.qblast() does not support RUN_PSIBLAST
                    option
           Product: Biopython
           Version: 1.45
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
                CC: betainverse at gmail.com

Problem reported on the mailing list by Katie Edmonds.

We need to add the CGI option RUN_PSIBLAST to the Blast URL in order to support
PSI-BLAST. However, the current Biopython code can't parse the RID from the
resulting HTML, which needs another fix.

Patch to follow.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
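Returning to the bottom-up lineage walk in the Bug 2494 comment above: the following is a self-contained sketch of the same idea that can be run without a BioSQL server. It is not the attached patch. It uses an in-memory sqlite3 database (so the placeholder is `?` rather than `%s`), only the columns the query touches, and a five-row abridged taxonomy with invented taxon_id values; `lineage` is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY,
                    parent_taxon_id INTEGER,
                    node_rank TEXT);
CREATE TABLE taxon_name (taxon_id INTEGER, name TEXT, name_class TEXT);
""")
# Toy data: the top node is its own parent, as described in the comment above.
conn.executemany("INSERT INTO taxon VALUES (?, ?, ?)", [
    (1, 1, "no rank"),
    (2, 1, "no rank"),
    (3, 2, "superkingdom"),
    (4, 3, "genus"),
    (5, 4, "species"),
])
conn.executemany("INSERT INTO taxon_name VALUES (?, ?, 'scientific name')", [
    (1, "root"),
    (2, "cellular organisms"),
    (3, "Eukaryota"),
    (4, "Homo"),
    (5, "Homo sapiens"),
])


def lineage(connection, taxon_id):
    """Walk child -> parent links, keeping only names with a stated rank."""
    names = []
    while taxon_id:
        row = connection.execute(
            "SELECT taxon_name.name, taxon.node_rank, taxon.parent_taxon_id"
            " FROM taxon JOIN taxon_name"
            " ON taxon.taxon_id = taxon_name.taxon_id"
            " WHERE taxon_name.name_class = 'scientific name'"
            " AND taxon.taxon_id = ?", (taxon_id,)).fetchone()
        if row is None:
            break  # unknown taxon_id: stop rather than raise
        name, rank, parent_taxon_id = row
        if taxon_id == parent_taxon_id:
            break  # self-parented top node reached
        if rank != "no rank":
            names.insert(0, name)  # prepend, so the list reads top-down
        taxon_id = parent_taxon_id
    return names


result = lineage(conn, 5)
```

Each step issues one indexed lookup, so the cost grows with the depth of the lineage rather than with the size of the taxon table, which is consistent with the timings quoted in the comment.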
From bugzilla-daemon at portal.open-bio.org Thu May 8 11:58:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 8 May 2008 07:58:46 -0400
Subject: [Biopython-dev] [Bug 2496] Bio.Blast.NCBIWWW.qblast() does not support RUN_PSIBLAST option
In-Reply-To:
Message-ID: <200805081158.m48Bwkxq028674@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2496

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-08 07:58 EST -------
Created an attachment (id=918)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=918&action=view)
Patch to Bio/Blast/NCBIWWW.py

This seems to work, however there is another problem in the XML parser.

e.g.

from Bio.Blast.NCBIWWW import qblast
#gi|160837788|ref|NP_075631.2| actin related protein 2/3 complex, subunit 1B
sequence = \
    "MAYHSFLVEPISCHAWNKDRTQIAICPNNHEVHIYEKSGAKWNKVHELKEHNGQVTGIDWAPESNRIVTC" \
    + "GTDRNAYVWTLKGRTWKPTLVILRINRAARCVRWAPNENKFAVGSGSRVISICYFEQENDWWVCKHIKKP" \
    + "IRSTVLSLDWHPNNVLLAAGSCDFKCRIFSAYIKEVEERPAPTPWGSKMPFGELMFESSSSCGWVHGVCF" \
    + "SASGSRVAWVSHDSTVCLVDADKKMAVATLASETLPLLAVTFITENSLVAAGHDCFPVLFTYDNAAVTLS" \
    + "FGGRLDVPKQSSQRGMTARERFQNLDKKASSEGGAATGAGLDSLHKNSVSQISVLSGGKAKCSQFCTTGM" \
    + "DGGMSIWDVKSLESALKDLKIK"
result_handle1 = qblast('blastp', 'nr', sequence, expect=0.001)
result_handle2 = qblast('blastp', 'nr', sequence, i_thresh=0.05, expect=10,
                        run_psiblast="on")

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Thu May 8 14:28:21 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 8 May 2008 10:28:21 -0400 Subject: [Biopython-dev] [Bug 2496] Bio.Blast.NCBIWWW.qblast() does not support RUN_PSIBLAST option In-Reply-To: Message-ID: <200805081428.m48ESLbe006861@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2496 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-08 10:28 EST ------- This patch seems to be working - note that you will also need to update Bio/Blast/NCBIXML.py to CVS revision 1.18 in order to parse the results. This is due to a small change in the formatting of the version number in the latest XML output. I would like someone familiar with PSI-Blast to confirm this is OK before I commit this change to CVS. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
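The version-number formatting change mentioned in the comment above (the newer XML gives a program name and version such as "BLASTP 2.2.18+" with no date after it) can be tolerated by treating the date as optional. The following is a hedged sketch of that idea, independent of what NCBIXML.py revision 1.18 actually does: `split_blast_version` is a hypothetical helper, and the dated sample string is illustrative rather than copied from real BLAST output.

```python
def split_blast_version(value):
    """Split a BlastOutput_version string into (version, date).

    The date is returned as None when the string carries no third field.
    """
    parts = value.split()
    version = parts[1]
    # Older output is assumed to append a bracketed date, e.g. "[Oct-15-2006]".
    date = parts[2].strip("[]") if len(parts) > 2 else None
    return version, date


new_style = split_blast_version("BLASTP 2.2.18+")
old_style = split_blast_version("BLASTP 2.2.15 [Oct-15-2006]")
```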
From bugzilla-daemon at portal.open-bio.org Fri May 9 09:01:46 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 9 May 2008 05:01:46 -0400 Subject: [Biopython-dev] [Bug 2496] Bio.Blast.NCBIWWW.qblast() does not support RUN_PSIBLAST option In-Reply-To: Message-ID: <200805090901.m4991kut017980@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2496 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-09 05:01 EST ------- Katie has reported back via the mailing list that there are still issues with multiple PSI-Blast iterations, see: http://lists.open-bio.org/pipermail/biopython/2008-May/004220.html See also the original thread: http://lists.open-bio.org/pipermail/biopython/2008-May/004213.html -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri May 9 11:21:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 9 May 2008 07:21:33 -0400 Subject: [Biopython-dev] [Bug 2497] New: Unit tests do not cover Bio.Blast.NCBIWWW.qblast() Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2497 Summary: Unit tests do not cover Bio.Blast.NCBIWWW.qblast() Product: Biopython Version: 1.45 Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk Recent NCBI changes to use BLAST 2.2.18+ with their online API broke our XML parser. This was actually reported via the mailing list and fixed quickly. Adding an online unit test to explicitly run a few queries with Bio.Blast.NCBIWWW.qblast() and parse the XML output could have caught this earlier. 
I'm going to attach a proposed additional unit test to do this, test_NCBIWWW_online.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri May 9 11:24:48 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 9 May 2008 07:24:48 -0400 Subject: [Biopython-dev] [Bug 2497] Unit tests do not cover Bio.Blast.NCBIWWW.qblast() In-Reply-To: Message-ID: <200805091124.m49BOmUD023507@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2497 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-09 07:24 EST ------- Created an attachment (id=919) --> (http://bugzilla.open-bio.org/attachment.cgi?id=919&action=view) Addition unit test This is a simple unit test which calls qblast() twice, once using blastp and once using blastn. The XML results are then parsed, and it checks that a few pre-defined expected matches are found. There is minimal output to the console/output file as I do not want minor details like the precise number of hits to be reported (anticpating these to fluctuate as the databases grow). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From quantrum75 at yahoo.com Fri May 9 13:37:05 2008 From: quantrum75 at yahoo.com (quantrum75) Date: Fri, 9 May 2008 06:37:05 -0700 (PDT) Subject: [Biopython-dev] Anyone needs help? In-Reply-To: Message-ID: <686395.82650.qm@web31404.mail.mud.yahoo.com> Hi there, I am newbie who is interested in contributing. I was wondering if anyone needed any help with a project? 
I have tried contributing at a few places before and the problem I ran into was that it was too long and unfocused requirements and nothing came of it in the end. What I am looking for is, 1) Something small to start off with. 2) Something I can complete within a short period of time (focused work of a day or two) and reach a definite conclusion. 3) No work is too small for me. 4) I d be willing to do any kind of grunt work and would be glad to help with documentation etc 5) Ideally, it would be something like reviewing some documentation and correcting it, or writing some documentation for a function or whatever for someone who needs to do it but just does not have the time to do it. 6) The kind of work I like to do is work that can be completed. If anyone has such a job in mind, let me know. Thanks for your time. Sincerely Regards Rama ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From biopython at maubp.freeserve.co.uk Fri May 9 15:33:08 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 9 May 2008 16:33:08 +0100 Subject: [Biopython-dev] Anyone needs help? In-Reply-To: <686395.82650.qm@web31404.mail.mud.yahoo.com> References: <686395.82650.qm@web31404.mail.mud.yahoo.com> Message-ID: <320fb6e00805090833w6977bb3fr6ca32d70cb2887ea@mail.gmail.com> On Fri, May 9, 2008 at 2:37 PM, Rama wrote: > Hi there, > I am newbie who is interested in contributing. I was wondering if anyone needed any help with a project? Hello Rama. What is your background? Do you know anything about bioinformatics for example? Also how experienced are you with python, and have you ever worked with the tools diff, patch and CVS? > I have tried contributing at a few places before and the problem I ran into was that it was too long > and unfocused requirements and nothing came of it in the end. 
What I am looking for is, > ... > If anyone has such a job in mind, let me know. I would suggest you have a go at Bug 2446, which is small and shouldn't be too complicated. The bug reporter Dave Thompson has been kind enough to provide a few test cases and example code to demonstrate the problem. http://bugzilla.open-bio.org/show_bug.cgi?id=2446 Could you try modifying the Ace parser to just ignore these comment sections? The file you need to look at is Bio/Sequencing/Ace.py http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Sequencing/Ace.py?cvsroot=biopython As you can see from the CVS history, this code hasn't changed since our latest release of Biopython 1.45, so you could work from that if its easier than learning about CVS too. If you can get this to work, then prepare a patch file against the CVS code (or Biopython 1.45) and attach it to the bug. Let me know what you think about trying this. Regards, Peter (one of the Biopython developers) From bugzilla-daemon at portal.open-bio.org Fri May 9 18:20:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 9 May 2008 14:20:12 -0400 Subject: [Biopython-dev] [Bug 2475] BioSQL.Loader should reuse existing taxon entries in lineage In-Reply-To: Message-ID: <200805091820.m49IKCMh009431@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2475 ------- Comment #31 from mmokrejs at ribosome.natur.cuni.cz 2008-05-09 14:20 EST ------- Hi, I wanted to test what you have but lack some more user friendly documentation. Specifically, I lack documentation for the class BioSeqDatabase in BioSeqDatabase.py (attachment 915). In the method load which Eric has modified it is not clear to me what would be fetched from NCBI Taxonomy DB. I guess the full lineage, but still I do not know whether as a string or a list of strings or similarly just taxids? 
The Loader.py (attachment 914) has a scary function called remove() and I would
like to see a more elaborate explanation of what it really does. Imagine I have
two subspecies of the same species in the database and want to delete the first
one. Will it zap the parents common to both of them? I wish not. ;-)

Also, I am a bit surprised that _get_taxon_id() would actually modify a local
database. Could it have another name, or could it be split into two functions,
one doing the search over the local db and optionally fetching data via the
internet, and a second modifying the local db?

And, shouldn't the 'if self.fetch_NCBI_taxonomy' have a corresponding elif for
the second attempt and the third one? It is a bit too long to read. ;-)

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Mon May 12 18:40:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 12 May 2008 14:40:34 -0400
Subject: [Biopython-dev] [Bug 2499] New: Bio.Blast.NCBIXML cannot handle XML without date in BlastOutput_version
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2499

           Summary: Bio.Blast.NCBIXML cannot handle XML without date in
                    BlastOutput_version
           Product: Biopython
           Version: 1.44
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: n.j.loman at bham.ac.uk

I got the following XML file directly from the NCBI website.

blastp
BLASTP 2.2.18+
Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.
env_nr
...
This output raises an exception when put through NCBIXML.parse() due to the
absence of a date after the string BLASTP 2.2.18+

The following diff sorts it out:

--- /home/nick/biopython/biopython-1.44/Bio/Blast/NCBIXML.py 2007-07-27 21:34:07.000000000 +0100
+++ NCBIXML.py 2008-05-12 18:01:36.000000000 +0100
@@ -212,8 +212,10 @@

         Save this to put on each blast record object
         """
-        self._header.version = self._value.split()[1]
-        self._header.date = self._value.split()[2][1:-1]
+        s = self._value.split()
+        self._header.version = s[1]
+        if len(s) > 2:
+            self._header.date = s[2][1:-1]

     def _end_BlastOutput_reference(self):
         """a reference to the article describing the algorithm

I'm sorry, I haven't checked to see if this is fixed in 1.45.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Tue May 13 08:09:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 13 May 2008 04:09:53 -0400
Subject: [Biopython-dev] [Bug 2499] Bio.Blast.NCBIXML cannot handle XML without date in BlastOutput_version
In-Reply-To:
Message-ID: <200805130809.m4D89ro7003140@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2499

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-13 04:09 EST -------
Hi Nick,

This was reported earlier on the mailing list, and fixed in
Bio/Blast/NCBIXML.py revision 1.18 (at the time I didn't bother to file a bug,
maybe I should have):

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIXML.py?cvsroot=biopython

If you need the fix urgently, you can either get the whole of Biopython from CVS and
install from source, or just replace that one file which can simple be downloaded from ViewCVS (link above). Your exception error will tell you where exactly your local copy of Bio/Blast/NCBIXML.py is. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue May 13 09:16:18 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 13 May 2008 05:16:18 -0400 Subject: [Biopython-dev] [Bug 2499] Bio.Blast.NCBIXML cannot handle XML without date in BlastOutput_version In-Reply-To: Message-ID: <200805130916.m4D9GIMV006160@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2499 ------- Comment #2 from n.j.loman at bham.ac.uk 2008-05-13 05:16 EST ------- Many thanks! -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Tue May 13 12:07:15 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 13 May 2008 05:07:15 -0700 (PDT) Subject: [Biopython-dev] Reportlab requirement Message-ID: <305778.65303.qm@web62415.mail.re1.yahoo.com> Hi everybody, Currently, setup.py checks if Reportlab is installed or not. If not, you get the following message: *** Reportlab *** is either not installed or out of date. This package is optional, which means it is only used in a few specialized modules in Biopython. You probably don't need this if you are unsure. You can ignore this requirement, and install it later if you see ImportErrors. You can find Reportlab at http://www.reportlab.org/downloads.html. Do you want to continue this installation? (Y/n) Reportlab is only used in Bio.Graphics. Unlike e.g. 
Numeric, Reportlab can be installed later if needed without having to rebuild Biopython. The Biopython unit tests already skip Bio.Graphics if Reportlab is not found.

So I think it is sufficient to check for Reportlab presence only if a user tries to use Bio.Graphics. This will save us the "Do you want to continue this installation? (Y/n) " question above, which may scare off users (and I am quite tired of it myself, too)...

Any objections?

--Michiel.

From sdavis2 at mail.nih.gov Tue May 13 12:34:20 2008
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 13 May 2008 08:34:20 -0400
Subject: [Biopython-dev] Reportlab requirement
In-Reply-To: <305778.65303.qm@web62415.mail.re1.yahoo.com>
References: <305778.65303.qm@web62415.mail.re1.yahoo.com>
Message-ID: <264855a00805130534q6451e40fj427a51e4aa729b18@mail.gmail.com>

On Tue, May 13, 2008 at 8:07 AM, Michiel de Hoon wrote:
> Hi everybody,
>
> Currently, setup.py checks if Reportlab is installed or not. If not, you get the following message:
>
> *** Reportlab *** is either not installed or out of date.
>
> This package is optional, which means it is only used in a few
> specialized modules in Biopython. You probably don't need this if you
> are unsure. You can ignore this requirement, and install it later if
> you see ImportErrors.
> You can find Reportlab at http://www.reportlab.org/downloads.html.
>
> Do you want to continue this installation? (Y/n)
>
>
> Reportlab is only used in Bio.Graphics. Unlike e.g. Numeric, Reportlab can be installed later if needed without having to rebuild Biopython. The Biopython unit tests already skip Bio.Graphics if Reportlab is not found.
>
> So I think it is sufficient to check for Reportlab presence only if a user tries to use Bio.Graphics. This will save us the "Do you want to continue this installation?
(Y/n) " question above, which may scare off users (and I am quite tired of it myself, too)...
>
> Any objections?

I personally think it is a good idea to remove the question, yes.

Sean

From bugzilla-daemon at portal.open-bio.org Tue May 13 16:25:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 13 May 2008 12:25:49 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To:
Message-ID: <200805131625.m4DGPn3W028364@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1944

------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-13 12:25 EST -------
I see some interesting parallels between the __getitem__ options for a sequence alignment and the recent and ongoing discussions on the numpy discussion list about the __getitem__ behaviour of matrices versus arrays. In particular, some participants favour returning row/column vector objects in some situations. Methods to allow iteration over rows or columns have also been suggested.

Here with the sequence Alignment class, we could have SeqRecords for the rows, but Seq or strings for the columns.

Perhaps we should wait and see how the numpy discussion turns out? However, some of the other options discussed here on this bug are probably worth committing soon (e.g. the __str__ and __repr__ methods).

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
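
[Editorial sketch, not part of the original thread] The row/column indexing idea in the Bug 1944 comment above could look roughly like this. It is only a minimal illustration under stated assumptions: the class name ToyAlignment is hypothetical, rows are stored as plain strings rather than SeqRecord objects, and nothing here is Biopython's actual Alignment API.

```python
class ToyAlignment:
    """Hypothetical stand-in for an alignment; each row is a plain string."""

    def __init__(self, rows):
        rows = list(rows)
        # All rows of an alignment must have the same number of columns.
        if len(set(len(r) for r in rows)) > 1:
            raise ValueError("all rows must have the same length")
        self._rows = rows

    def __len__(self):
        # Number of rows (sequences) in the alignment.
        return len(self._rows)

    def __iter__(self):
        # Iterating over the alignment yields its rows.
        return iter(self._rows)

    def __getitem__(self, index):
        # aln[i] returns one row.
        if isinstance(index, int):
            return self._rows[index]
        if isinstance(index, tuple) and len(index) == 2:
            row, col = index
            # aln[:, j] returns column j as a string.
            if isinstance(row, slice) and isinstance(col, int):
                return "".join(r[col] for r in self._rows[row])
            # aln[i, j] or aln[i, a:b] returns part of a single row.
            if isinstance(row, int):
                return self._rows[row][col]
        raise TypeError("unsupported index: %r" % (index,))

    def __str__(self):
        return "\n".join(self._rows)

    def __repr__(self):
        columns = len(self._rows[0]) if self._rows else 0
        return "ToyAlignment(%d rows, %d columns)" % (len(self._rows), columns)
```

With rows ["ACGT", "AC-T", "ATGT"], indexing with [1] gives a row, while [:, 1] gives a column as a string. Whether columns should instead come back as Seq objects is exactly the open question in the comment.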
From bugzilla-daemon at portal.open-bio.org Wed May 14 20:49:08 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 14 May 2008 16:49:08 -0400 Subject: [Biopython-dev] [Bug 2500] New: should use python-numpy instead of python-num{eric, array} Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2500 Summary: should use python-numpy instead of python- num{eric,array} Product: Biopython Version: 1.45 Platform: All URL: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=478457 OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: mail at philipp-benner.de Both python-numeric and python-numarray do not see new upstream releases anymore; the currently maintained project is python-numpy. Please convert the package to use python-numpy instead. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu May 15 00:58:11 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 14 May 2008 20:58:11 -0400 Subject: [Biopython-dev] [Bug 2500] should use python-numpy instead of python-num{eric, array} In-Reply-To: Message-ID: <200805150058.m4F0wBCO023044@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2500 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |DUPLICATE ------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2008-05-14 20:58 EST ------- *** This bug has been marked as a duplicate of bug 2251 *** -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu May 15 00:58:13 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 14 May 2008 20:58:13 -0400 Subject: [Biopython-dev] [Bug 2251] [PATCH] NumPy support for BioPython In-Reply-To: Message-ID: <200805150058.m4F0wDfd023057@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2251 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |mail at philipp-benner.de ------- Comment #13 from mdehoon at ims.u-tokyo.ac.jp 2008-05-14 20:58 EST ------- *** Bug 2500 has been marked as a duplicate of this bug. *** -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From tiagoantao at gmail.com Thu May 15 13:04:22 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 15 May 2008 14:04:22 +0100
Subject: [Biopython-dev] Fwd: Abstract
In-Reply-To: <320fb6e00805150558l3116d4dfhec89367eb7143081@mail.gmail.com>
References: <6d941f120805150248j11d0c02cq39626c304c9c7e29@mail.gmail.com> <320fb6e00805150540s50912664r462a194261c5c8c2@mail.gmail.com> <320fb6e00805150558l3116d4dfhec89367eb7143081@mail.gmail.com>
Message-ID: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com>

Hi all,

We are trying to submit an abstract for BOSC 2008 regarding Biopython. Below is the current version. Comments would be very much appreciated (we are already after the deadline, so they should come in fast ;) ).

Michiel, do you want to add anything to the "future" section?

---------------------------------------
Biopython Project Update
Tiago Antao[1], Peter Cock[2]

In this talk we present the current status of the Biopython project, we focus on features developed since BOSC 2007, future plans for the project and present example usages of the new population genetics module.

The latest Biopython release is 1.45 made available on 22 March 2008. Some of the new features are:

1. A new population genetics module including support for coalescent simulation, selection detection and the GenePop file format. The new module relies on existing open source external software (e.g., the open source Simcoal2 for coalescent simulation which can take advantage of multiple core CPUs for computationally intensive tasks).
2. Improved documentation.
3. Deprecation of many modules which were either obsolete or had been superseded by other code.
4. Plus many bugs were fixed, included updates for evolving file formats.

Since the Biopython 1.45 release, further work is planned to extend the Population Genetics module (e.g., with a statistics component).
A new sequence alignment module is also being implemented with a uniform API for reading and writing various alignment files, based on the approach of the Bio.SeqIO module added last year for working with sequences. Work to improve Biopython's BioSQL support is also ongoing. Time permitting, the talk will also show usage examples of the new population genetics module. The focus will be put not only on the population genetics side, but also on strategies to easily use all available computational power on new multiple core computers. This is useful for users of the most scripting languages as most language interpreter implementations impose stern limits on multi-threaded programming efficiency, which is important when using computational biology code which is CPU intensive. We will take this opportunity to discuss strategies to overcome those language limitations. Any feedback would really be much appreciated, thanks! From biopython at maubp.freeserve.co.uk Thu May 15 13:48:26 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 15 May 2008 14:48:26 +0100 Subject: [Biopython-dev] Bio.AlignIO for sequence alignment input/output Message-ID: <320fb6e00805150648y42e91765oa99eab7e5e1cf8fa@mail.gmail.com> Those of you subscribed to the CVS update feed (see http://biopython.org/wiki/Tracking_CVS_commits and the RRS link) will have noticed some activity in Bio.AlignIO which I originally proposed adding a year ago. See also enhancement Bug 2285, http://bugzilla.open-bio.org/show_bug.cgi?id=2285 I've been using this code on and off in my own work, and have put together a reasonable unit test. I've finished a first draft of a new chapter in the tutorial describing the module (you'll need to run pdflatex or hevea on biopython/Doc/Tutorial.tex to read this), and started a wiki page too: http://www.biopython.org/wiki/AlignIO The API is deliberately very close to that of Bio.SeqIO, but deals with Alignment objects rather than SeqRecord objects. 
I'm hoping for some feedback now, even if it is as little as pointing out any typos in the documentation. Also additional example input files would be good - and checking the Biopython output is understood by third party tools.

One particular issue with the API is handling ambiguous FASTA files which have been used to store more than one alignment (discussed in the updated tutorial). There is an optional argument to the Bio.AlignIO.parse() function to specify the number of sequences expected per alignment which covers the most typical scenarios. I am open to the idea of simply removing this option, which means if the user really wants to parse one of the ambiguous files, they would have to read in the individual sequences using Bio.SeqIO, batch them as needed, and then create the alignments.

Peter

From p.j.a.cock at googlemail.com Thu May 15 13:51:59 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 15 May 2008 14:51:59 +0100
Subject: [Biopython-dev] Fwd: Abstract
In-Reply-To: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com>
References: <6d941f120805150248j11d0c02cq39626c304c9c7e29@mail.gmail.com> <320fb6e00805150540s50912664r462a194261c5c8c2@mail.gmail.com> <320fb6e00805150558l3116d4dfhec89367eb7143081@mail.gmail.com> <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com>
Message-ID: <320fb6e00805150651md383437w2233bc1419589d40@mail.gmail.com>

One little typo I should have spotted earlier:

4. Plus many bugs were fixed, included updates for evolving file formats.

Should be:

4. Plus many bugs were fixed, including updates for evolving file formats.

Also I didn't insert our addresses for the [1] and [2] implied footnotes.
Peter From mjldehoon at yahoo.com Sat May 17 03:04:54 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 16 May 2008 20:04:54 -0700 (PDT) Subject: [Biopython-dev] Fwd: Abstract In-Reply-To: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com> Message-ID: <89450.67823.qm@web62411.mail.re1.yahoo.com> Dear Tiago, Thank you for representing Biopython at BOSC! If there's still time, I would suggest to aim the abstract (and also the talk itself) more at the general audience, who may know very little about Biopython or Python. So perhaps an overview of the main modules (no details, just to give an idea of what is covered by Biopython), the Population Genetics module, number of developers, number of users, and perhaps just mention the existence of some other big packages (numerical python, matplotlib, MMTK, ...) that are relevant to science & biology with Python. The point is that most people in the audience are not Biopython users (yet), so for them a general introduction is more suitable. --Michiel. Tiago Ant???o wrote: Hi all, We are trying to submit an abstract for BOSC 2008 regarding Biopython. Below is the current version. Comments would be very appreciated (we are already after the deadline, so they should come in fast ;) ). Michiel, do you want to add anything to the "future" section? --------------------------------------- Biopython Project Update Tiago Antao[1], Peter Cock[2] In this talk we present the current status of the Biopython project, we focus on features developed since BOSC 2007, future plans for the project and present example usages of the new population genetics module. The latest Biopython release is 1.45 made available on 22 March 2008. Some of the new features are: 1. A new population genetics module including support for coalescent simulation, selection detection and the GenePop file format. 
The new module relies on existing open source external software (e.g., the open source Simcoal2 for coalescent simulation which is can take advantage of multiple core CPUs for computationally intensive tasks). 2. Improved documentation. 3. Deprecation of many modules which were either obsolete or had been superseded by other code. 4. Plus many bugs were fixed, included updates for evolving file formats. Since the Biopython 1.45 release, further work is planned to extend the Population Genetics module (e.g., with a statistics component). A new sequence alignment module is also being implemented with a uniform API for reading and writing various alignment files, based on the approach of the Bio.SeqIO module added last year for working with sequences. Work to improve Biopython's BioSQL support is also ongoing. Time permitting, the talk will also show usage examples of the new population genetics module. The focus will be put not only on the population genetics side, but also on strategies to easily use all available computational power on new multiple core computers. This is useful for users of the most scripting languages as most language interpreter implementations impose stern limits on multi-threaded programming efficiency, which is important when using computational biology code which is CPU intensive. We will take this opportunity to discuss strategies to overcome those language limitations. Any feedback would really be much appreciated, thanks! 
_______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev From bugzilla-daemon at portal.open-bio.org Sat May 17 06:13:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 17 May 2008 02:13:33 -0400 Subject: [Biopython-dev] [Bug 2475] BioSQL.Loader should reuse existing taxon entries in lineage In-Reply-To: Message-ID: <200805170613.m4H6DXDZ016145@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2475 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #914 is|0 |1 obsolete| | ------- Comment #32 from mdehoon at ims.u-tokyo.ac.jp 2008-05-17 02:13 EST ------- Created an attachment (id=920) --> (http://bugzilla.open-bio.org/attachment.cgi?id=920&action=view) Replacement for "Usage ... to load a SeqRecord's taxonomy" Recently I made some changes to the Taxonomy parser in Bio.Entrez, specifically to make it more consistent with the other parsers in Bio.Entrez. Some fields in the XML are now accessed slightly differently. I updated Loader.py accordingly. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Sun May 18 14:33:25 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sun, 18 May 2008 07:33:25 -0700 (PDT) Subject: [Biopython-dev] Bio.PopGen, Bio.EUtils data files Message-ID: <157512.3075.qm@web62408.mail.re1.yahoo.com> Hi everybody, In setup.py, data files needed by Bio.EUtils and Bio.PopGen.SimCoal are now installed using a specialized install_data_biopython class. For Bio.Entrez, I am using the package_data argument to the setup function instead. Does anybody know why the install_data_biopython class was used? 
If there's no specific reason, I'd prefer to use the package_data argument instead.

--Michiel.

From bugzilla-daemon at portal.open-bio.org Mon May 19 09:30:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 19 May 2008 05:30:59 -0400
Subject: [Biopython-dev] [Bug 2475] BioSQL.Loader should reuse existing taxon entries in lineage
In-Reply-To:
Message-ID: <200805190930.m4J9UxLu016813@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2475

------- Comment #33 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-19 05:30 EST -------
This bug is also linked to Bug 2494 (currently titled "_retrieve_taxon in BioSQL.py needs urgent optimization") which is about not using the left/right values when retrieving data from the database. This is important because changes made in this bug (i.e. Bug 2475) may leave the left/right values NULL when writing new lineages.

Also, in reply to comment 31, all of the other _get_...() methods of the Loader class can also add things to the database (e.g. qualifier keys). Once you know this, the fact that _get_taxon_id() does this too isn't a shock.

Also, yes, the _get_taxon_id() function is getting far too long, and should probably be restructured as part of this bug.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
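
[Editorial sketch, not part of the original thread] The comment above contrasts the left/right values with retrieving a lineage some other way; following parent_taxon_id links, as in the Bug 2494 patch quoted earlier in this archive, is one such way and works even when left/right are NULL. Below is a minimal, self-contained sketch against an in-memory SQLite database. The function names, the cut-down two-table schema and the demo rows are assumptions for illustration only; this is not the code in BioSQL/BioSeq.py or Loader.py.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a simplified, made-up taxonomy."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY,
                            parent_taxon_id INTEGER,
                            node_rank TEXT);
        CREATE TABLE taxon_name (taxon_id INTEGER, name TEXT, name_class TEXT);
    """)
    # (taxon_id, parent_taxon_id, node_rank, scientific name); the root is its own parent.
    taxa = [
        (1, 1, "no rank", "root"),
        (2, 1, "superkingdom", "Eukaryota"),
        (3, 2, "no rank", "Opisthokonta"),
        (4, 3, "kingdom", "Metazoa"),
        (5, 4, "species", "Homo sapiens"),
    ]
    for taxon_id, parent_id, rank, name in taxa:
        conn.execute("INSERT INTO taxon VALUES (?, ?, ?)", (taxon_id, parent_id, rank))
        conn.execute("INSERT INTO taxon_name VALUES (?, ?, 'scientific name')",
                     (taxon_id, name))
    return conn


def lineage_by_parent_links(conn, taxon_id):
    """Return ranked scientific names from the top of the tree down to taxon_id."""
    lineage = []
    seen = set()  # guards against accidental cycles in the parent links
    while taxon_id is not None and taxon_id not in seen:
        seen.add(taxon_id)
        row = conn.execute(
            "SELECT taxon_name.name, taxon.node_rank, taxon.parent_taxon_id"
            " FROM taxon JOIN taxon_name ON taxon.taxon_id = taxon_name.taxon_id"
            " WHERE taxon_name.name_class = 'scientific name'"
            " AND taxon.taxon_id = ?", (taxon_id,)).fetchone()
        if row is None:
            break  # unknown taxon_id
        name, rank, parent_id = row
        if parent_id == taxon_id:
            break  # self-parented top node: stop without adding it
        if rank != "no rank":
            lineage.insert(0, name)  # prepend, so the list runs top-down
        taxon_id = parent_id
    return lineage
```

This issues one query per level of the hierarchy and never touches the left/right columns, which is why it is unaffected by NULL values there.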
From tiagoantao at gmail.com Mon May 19 12:09:57 2008 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Mon, 19 May 2008 13:09:57 +0100 Subject: [Biopython-dev] Fwd: Abstract In-Reply-To: <89450.67823.qm@web62411.mail.re1.yahoo.com> References: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com> <89450.67823.qm@web62411.mail.re1.yahoo.com> Message-ID: <6d941f120805190509x1a8e3cf6k7e382f21108abe71@mail.gmail.com> On Sat, May 17, 2008 at 4:04 AM, Michiel de Hoon wrote: > The point is that most people in the audience are not Biopython users (yet), > so for them a general introduction is more suitable. Actually this issue is of a major concern to me... Do you (or anybody) has a feel of what audience will be there? I think it is important to tune the message to the audience. I actually was speculating that people would know about biopython. But if that is not the case, as you imply, then maybe a something that makes biopython more competitive for people which might be deciding which system (language and libraries) might be the best approach... From p.j.a.cock at googlemail.com Mon May 19 12:26:31 2008 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 19 May 2008 13:26:31 +0100 Subject: [Biopython-dev] Fwd: Abstract In-Reply-To: <6d941f120805190509x1a8e3cf6k7e382f21108abe71@mail.gmail.com> References: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com> <89450.67823.qm@web62411.mail.re1.yahoo.com> <6d941f120805190509x1a8e3cf6k7e382f21108abe71@mail.gmail.com> Message-ID: <320fb6e00805190526w339c275elaa1d781d02cb256c@mail.gmail.com> >> The point is that most people in the audience are not Biopython users (yet), >> so for them a general introduction is more suitable. > > Actually this issue is of a major concern to me... Do you (or anybody) > has a feel of what audience will be there? I think it is important to > tune the message to the audience. I actually was speculating that > people would know about biopython. 
But if that is not the case, as you > imply, then maybe a something that makes biopython more competitive > for people which might be deciding which system (language and > libraries) might be the best approach... Perhaps I should have given you a broader introduction to BOSC itself. There will probably be talks from BioPerl, BioJava and BioRuby in the same session, and I would expect almost all the audience to be familiar with at least one of these projects. However, they may or may not use python, and I would expect that the majority will not be Biopython users. At least, that was my impression last year at BOSC 2007. Reading over the talk titles/abstracts from last year should give you a feel for the sort of people there presenting work outside the Bio* projects. In terms of general impressions, I felt most of the attendees actually did some hands on coding. So yes, as Michiel says, perhaps the current text isn't general enough. This is a regular opportunity to try raise awareness of the project, although I personally wouldn't give a "hard sell", we should try to give a general overview of Biopython's capabilities. Peter From sbassi at gmail.com Mon May 19 12:36:15 2008 From: sbassi at gmail.com (Sebastian Bassi) Date: Mon, 19 May 2008 09:36:15 -0300 Subject: [Biopython-dev] Fwd: Abstract In-Reply-To: <320fb6e00805190526w339c275elaa1d781d02cb256c@mail.gmail.com> References: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com> <89450.67823.qm@web62411.mail.re1.yahoo.com> <6d941f120805190509x1a8e3cf6k7e382f21108abe71@mail.gmail.com> <320fb6e00805190526w339c275elaa1d781d02cb256c@mail.gmail.com> Message-ID: On Mon, May 19, 2008 at 9:26 AM, Peter Cock wrote: .... > project, although I personally wouldn't give a "hard sell", we should > try to give a general overview of Biopython's capabilities. 
This work may give some ideas about introducing Biopython: http://openwetware.org/wiki/Julius_B._Lucks/Projects/Python_All_A_Scientist_Needs From tiagoantao at gmail.com Mon May 19 12:38:34 2008 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Mon, 19 May 2008 13:38:34 +0100 Subject: [Biopython-dev] Fwd: Abstract In-Reply-To: <320fb6e00805190526w339c275elaa1d781d02cb256c@mail.gmail.com> References: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com> <89450.67823.qm@web62411.mail.re1.yahoo.com> <6d941f120805190509x1a8e3cf6k7e382f21108abe71@mail.gmail.com> <320fb6e00805190526w339c275elaa1d781d02cb256c@mail.gmail.com> Message-ID: <6d941f120805190538p718127ccr76d86b0d0ab40348@mail.gmail.com> In order to address this I am thinking in changing the starting paragraph of the "paper" along the following lines: In this talk we present the current status of the Biopython project. We start by giving a short overview of Biopython - presenting existing functionality - and useful software libraries for computational biology in the Python development 'ecology' (from plotting libraries capable of producing publication quality figures to numerical libraries, among others). We then focus on features developed since BOSC 2007, future plans for the project and present example usages of the new population genetics module. I think changing the abstract along these lines might also be good. I think I will target most of the presentation to the idea that the Python ecology of software development is really good (e.g. putting one slide on matplot lib with code and result, to show how concise and simple code can produce nice results). "Selling" Biopython in the whole python context. On Mon, May 19, 2008 at 1:26 PM, Peter Cock wrote: >>> The point is that most people in the audience are not Biopython users (yet), >>> so for them a general introduction is more suitable. >> >> Actually this issue is of a major concern to me... 
Do you (or anybody) >> has a feel of what audience will be there? I think it is important to >> tune the message to the audience. I actually was speculating that >> people would know about biopython. But if that is not the case, as you >> imply, then maybe a something that makes biopython more competitive >> for people which might be deciding which system (language and >> libraries) might be the best approach... > > Perhaps I should have given you a broader introduction to BOSC itself. > There will probably be talks from BioPerl, BioJava and BioRuby in the > same session, and I would expect almost all the audience to be > familiar with at least one of these projects. However, they may or > may not use python, and I would expect that the majority will not be > Biopython users. At least, that was my impression last year at BOSC > 2007. Reading over the talk titles/abstracts from last year should > give you a feel for the sort of people there presenting work outside > the Bio* projects. In terms of general impressions, I felt most of > the attendees actually did some hands on coding. > > So yes, as Michiel says, perhaps the current text isn't general > enough. This is a regular opportunity to try raise awareness of the > project, although I personally wouldn't give a "hard sell", we should > try to give a general overview of Biopython's capabilities. 
> > Peter > -- http://www.tiago.org From tiagoantao at gmail.com Mon May 19 12:49:29 2008 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Mon, 19 May 2008 13:49:29 +0100 Subject: [Biopython-dev] Fwd: Abstract In-Reply-To: <6d941f120805190538p718127ccr76d86b0d0ab40348@mail.gmail.com> References: <6d941f120805150604h15dae8f7o33464430e92f25a7@mail.gmail.com> <89450.67823.qm@web62411.mail.re1.yahoo.com> <6d941f120805190509x1a8e3cf6k7e382f21108abe71@mail.gmail.com> <320fb6e00805190526w339c275elaa1d781d02cb256c@mail.gmail.com> <6d941f120805190538p718127ccr76d86b0d0ab40348@mail.gmail.com> Message-ID: <6d941f120805190549u773310aj5df318952eca5e52@mail.gmail.com> By the way, the suggested abstract proposal: Introduction to and news from the Biopython project presenting both existing modules and current developments including a new Population Genetics module and XML parsers for the NCBI's Entrez web interface. An overview of the existing python software ecology will also be presented in relationship with computational biology. Libraries to do, among others, plotting, numerical analysis and molecular modeling will be presented in connection with Biopython and from the point a view of having a complete platform to do research in computational biology. Biopython is freely available on http://www.biopython.org under a liberal "MIT style" open source license, http://www.biopython.org/DIST/LICENSE On Mon, May 19, 2008 at 1:38 PM, Tiago Ant?o wrote: > In order to address this I am thinking in changing the starting > paragraph of the "paper" along the following lines: > > In this talk we present the current status of the Biopython project. > We start by giving a short overview of Biopython - presenting existing > functionality - and useful software libraries for computational > biology in the Python development 'ecology' (from plotting libraries > capable of producing publication quality figures to numerical > libraries, among others). 
We then focus on features developed since > BOSC 2007, future plans for the project and present example usages of > the new population genetics module. > > > I think changing the abstract along these lines might also be good. > > I think I will target most of the presentation to the idea that the > Python ecology of software development is really good (e.g. putting > one slide on matplot lib with code and result, to show how concise and > simple code can produce nice results). "Selling" Biopython in the > whole python context. > > On Mon, May 19, 2008 at 1:26 PM, Peter Cock wrote: >>>> The point is that most people in the audience are not Biopython users (yet), >>>> so for them a general introduction is more suitable. >>> >>> Actually this issue is of a major concern to me... Do you (or anybody) >>> has a feel of what audience will be there? I think it is important to >>> tune the message to the audience. I actually was speculating that >>> people would know about biopython. But if that is not the case, as you >>> imply, then maybe a something that makes biopython more competitive >>> for people which might be deciding which system (language and >>> libraries) might be the best approach... >> >> Perhaps I should have given you a broader introduction to BOSC itself. >> There will probably be talks from BioPerl, BioJava and BioRuby in the >> same session, and I would expect almost all the audience to be >> familiar with at least one of these projects. However, they may or >> may not use python, and I would expect that the majority will not be >> Biopython users. At least, that was my impression last year at BOSC >> 2007. Reading over the talk titles/abstracts from last year should >> give you a feel for the sort of people there presenting work outside >> the Bio* projects. In terms of general impressions, I felt most of >> the attendees actually did some hands on coding. >> >> So yes, as Michiel says, perhaps the current text isn't general >> enough. 
This is a regular opportunity to try raise awareness of the >> project, although I personally wouldn't give a "hard sell", we should >> try to give a general overview of Biopython's capabilities. >> >> Peter >> > > > > -- > http://www.tiago.org > -- http://www.tiago.org From bugzilla-daemon at portal.open-bio.org Mon May 19 13:46:22 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 19 May 2008 09:46:22 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200805191346.m4JDkMMf028474@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Severity|blocker |major ------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-19 09:46 EST ------- I see from comment 11 that some nasty quote escaping may be needed (which could be an NCBI bug). Have you been able to try using relative paths at the command line (avoiding spaces ideally)? Unfortunately my Windows machine is currently without internet access, which is one reason why I haven't made time to sit down and explore this issue. P.S. I don't think this is a critical bug in Biopython, although I do take your point that it your setup this is a big issue. Downgrading this to severity "major". -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
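
[Editorial sketch, not part of the original thread] On the quote-escaping problem in Bug 2480: one way to sidestep hand-written quoting is to build the command as a list of arguments and hand that list to the subprocess module without shell=True, so that Python does the Windows quoting itself. This is an assumption about a possible alternative, not a description of how Biopython launched BLAST at the time, and whether blastall then accepts a database path containing spaces is a separate question that the bug leaves open. The helper name below is hypothetical; -p, -d and -i are the legacy blastall options for program, database and input file.

```python
def build_blastall_args(exe, program, database, infile):
    """Return the blastall command as an argument list; no manual quoting is applied."""
    # Each path stays a single list element, spaces and all.
    return [exe, "-p", program, "-d", database, "-i", infile]


# Usage (not executed here, since it needs a local BLAST installation):
#   import subprocess
#   args = build_blastall_args(my_blast_exe, "blastn", my_blast_db, "query.fasta")
#   handle = subprocess.Popen(args, stdout=subprocess.PIPE).stdout
```

Keeping each path as one list element means no shell ever splits it at the spaces.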
From bugzilla-daemon at portal.open-bio.org Mon May 19 21:03:44 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 19 May 2008 17:03:44 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200805192103.m4JL3iSk021133@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 ------- Comment #13 from drpatnaik at yahoo.com 2008-05-19 17:03 EST ------- To get BioPython call BLAST, this works: 1. my_blast_exe = r'"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe"' Variations like these do not work: 2. "C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe" 3. r"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe" The error being: 'C:/Documents' is not recognized as an internal or external command, operable program or batch file. With my_blast_exe set to the 1st value constant, and trying different my_blast_db values, BLAST reports: [NULL_Caption] ERROR: Arguments must start with '-' (the offending argument #5 was: 'and') /* or 'and\' or 'and\\' */ The values tried for my_blast_db are: 4. 'C:/Documents and Settings/patnaik/My Documents/blast/bin/mine' 5. 'C:/Documents\ and\ Settings/patnaik/My\ Documents/blast/bin/mine' 6. 'C:/Documents\\ and\ Settings/patnaik/My\\ Documents/blast/bin/mine' 7. "C:/Documents and Settings/patnaik/My Documents/blast/bin/mine" 8. "C:/Documents\ and\ Settings/patnaik/My\ Documents/blast/bin/mine" 9. "C:/Documents\\ and\\ Settings/patnaik/My\\ Documents/blast/bin/mine" 10. r'C:/Documents and Settings/patnaik/My Documents/blast/bin/mine' 11. r'C:/Documents\ and\ Settings/patnaik/My\ Documents/blast/bin/mine' 12. r'C:/Documents\\ and\\ Settings/patnaik/My\\ Documents/blast/bin/mine' 13. r"C:/Documents and Settings/patnaik/My Documents/blast/bin/mine" 14. r"C:/Documents\ and\ Settings/patnaik/My\ Documents/blast/bin/mine" 15. 
r"C:/Documents\\ and\\ Settings/patnaik/My\\ Documents/blast/bin/mine" But a different error ...: 'C:/Documents' is not recognized as an internal or external command, operable program or batch file. ... is shown with these values: 16. r'"C:/Documents and Settings/patnaik/My Documents/blast/bin/mine"' 17. r'"C:/Documents\ and\ Settings/patnaik/My\ Documents/blast/bin/mine"' 18. r'"C:/Documents\\ and\\ Settings/patnaik/My\\ Documents/blast/bin/mine"' That same error is also seen when I try these variations of the value that works in command-line BLAST (comment #10 above): 19. r'\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\"' 20. r'"\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\""' 20. "\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\"" 21. r"\"\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\"\"" 22. r"'\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\"'" Doesn't this suggest that Biopython is not passing the my_blast_db value properly? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon May 19 21:36:42 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 19 May 2008 17:36:42 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200805192136.m4JLag7h022387@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 ------- Comment #14 from drpatnaik at yahoo.com 2008-05-19 17:36 EST ------- (continuing comment #13) 23. r'"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine"' 24. '"C:\\Documents and Settings\\patnaik\\My Documents\\blast\\bin\\mine"' 25. '\\"C:\\Documents and Settings\\patnaik\\My Documents\\blast\\bin\\mine\\"' 26. 
r"\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\""
27. r"'\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\"'"

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Mon May 19 21:47:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 19 May 2008 17:47:00 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values
In-Reply-To:
Message-ID: <200805192147.m4JLl0HQ022723@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480

------- Comment #15 from drpatnaik at yahoo.com 2008-05-19 17:47 EST -------
(In reply to comment #13)
> To get BioPython call BLAST, this works:
> 1. my_blast_exe = r'"C:/Documents and Settings/patnaik/My
> Documents/blast/bin/blastall.exe"'

I forgot to add that I had to comment out the os.path.exists check in NCBIStandalone.py to get to that step. Equivalently, with this script I get the 'does not exist' message:

import os
my_blast_exe = r'"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe"'
if not os.path.exists(my_blast_exe):
    print 'cannot find my_blast_exe'

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
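The behaviour reported in comment #15 can be reproduced without BLAST or Biopython. The following is only a minimal sketch, written for current Python 3 rather than the Python 2 used in this thread. The temporary directory stands in for the "Documents and Settings" path, and `strip_outer_quotes` is a hypothetical helper, not a Biopython function.

```python
import os
import tempfile


def strip_outer_quotes(path):
    """Hypothetical helper: drop one pair of surrounding double quotes."""
    if len(path) >= 2 and path[0] == '"' and path[-1] == '"':
        return path[1:-1]
    return path


# Stand-in for a path containing spaces (assumption: any such path will do).
base = tempfile.mkdtemp(prefix="My Documents ")
exe = os.path.join(base, "blastall.exe")
open(exe, "w").close()

# The value as it would be passed with embedded quotes, as in comment #13.
quoted = '"%s"' % exe

# os.path.exists treats the quote characters as part of the file name,
# so the quoted form is not found although the file itself exists.
quoted_found = os.path.exists(quoted)
plain_found = os.path.exists(strip_outer_quotes(quoted))
print(quoted_found, plain_found)
```

This suggests the existence check should run on the unquoted path, with any quoting applied only when the command line is assembled.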
From bugzilla-daemon at portal.open-bio.org Tue May 20 16:31:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 20 May 2008 12:31:41 -0400 Subject: [Biopython-dev] [Bug 2496] Bio.Blast.NCBIWWW.qblast() does not support RUN_PSIBLAST option In-Reply-To: Message-ID: <200805201631.m4KGVfF8016867@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2496 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |INVALID ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-20 12:31 EST ------- Follow up discussion on the mailing list: http://lists.open-bio.org/pipermail/biopython/2008-May/004231.html Katie wrote: > I asked NCBI about this, and they (eventually) replied that it's "not > officially supported." I have been unable to figure out how to get it to > return iterations after the first one. I'm going to close this bug as "invalid" unless the NCBI do make a public API for PSI-BLAST. It looks like the only solution for now would be to install the standalone blast tools... -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed May 21 02:45:58 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 20 May 2008 22:45:58 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200805210245.m4L2jwgM013784@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 ------- Comment #16 from drpatnaik at yahoo.com 2008-05-20 22:45 EST ------- Similar to what I mentioned in comment #10 this BLAST command-line code works: (1) "C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe" -p blastn -d "\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\"" -i "C:\Documents and Settings\patnaik\My Documents\blast\bin\hairpin" -m 7 Now I've been trying to see the system call popen3 makes in line 1662 of NCBIStandalone.py by putting this line of code before the os.popen3(" ".join([blastcmd] + params): print " ".join([blastcmd] + params) (as reported in comment #15, I do have to first disable the os.path.exists) Using these values in my test script: my_blast_db =r'"\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\""' my_blast_file =r'"C:\Documents and Settings\patnaik\My Documents\blast\bin\hairpin"' my_blast_exe =r'"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe"' I get a print command result that is identical to the working BLAST command-line code (1). "C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe" -p blastn -d "C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\" -i "C:\Documents and Settings\patnaik\My Documents\blast\bin\hairpin" -m 7 But BLAST doesn't get called and the error reported is: 'C:/Documents' is not recognized as ... 
Finally I tried replacing the code inside the os.popen3 of NCBIStandalone.py with the string (1): w, r, e = os.popen3(r'"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe" -p blastn -d "C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\" -i "C:\Documents and Settings\patnaik\My Documents\blast\bin\hairpin" -m 7') And I get the same error: 'C:/Documents' is not recognized as ... With a non-Biopython-dependent script, I get the same error (irrespective of the quote combinations I tried): import os w, r, e = os.popen3(r'"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe" -p blastn -d "C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\" -i "C:\Documents and Settings\patnaik\My Documents\blast\bin\hairpin" -m 7') print e.read() ------------------------------------------------------------------- FINAL THOUGHTS I think I've to give up on this. There seem to be two incurable issues, unlikely Biopython-specific: os.path.exists (see comment #15) and os.popen3 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed May 21 08:34:52 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 21 May 2008 04:34:52 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200805210834.m4L8YqVL004607@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 ------- Comment #17 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-21 04:34 EST ------- The os.path.exists(...) check in Biopython should be easy to fix, probably by the user specifying the exe name without quotes and biopython adding the quotes when building the command line. 
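One way to read the suggestion above (unquoted executable name from the user, quotes added when the command line is built) is sketched below. This is an illustration under stated assumptions, not the fix that went into Biopython: `quote_if_needed` and `build_command_line` are hypothetical names, and the code targets current Python 3, where the `subprocess` module can also take the arguments as a list so that no shell quoting is involved at all.

```python
import subprocess
import sys


def quote_if_needed(arg):
    """Hypothetical helper: wrap an argument in double quotes if it has a space."""
    already_quoted = arg.startswith('"') and arg.endswith('"')
    if " " in arg and not already_quoted:
        return '"%s"' % arg
    return arg


def build_command_line(exe, params):
    """Join an unquoted executable path and its parameters into one string."""
    return " ".join(quote_if_needed(a) for a in [exe] + list(params))


# The user supplies the path without quotes; quoting happens here.
cmd = build_command_line(
    "C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe",
    ["-p", "blastn", "-m", "7"])
print(cmd)

# Alternative: hand subprocess a list, so each argument is passed intact.
# The child here is just the Python interpreter echoing its first argument.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])",
     "a path with spaces"],
    capture_output=True, text=True)
print(result.stdout.strip())
```

Whether the string form survives the Windows shell is exactly what comment #16 calls into question, which is why the list form is shown alongside it.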
For specifying the NCBI database locations, have you set the database folder using NCBI.ini yet? I'm not sure if it will work if the INI file is in the BLAST directory as the NCBI documentation says it should go in the Windows directory (which you don't have write access to). Perhaps anywhere on the path will work. See ftp://ftp.ncbi.nlm.nih.gov/blast/documents/blast.html

There is also the option of using relative paths...

You might get more success talking to the machine administrator and asking them to install BLAST for you?

The good news is my home internet connection is up and running, so I may be able to do a little investigation on this issue now (time permitting).

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk Wed May 21 09:21:51 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 21 May 2008 10:21:51 +0100
Subject: [Biopython-dev] Next release?
Message-ID: <320fb6e00805210221s93d411cpe7480b01c99540a8@mail.gmail.com>

From the discussion list, quite a few people have suffered from the NCBI tweaking the online Blast XML format with 2.2.18+ (bug 2499), so it would be nice to get a new release out soon to address this. See http://bugzilla.open-bio.org/show_bug.cgi?id=2499

How do the other modules stand at the moment?

Bio.PopGen (Tiago). Is this currently stable, or are you in the middle of adding more features?

Bio.Entrez (Michiel). I see you've been very busy with the new simplified XML parsers (see bug 2488). This looks like a big improvement on the rather repetitive coding needed in the first draft. Are you still actively making further refinements? How many Entrez XML file formats are still needed? http://bugzilla.open-bio.org/show_bug.cgi?id=2488

Bio.AlignIO - this is new, but has a reasonable amount of documentation and a small unit test (see bug 2285).
If we did do a release soon, it could be announced as "in beta", and subject to change, but feedback welcomed. http://bugzilla.open-bio.org/show_bug.cgi?id=2285 In terms of the unit tests, I haven't run them on Windows recently (internet access issues, hopefully resolved now), but on Linux things looks fine. Peter From mjldehoon at yahoo.com Wed May 21 09:40:25 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 21 May 2008 02:40:25 -0700 (PDT) Subject: [Biopython-dev] Next release? In-Reply-To: <320fb6e00805210221s93d411cpe7480b01c99540a8@mail.gmail.com> Message-ID: <928585.24226.qm@web62401.mail.re1.yahoo.com> Peter wrote:Bio.Entrez (Michiel). I see you've been very busy with the new simplified XML parsers (see bug 2488). This looks like a big improvement on the rather repetitive coding needed in the first draft. Are you still actively making further refinements? How many Entrez XML file formats are still needed? http://bugzilla.open-bio.org/show_bug.cgi?id=2488 I am still making refinements. I am using this module a lot for my own work, and I have a lot of changes that are not in CVS yet. The final result should be much simpler than what is in CVS now. In particular, we won't have to write a Python module for each DTD, but let Python figure out the DTD for itself. Once this is finished (hopefully soon), I'd be happy to make a new release. --Michiel. 
From bugzilla-daemon at portal.open-bio.org Wed May 21 10:48:40 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 21 May 2008 06:48:40 -0400 Subject: [Biopython-dev] [Bug 2501] New: Minor erratas in module Bio.SeqRecord Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2501 Summary: Minor erratas in module Bio.SeqRecord Product: Biopython Version: Not Applicable Platform: All OS/Version: Linux Status: NEW Severity: trivial Priority: P5 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: xbello at gmail.com line 32: description - Seqeuence description, optional (string) line 63: if self.description : lines.append("Desription: %s" % self.description) Seqeuence instead of Sequence Desription instead of Description -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed May 21 11:28:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 21 May 2008 07:28:33 -0400 Subject: [Biopython-dev] [Bug 2501] Minor erratas in module Bio.SeqRecord In-Reply-To: Message-ID: <200805211128.m4LBSX99014512@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2501 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-21 07:28 EST ------- Thanks for point those out - fixed in CVS revision 1.16 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From tiagoantao at gmail.com Wed May 21 11:41:15 2008 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 21 May 2008 12:41:15 +0100 Subject: [Biopython-dev] Next release? In-Reply-To: <320fb6e00805210221s93d411cpe7480b01c99540a8@mail.gmail.com> References: <320fb6e00805210221s93d411cpe7480b01c99540a8@mail.gmail.com> Message-ID: <6d941f120805210441w4f3fc3d7m42ee5531dca127df@mail.gmail.com> On Wed, May 21, 2008 at 10:21 AM, Peter wrote: > Bio.PopGen (Tiago). Is this currently stable, or are you in the middle > of adding more features? Long story: I will just add after moving to SVN. Actually the most important part is going to be added next, but I am waiting for SVN (any news on that front?). The statistics part that I will be commiting is the core of the module... Short story: Don't worry with me if you are doing a release in the next couple of weeks... From bugzilla-daemon at portal.open-bio.org Wed May 21 11:51:17 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 21 May 2008 07:51:17 -0400 Subject: [Biopython-dev] [Bug 2502] New: PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2502 Summary: PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 Product: Biopython Version: 1.45 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: ibdeno at gmail.com When parsing a PSI-Blast result from blastpgp version 2.2.18 I get this error: Traceback (most recent call last): File "./lpbl.py", line 23, in b_record = b_parser.parse(blast_out) File "/home/mortiz/Progs//lib/python/Bio/Blast/NCBIStandalone.py", line 760, in parse self._scanner.feed(handle, self._consumer) File "/home/mortiz/Progs//lib/python/Bio/Blast/NCBIStandalone.py", line 98, in feed self._scan_header(uhandle, consumer) File 
"/home/mortiz/Progs//lib/python/Bio/Blast/NCBIStandalone.py", line 208, in _scan_header raise ValueError("Invalid header?") ValueError: Invalid header? The same script and same input just works with blastpgp 2.2.15 I will attach script and input file later. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed May 21 11:56:53 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 21 May 2008 07:56:53 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805211156.m4LBurSt016108@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502 ------- Comment #1 from ibdeno at gmail.com 2008-05-21 07:56 EST ------- Created an attachment (id=921) --> (http://bugzilla.open-bio.org/attachment.cgi?id=921&action=view) Contains a script and an example sequence to reproduce the bug Change in the script the location of the blast command and of the database to be used. Run it as: ./lpbl.py hsTXN.prot.fasta 3 The second argument is the number of iterations for blastpgp -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed May 21 13:05:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 21 May 2008 09:05:13 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15
In-Reply-To:
Message-ID: <200805211305.m4LD5DhV020562@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-21 09:05 EST -------
Miguel - could you also attach the XML output from blastpgp 2.2.15 and 2.2.18 please?

e.g. Something like this if you want to do it via Biopython:

blast_out, error_info = NCBIStandalone.blastpgp(
        blastcmd='/opt/Bio/blast-2.2.15/bin/blastpgp',
        database='/opt/databases/BlastDB/nrdb100ncbi',
        infile=file,
        npasses=passes,
        align_view='0',
        matrix_outfile=file + '.pssm')
handle = open("blastpgp_2.2.15.xml","w")
handle.write(blast_out.read())
handle.close()

Thanks,

Peter.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Wed May 21 14:44:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 21 May 2008 10:44:41 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15
In-Reply-To:
Message-ID: <200805211444.m4LEifII025392@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502

------- Comment #3 from ibdeno at gmail.com 2008-05-21 10:44 EST -------
Created an attachment (id=922)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=922&action=view)
Plain text and XML outputs from blastgpg

The names should be self-explanatory.
The log files where produced with the appropriate blastpgp version using the command line: blastpgp -i hsTXN.prot.fasta -d /drives/databases/BlastDB/nrdb100ncbi -j 1 -m [0,7] m = 0 is plain text (as in the original submitted bug) m = 7 is XML -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed May 21 17:21:27 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 21 May 2008 13:21:27 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200805211721.m4LHLRX1003810@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 ------- Comment #18 from drpatnaik at yahoo.com 2008-05-21 13:21 EST ------- The BLAST database folder being inside blast/bin seems to be fine as command-line BLAST does work. I haven't tried relative paths. It should work, as should using an external drive that can provide for space-less paths. But the issue of spaces in paths on Windows remains. I thank you for your suggestions and efforts looking into it. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From jblanca at btc.upv.es Thu May 22 07:30:52 2008 From: jblanca at btc.upv.es (Jose Blanca) Date: Thu, 22 May 2008 09:30:52 +0200 Subject: [Biopython-dev] sequence class proposal Message-ID: <200805220930.53004.jblanca@btc.upv.es> Dear Biopython developers, I've been using python and Biopython for some time now and I would like to talk with you about the sequence classes in Biopython. 
I have had some issues using the SeqRecord and Alignment classes and I have being discussing and implementing with two students (Victor Sanchez y Pablo Martinez) a proposal of a new sequence class. We would like to present this implementation as a tip in the discussion about the design of the sequence classes in Biopython and we're eager to receive your comments. The first problem that I found with the SeqRecord is the lack of support for qualities. And it is also difficult to implement this quality support in a SeqRecord derived class. There's a problem with the current SeqRecord API that difficults this. Let me explain it. Currently SeqRecord has a seq property and if you want an slice or if you need to reverse or complement you would do something like: my_seq = SeqRecord() my_seq.seq = Seq('ACTG') my_seq.seq[0:2] my_seq.seq = my_seq.reverse() If I derive a class from SeqRecord with a qual property I don't know how to reverse both the sequence and the quality at the same time, because now the Seq methods are called directly without SeqRecord being aware of that. In order to support that we have discuss a new class with a slightly different API and we have done a preliminary implementation. We have named this new class as RichSeq, and we think that this could solve the quality problem. With this new class it would work like this: myseq = RichSeq(seq='ACTG', qual=[50,50,50,50]) subseq = myseq[0:2] myseq.reverse() myseq.complement() RichSeq is equivalent to SeqRecord and it has the same properties as SeqRecord, but it adds the methods __getitem__, reverse, complement and reverse_complement. We have also implemented a new type of features, we have called them RichFeature. They are similar to the SeqFeature. The main difference is that instead of a location and a location operator, they have a BioRange (another new class). This BioRange is inspired/copied from the Bioperl library. 
The BioRange is optional, so some RichFeature uses would be: RichFeature(id='a_feature', type='annotation', feature='this is an annotation') RichFeature(id='a_feature', type='subsequence', feature=Seq('ACTG')) range = BioRange(start=3,end=6) feat = RichFeature(type='annotation', range=range, feature='some_annotation, e.g. an exon') seq = RichSeq(seq='ACTGACTG', features=[feat]) With this implementation you can define a sequence with seq, qual and annotations associated with a range in a easy way, and after that you can reverse and complement them in a trivial way. range = BioRange(start=3,end=6) feat = RichFeature(type='annotation', range=range, feature='some_annotation') seq = RichSeq(seq='ACTGACTG', qual=[60,60,60,60,60,60,60,60], features=[feat]) seq.reverse() By the way, this is a mutable class, although that could be easily changed. You can even use Seqs and RichSeq as subsequences and ask for slices or complements. range = BioRange(start=1,end=2) feat = RichFeature(type='subsequence', feature=RichSeq(seq='CT'), range=range) seq = RichSeq(seq='ACTG', features=[feat]) seq2 = seq[1:2] seq.reverse() This capability makes this RichSeq an excellent candidate for a base class for an Alignment implementation, but we have not implemented this yet. Attach to this mail you can find the implementation of this new classes. They have some tests that provide an idea about their intended use. We would like to know about your opinions and suggestions. Do you think that this kind of functionality is desirable? Please let us know about any flaw, specially in the API. I think that my work would be easier using a sequence class similar to RichSeq, but maybe there's an easier way. Do you think that is a good idea to attach this classes to bugzilla? Do we open a new bug or there's one for this sequence class debate already open? Best regards, -- Jose M. 
Blanca Postigo Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV) Universidad Politecnica de Valencia (UPV) Edificio CPI (Ciudad Politecnica de la Innovacion), 8E 46022 Valencia (SPAIN) Tlf.:+34-96-3877000 (ext 88473) -------------- next part -------------- A non-text attachment was scrubbed... Name: richseq.0.0.1.tar.gz Type: application/x-tgz Size: 7075 bytes Desc: not available URL: From biopython at maubp.freeserve.co.uk Thu May 22 15:47:58 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 22 May 2008 16:47:58 +0100 Subject: [Biopython-dev] sequence class proposal In-Reply-To: <200805220930.53004.jblanca@btc.upv.es> References: <200805220930.53004.jblanca@btc.upv.es> Message-ID: <320fb6e00805220847s29cdd37yb0472f4fe5e81818@mail.gmail.com> On Thu, May 22, 2008 at 8:30 AM, Jose Blanca wrote: > Dear Biopython developers, > I've been using python and Biopython for some time now and I would like to > talk with you about the sequence classes in Biopython. I have had some issues > using the SeqRecord and Alignment classes and I have being discussing and > implementing with two students (Victor Sanchez y Pablo Martinez) a proposal > of a new sequence class. We would like to present this implementation as a > tip in the discussion about the design of the sequence classes in Biopython > and we're eager to receive your comments. If I understood your terminology correctly, "qualities" is a list of scores, one for each letter in the sequence. I see this is a special case of a more general situation where you have per-letter-annotation information. Examples include secondardy structure or residue coordinates of a protein sequence. Very often for example, secondary structures are stored in files as a simple string whise length matches the length of the sequence. Also related are sub-features like domains or promotor sites which span a range of residues. 
So I would agree with you that an enhanced class would be useful, where the per letter annotations were respected in splicing, reversing etc. Handling sub-features when slicing is less straight forward. The current SeqRecord and Seq classes separate the sequence annotation from the sequence letters themselves, making this sort of integration difficult. Making the SeqRecord a direct subclass of the Seq object has previously been suggested and would pave the way for this sort of operation. See Bug 2351 where some of these ideas have been floated... http://bugzilla.open-bio.org/show_bug.cgi?id=2351 There are a lot of things that would need to be discussed - for example how would you handle the pre-sequence annotation (e.g. record identifiers) when adding two "rich" seqeunces? I've been content with making small steps for now, with backwards compatibility always in mind. On another note, I'm also thinking about the need for an annotated sequence alignment object, where there are similar concerns. Also, have you discussed the alphabet objects? > Do you think that is a good idea to attach this classes to bugzilla? Do we > open a new bug or there's one for this sequence class debate already open? Your proposals do seem very broad, so have a look at Bug 2351 first, but perhaps start a new enhancement bug, and then attach the code. Peter From bugzilla-daemon at portal.open-bio.org Fri May 23 10:06:44 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 May 2008 06:06:44 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805231006.m4NA6itj022486@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |biopython- | |bugzilla at maubp.freeserve.co. 
| |uk ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-23 06:06 EST ------- I've worked out that the original problem was use to trying to parse XML output with the Bio.Blast.NCBIStandalone.PSIBlastParser (which expects the plain text output only). Perhaps the error message could be more helpful in this situation? I'm using Biopython from CVS, but it seems to parse the plain text output from both 2.2.15 and 2.2.18 fine. Here is a modified version of your code which reads from the example plain text files provided: #!/usr/bin/env python # import os, re, string, operator from Bio.Blast import NCBIStandalone from sys import * E_VALUE_THRESH = 0.005 nolf = re.compile('\n') nogaps = re.compile('-') blast_out = open("blastpgp.2.2.18.txt") b_parser = NCBIStandalone.PSIBlastParser() b_record = b_parser.parse(blast_out) if b_record.converged == 1: print '*** Converged!!! ***' fastaout = open('test_psiblast.fasta','w') summout = open('test_psiblast.txt','w') for alignment in b_record.rounds[-1].alignments: for hsp in alignment.hsps: if hsp.expect < E_VALUE_THRESH: ident = 100.0*hsp.identities[0]/hsp.identities[1] simil = 100.0*hsp.positives[0]/hsp.positives[1] mytitle = nolf.sub(' ',alignment.title) mysbjct = nogaps.sub('',hsp.sbjct) summout.write('****Alignment****\n') summout.write('sequence: %s\n' % mytitle[0:70]) summout.write('e value: %e\n' % hsp.expect) summout.write('alignment length: %i\n' % hsp.positives[1]) summout.write('identity: %(ident)5.2f\n' % {'ident': ident} ) summout.write('similarity: %(simil)5.2f\n' % {'simil': simil} ) summout.write('query: from %i to %i\n' % (hsp.query_start, hsp.query_end)) summout.write('subject: from %i to %i\n' % (hsp.sbjct_start, hsp.sbjct_end)) summout.write('%s ...\n' % hsp.query[0:75]) summout.write('%s ...\n' % hsp.match[0:75]) summout.write('%s ...\n' % hsp.sbjct[0:75]) fastaout.write('%s\n%s\n' % (mytitle,mysbjct)) summout.close() fastaout.close() print "Done" 
----------------------------------------------------------------------

So, as far as I can tell, the plain text PSI Blast parser is fine.

As I do not have the relevant databases installed, I have not tried using
Biopython to call blastpgp to run PSI-Blast. It could be there is a problem
here with specifying the output format...

As to the XML output, you can sort of parse this with Bio.Blast.NCBIXML and I
think you get back an iterator yielding a record for each iteration. However,
as the example you provided had only one query and one iteration, this should
be tested further. The record is not showing all the information extracted by
the PSI-Blast text parser, which should be in the XML file. Perhaps you would
like to investigate this?

Example code:

from Bio.Blast import NCBIStandalone, NCBIXML

for filename in ["blastpgp.2.2.15.txt", "blastpgp.2.2.18.txt"] :
    print
    print filename
    print "="*len(filename)
    handle = open(filename)
    record = NCBIStandalone.PSIBlastParser().parse(handle)
    print record.query
    if record.converged : print '*** Converged!!! ***'
    for iter_round in record.rounds :
        print "Iteration with %i alignments" \
              % (len(iter_round.alignments))
        print "%i new sequences, %i reused" \
              %(len(iter_round.new_seqs), len(iter_round.reused_seqs))
    print "End of plain text output"

for filename in ["blastpgp.2.2.15.xml", "blastpgp.2.2.18.xml"] :
    print
    print filename
    print "="*len(filename)
    handle = open(filename)
    for iter_round in NCBIXML.parse(handle) :
        print iter_round.query
        print "Iteration with %i alignments" \
              % (len(iter_round.alignments))
    print "End of XML output"

The output:

blastpgp.2.2.15.txt
===================
gi|50592994|ref|NP_003320.2| thioredoxin [Homo sapiens]
Iteration with 250 alignments
500 new sequences, 0 reused
End of plain text output

blastpgp.2.2.18.txt
===================
gi|50592994|ref|NP_003320.2| thioredoxin [Homo sapiens]
Iteration with 250 alignments
500 new sequences, 0 reused
End of plain text output

blastpgp.2.2.15.xml
===================
gi|50592994|ref|NP_003320.2| thioredoxin [Homo sapiens]
Iteration with 500 alignments
End of XML output

blastpgp.2.2.18.xml
===================
gi|50592994|ref|NP_003320.2| thioredoxin [Homo sapiens]
Iteration with 250 alignments
End of XML output

Notice that NCBI must have changed the XML format in some way (500 versus 250
alignments between versions 2.2.15 and 2.2.18). I have not explored this in
any detail.
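
Comment #4 above asks whether the error message could be more helpful when
XML is handed to the plain text parser. A minimal sketch of one way to do
that, assuming a seekable handle and written in present-day Python 3 with the
standard library only: peek at the first characters and refuse early. The
names sniff_blast_format and require_plain_text are hypothetical and are not
part of Biopython.

```python
import io


def sniff_blast_format(handle):
    """Return 'xml', 'text' or 'unknown' from the first non-blank characters.

    The handle must be seekable; it is rewound before returning.
    """
    head = handle.read(256).lstrip()
    handle.seek(0)
    if head.startswith("<"):
        # XML reports start with an XML declaration or a DOCTYPE.
        return "xml"
    first_word = head.split(None, 1)[0] if head else ""
    if "BLAST" in first_word.upper():
        # Plain text reports start with a banner such as "BLASTP 2.2.18 [...]".
        return "text"
    return "unknown"


def require_plain_text(handle):
    """Raise a descriptive error before a plain text parser sees the input."""
    kind = sniff_blast_format(handle)
    if kind == "xml":
        raise ValueError("Input looks like BLAST XML; the plain text parser "
                         "cannot read it (try an XML parser instead).")
    if kind != "text":
        raise ValueError("Input does not look like plain text BLAST output.")
    return handle


text_report = io.StringIO("BLASTP 2.2.18 [Mar-02-2008]\n")
xml_report = io.StringIO('<?xml version="1.0"?>\n<BlastOutput>\n')
print(sniff_blast_format(text_report))
print(sniff_blast_format(xml_report))
```

The check only looks at the banner, so it cannot tell a truncated or
otherwise damaged report from a good one; it is meant to catch the
wrong-format case early and nothing more.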
From bugzilla-daemon at portal.open-bio.org Fri May 23 10:45:58 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 May 2008 06:45:58 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805231045.m4NAjwr4023917@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502 ------- Comment #5 from ibdeno at gmail.com 2008-05-23 06:45 EST ------- Hi Peter, Thank you. The problem must be then with the blastpgp call from Biopython, since my code was trying to obtain plain text via the align_view='0' option: blast_out, error_info = NCBIStandalone.blastpgp( blastcmd='/home/mortiz/Progs/blast-2.2.15/bin/blastpgp', database='/drives/databases/BlastDB/nrdb100ncbi', infile=file, npasses=passes, align_view='0', matrix_outfile=file + '.nrdb100ncbi.pssm') However, when I print the result of this call with the handler you proposed: handle = open("blastpgp_2.2.18.txt","w") handle.write(blast_out.read()) handle.close() I actually get plain text! The same blastpgp call (same binary, same database, same input file sequence, same number of PSI-Blast iterations) still gives the error reported in the bug with version 2.2.18, but works all right with 2.2.15. Because the error appears within seconds, I'm wondering if the parser is not trying to read the results before blastpgp has actually finished the iterations (about 3 minutes in my test) I'm without a clue... Miguel (In reply to comment #4) > I've worked out that the original problem was use to trying to parse XML output > with the Bio.Blast.NCBIStandalone.PSIBlastParser (which expects the plain text > output only). Perhaps the error message could be more helpful in this > situation? > > I'm using Biopython from CVS, but it seems to parse the plain text output from > both 2.2.15 and 2.2.18 fine. 
> [...]
> Iteration with 250 alignments
> End of XML output
>
> Notice that NCBI must have changed the XML format in some way (500 versus 250
> alignments between versions 2.2.15 and 2.2.18). I have not explored this in
> any detail.
>

From bugzilla-daemon at portal.open-bio.org Fri May 23 11:02:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 07:02:44 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15
In-Reply-To:
Message-ID: <200805231102.m4NB2iPS024763@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502

------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-23 07:02 EST -------
That's an interesting theory - reading directly from standard out is causing
the problem (comment 5).

One thing you could try is writing the blastpgp output to a file, and then
opening the file for reading. I'm not sure if blastpgp has a file output
option. You could just try this:

blast_out, error_info = NCBIStandalone.blastpgp(...)
handle = open("blastpgp_2.2.18.txt","w")
handle.write(blast_out.read())
handle.close()
blast_out = open("blastpgp_2.2.18.txt")
b_parser = NCBIStandalone.PSIBlastParser()
b_record = b_parser.parse(blast_out)
...

Or, for a very crude workaround:

from time import sleep
blast_out, error_info = NCBIStandalone.blastpgp(...)
sleep(5*60) #Five minutes
b_parser = NCBIStandalone.PSIBlastParser()
b_record = b_parser.parse(blast_out)
...

If those work, it would be good evidence that your theory is right.
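
The "write to a file first" test from comment #6 can also be sketched without
any fixed sleep, assuming present-day Python 3 and only the standard library.
The helper name run_and_spool is hypothetical, and the command below is a
stand-in that prints a BLAST-like banner; it is not a real blastpgp call.

```python
import subprocess
import sys
import tempfile


def run_and_spool(command):
    """Run command, wait for it to exit, return a handle on its stdout."""
    spool = tempfile.TemporaryFile(mode="w+")
    # subprocess.run blocks until the child has exited, so by the time it
    # returns the spool file holds the complete output.
    subprocess.run(command, stdout=spool, check=True)
    spool.seek(0)
    return spool


# Stand-in for the real search program (assumption for illustration only).
command = [sys.executable, "-c", "print('BLASTP 2.2.18 [Mar-02-2008]')"]
with run_and_spool(command) as handle:
    first_line = handle.readline().rstrip()
    print(first_line)
```

Because the child has already finished when the handle is returned, a parser
reading from it can never run ahead of the program, which is the situation
the sleep() workaround was trying to rule out.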
From jblanca at btc.upv.es Fri May 23 11:10:13 2008
From: jblanca at btc.upv.es (Jose Blanca)
Date: Fri, 23 May 2008 13:10:13 +0200
Subject: [Biopython-dev] sequence class proposal
In-Reply-To: <320fb6e00805220847s29cdd37yb0472f4fe5e81818@mail.gmail.com>
References: <200805220930.53004.jblanca@btc.upv.es> <320fb6e00805220847s29cdd37yb0472f4fe5e81818@mail.gmail.com>
Message-ID: <200805231310.13408.jblanca@btc.upv.es>

Hi:

After reading the suggestions in Bug 2351 I've coded a MutableSeq class that
inherits from UserString.MutableString instead of using an array stored in
self.data. It's quite easy to make it work like the MutableSeq present in
Biopython 1.45, but there are some problems to solve. I don't know if this
class would be faster or easier to maintain than the MutableSeq that uses
array.array. I've just done that as an experiment to learn something about
Biopython.

Now the compatibility problems that I have found...

self.data is not an array but a str. That's not easy to solve because
MutableString uses self.data internally. I tried to define a property, but
MutableString is an old-style class. Maybe I don't know enough Python, but I
don't know how to solve this type mismatch.

append() and extend() could be coded using __add__().

insert() and remove() are not supported by MutableSeq and would have to be
coded. But I don't see the point of these methods in a sequence class. I
think that the Seq and the MutableSeq API should be as similar as possible and
since Seq uses __add__() I don't understand why MutableSeq should use
append() and extend().

I also have problems with
del seq[2:4:-1]
and
seq[2::3] = "N" * len(seq[2::3])

All the other tests for MutableSeq just work.

--
Jose M.
Blanca Postigo Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV) Universidad Politecnica de Valencia (UPV) Edificio CPI (Ciudad Politecnica de la Innovacion), 8E 46022 Valencia (SPAIN) Tlf.:+34-96-3877000 (ext 88473) From bugzilla-daemon at portal.open-bio.org Fri May 23 12:38:28 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 May 2008 08:38:28 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805231238.m4NCcS0S028452@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502 ------- Comment #7 from ibdeno at gmail.com 2008-05-23 08:38 EST ------- Unfortunately the hypothesis was not correct. If I create an intermediate file, the parser works well if the file comes from blastpgp 2.2.15 but chokes on 2.2.18. There is a new reference in 2.2.18 header: Reference for compositional score matrix adjustment: Altschul, Stephen F., John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis, Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109. which falls between the two ones existing in the 2.2.15 version and makes the header longer in terms of number of lines... Might be this? Miguel (In reply to comment #6) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
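
Comment #7 above suspects the extra "Reference" block in the 2.2.18 header. A
minimal sketch of a header scan that does not depend on how many such blocks
there are, assuming each block starts with a line beginning "Reference", ends
at a blank line, and that the header ends at the "Query=" line. This is
present-day Python 3, the function name scan_header is hypothetical, the
sample headers are abbreviated, and this is not the change that went into
NCBIStandalone.py.

```python
def scan_header(lines):
    """Collect 'Reference' blocks until the 'Query=' line.

    Returns (references, index_of_query_line).
    """
    references = []
    current = None
    for index, line in enumerate(lines):
        text = line.strip()
        if text.startswith("Query="):
            if current is not None:
                references.append(" ".join(current))
            return references, index
        if text.startswith("Reference"):
            # A new block starts; close any block still open.
            if current is not None:
                references.append(" ".join(current))
            current = [text]
        elif not text:
            # A blank line ends the current block, if any.
            if current is not None:
                references.append(" ".join(current))
                current = None
        elif current is not None:
            current.append(text)
    raise ValueError("No 'Query=' line found; is this plain text BLAST output?")


# Abbreviated sample headers (reference text shortened for illustration).
OLD_HEADER = """BLASTP 2.2.15 [Oct-15-2006]

Reference: (first reference, text omitted)

Reference for composition-based statistics:
Schaffer, Alejandro A., L. Aravind, Thomas L. Madden,

Query= gi|50592994|ref|NP_003320.2| thioredoxin [Homo sapiens]
""".splitlines()

NEW_HEADER = """BLASTP 2.2.18 [Mar-02-2008]

Reference: (first reference, text omitted)

Reference for compositional score matrix adjustment: Altschul, Stephen F.,
John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis,

Reference for composition-based statistics:
Schaffer, Alejandro A., L. Aravind, Thomas L. Madden,

Query= gi|50592994|ref|NP_003320.2| thioredoxin [Homo sapiens]
""".splitlines()

for header in (OLD_HEADER, NEW_HEADER):
    refs, query_index = scan_header(header)
    print(len(refs), "reference blocks; Query= on line", query_index)
```

Scanning for the "Query=" line rather than counting header lines means a
future release that adds or drops a reference would not change where the
parser thinks the header ends.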
From bugzilla-daemon at portal.open-bio.org Fri May 23 14:30:42 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 May 2008 10:30:42 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805231430.m4NEUgVL001388@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502 ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-23 10:30 EST ------- I'm using the CVS version of Biopython under Linux. The file main NCBIStandalone.py hasn't changed since Biopython 1.45, although Record.py has. I am a little puzzled about why I can parse both the 2.2.15 and the 2.2.18 plain text examples you provided without problems, but something fails for you. Could you double check what happens on your machine using these two example files from attachment 922 comment 3, and this code I gave in comment 4: from Bio.Blast import NCBIStandalone for filename in ["blastpgp.2.2.15.txt", "blastpgp.2.2.18.txt"] : print print filename print "="*len(filename) handle = open(filename) record = NCBIStandalone.PSIBlastParser().parse(handle) print record.query if record.converged : print '*** Converged!!! ***' for iter_round in record.rounds : print "Iteration with %i alignments" \ % (len(iter_round.alignments)) print "%i new sequences, %i reused" \ %(len(iter_round.new_seqs), len(iter_round.reused_seqs)) print "End of plain text output" If this doesn't work, please give the full stack trace - "chokes" is a little vague. Looking at the example files you provided in attachment 922 comment 3, they seem to have replaced one reference with another. This is the start of the diff output comparing the two files: 1c1 < BLASTP 2.2.15 [Oct-15-2006] --- > BLASTP 2.2.18 [Mar-02-2008] 10,15c10,13 < Reference for composition-based statistics: < Schaffer, Alejandro A., L. Aravind, Thomas L. Madden, < Sergei Shavirin, John L. 
Spouge, Yuri I. Wolf,
< Eugene V. Koonin, and Stephen F. Altschul (2001),
< "Improving the accuracy of PSI-BLAST protein database searches with
< composition-based statistics and other refinements", Nucleic Acids Res. 29:2994-3005.
---
> Reference for compositional score matrix adjustment: Altschul, Stephen F.,
> John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis,
> Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches
> using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109.

This reference change doesn't seem to cause a problem on my machine. I didn't
notice anything else worth commenting about.

From bugzilla-daemon at portal.open-bio.org Fri May 23 15:02:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 11:02:10 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15
In-Reply-To:
Message-ID: <200805231502.m4NF2AZm003440@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502

------- Comment #9 from ibdeno at gmail.com 2008-05-23 11:02 EST -------
Hi Peter,

Thank you for your patience and sorry for not being clear.

1. By 'choke' I meant that it produced the same error mentioned in the
original bug report.

2. I see now that my attachments (#922) were not appropriate: to gain some
time I had requested no iterations from blastpgp, that is, I used '-j 1'. I
can actually parse the plain text from 2.2.18 that I had submitted in those
attachments with both your code and mine. This also explains the differences
in the headers...

I will now submit two plain text outputs from blastpgp with 2 iterations
('-j 3'). Your code and mine can parse 2.2.15 but both fail (with the
"Incorrect header ?" error) with 2.2.18.

Sorry again...

Miguel

(In reply to comment #8)

From bugzilla-daemon at portal.open-bio.org Fri May 23 15:05:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 May 2008 11:05:16 -0400
Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15
In-Reply-To:
Message-ID: <200805231505.m4NF5G1k003638@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2502

------- Comment #10 from ibdeno at gmail.com 2008-05-23 11:05 EST -------
Created an attachment (id=923)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=923&action=view)
Plain text outputs from blastpgp versions 2.2.15 and 2.2.18 with 2 iterations

These files are the result of calling blastpgp with the -j 3 option. The files
sent with attachment #922 were actually not problematic; the parsing problem
with blastpgp version 2.2.18 only appears when at least one iteration is
carried out. Perhaps due to the insertion of a new Reference in the header of
the blastpgp output?

Cheers,

Miguel
From bugzilla-daemon at portal.open-bio.org Fri May 23 15:16:14 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 May 2008 11:16:14 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805231516.m4NFGExh004121@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502 ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-23 11:16 EST ------- Great - I now get the same error as you :) I'll try and have a look at this over the weekend. Would you be able to make matching XML files as well? While I'm playing with blastpgp output it would be worth checking exactly what the XML files do... P.S. Would you object to me using any of your examples as test cases for the Biopython unit tests? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri May 23 15:25:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 May 2008 11:25:19 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805231525.m4NFPJVY004581@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502 ------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-23 11:25 EST ------- You are right - it is the extra reference which was causing the failure. I've checked in a fix to Bio/Blast/NCBIStandalone.py with CVS revision 1.72 Could you update your Biopython installation to CVS and retest? 
Or just replace /home/mortiz/Progs//lib/python/Bio/Blast/NCBIStandalone.py with revision 1.72 from the ViewCVS website once its updated: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIStandalone.py?cvsroot=biopython (I haven't closed this bug yet - I'd like your confirmation that this fixes things, adding a new test case would probably be wise.) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri May 23 15:39:49 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 May 2008 11:39:49 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805231539.m4NFdn83005197@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502 ------- Comment #13 from ibdeno at gmail.com 2008-05-23 11:39 EST ------- Created an attachment (id=924) --> (http://bugzilla.open-bio.org/attachment.cgi?id=924&action=view) XML equivalent of the files in the previous attachment (#923) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri May 23 15:41:17 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 May 2008 11:41:17 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805231541.m4NFfHv7005278@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502 ------- Comment #14 from ibdeno at gmail.com 2008-05-23 11:41 EST ------- I have now submitted the XML equivalent files. 
Sure, please use the examples and code if you find them useful. Cheers, Miguel (In reply to comment #11) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri May 23 15:42:50 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 May 2008 11:42:50 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805231542.m4NFgolb005350@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502 ------- Comment #15 from ibdeno at gmail.com 2008-05-23 11:42 EST ------- I will try as soon as revision 1.72 is available through the link you provided. So far, the latest is 1.71 Thank you! Miguel (In reply to comment #12) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri May 23 15:56:13 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 May 2008 11:56:13 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805231556.m4NFuDpd005873@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502 ------- Comment #16 from ibdeno at gmail.com 2008-05-23 11:56 EST ------- Sorry, I won't be able to try your fix until next week: I don't have access to the computer due to maintenance. I'll let you know as soon as possible. 
Miguel (In reply to comment #15) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat May 24 07:16:44 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 24 May 2008 03:16:44 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805240716.m4O7GiqV007275@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502 ------- Comment #17 from ibdeno at gmail.com 2008-05-24 03:16 EST ------- I have managed to access to a different computer and tested your revised (1.72) version of NCBIStandalone.py I'm glad I can confirm it does work. I guess the best way to avoid such problems in future would be to have an appropriate XML parser for PSI-Blast. Thank you very much for your assistance. (In reply to comment #12) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From peter at maubp.freeserve.co.uk Sat May 24 11:02:51 2008 From: peter at maubp.freeserve.co.uk (Peter) Date: Sat, 24 May 2008 12:02:51 +0100 Subject: [Biopython-dev] sequence class proposal In-Reply-To: <200805231310.13408.jblanca@btc.upv.es> References: <200805220930.53004.jblanca@btc.upv.es> <320fb6e00805220847s29cdd37yb0472f4fe5e81818@mail.gmail.com> <200805231310.13408.jblanca@btc.upv.es> Message-ID: <320fb6e00805240402t68094be2v4cbad1414f3e21b9@mail.gmail.com> Hi Jose, Your ideas are interesting for switching the MutableSeq class from an array of char internally to a python mutable string. However, are you talking about the UserString.MutableString object? 
The documentation suggests its not going to be as fast as a list or a character array: http://pydoc.org/2.5.1/UserString.html#MutableString Note that at some point we will be moving from Numeric to numpy, so the exact internals of the current array based MutableSeq will change slightly then. I will be away most of next week, so don't worry if I seem to be ignoring you ;) Peter From bugzilla-daemon at portal.open-bio.org Sat May 24 12:10:24 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 24 May 2008 08:10:24 -0400 Subject: [Biopython-dev] [Bug 2382] Generic Roche or GSFlex "FASTA" parser In-Reply-To: Message-ID: <200805241210.m4OCAOol018283@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2382 ------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-24 08:10 EST ------- See also http://www.bioperl.org/wiki/Qual_sequence_format where there is a similar looking file format which they call "qual" described as also being used by PHRAP and CAP3. e.g. >HSMETOO 134bp 10 20 30 40 50 50 50 50 50 20 25 25 30 30 20 15 20 35 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 20 30 20 10 10 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
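
The "qual" layout quoted in comment #13 above (a FASTA-style ">" header line
followed by whitespace-separated integer scores) can be read with a few lines
of code. A minimal sketch in present-day Python 3, standard library only; the
function name parse_qual and the short sample are hypothetical and this is
not an existing Biopython parser.

```python
import io


def parse_qual(handle):
    """Yield (identifier, description, scores) for each record in a qual file."""
    ident = None
    descr = ""
    scores = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if ident is not None:
                yield ident, descr, scores
            parts = line[1:].split(None, 1)
            ident = parts[0] if parts else ""
            descr = parts[1] if len(parts) > 1 else ""
            scores = []
        else:
            if ident is None:
                raise ValueError("Scores found before the first '>' header line")
            scores.extend(int(token) for token in line.split())
    if ident is not None:
        yield ident, descr, scores


# Shortened sample in the same layout as the example above.
sample = io.StringIO(">HSMETOO 134bp\n10 20 30 40 50\n50 50 20 25\n>SECOND\n5 6\n")
for ident, descr, scores in parse_qual(sample):
    print(ident, descr, len(scores))
```

The sketch does not check that the number of scores matches the length given
in the description ("134bp" in the quoted example); a real parser would
probably want to pair each record with its sequence and verify that.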
From bugzilla-daemon at portal.open-bio.org Sat May 24 12:15:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 24 May 2008 08:15:23 -0400
Subject: [Biopython-dev] [Bug 2503] New: An error when parsing NCBIWWW Blast output
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2503

           Summary: An error when parsing NCBIWWW Blast output
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: hebbar.prashanth at gmail.com

Hi All,

I get the following error when I start parsing NCBIWWW blast output.

Traceback (most recent call last):
  File "", line 1, in -toplevel-
    b_record = b_parser.parse(blast_results)
  File "C:\Python24\Lib\site-packages\Bio\Blast\NCBIWWW.py", line 43, in parse
    self._scanner.feed(handle, self._consumer)
  File "C:\Python24\Lib\site-packages\Bio\Blast\NCBIWWW.py", line 94, in feed
    has_re=re.compile(r'.?BLAST'))
  File "C:\Python24\Lib\site-packages\Bio\ParserSupport.py", line 335, in read_and_call_until
    line = safe_readline(uhandle)
  File "C:\Python24\Lib\site-packages\Bio\ParserSupport.py", line 411, in safe_readline
    raise SyntaxError, "Unexpected end of stream."
SyntaxError: Unexpected end of stream.

Can anyone please help me to solve this?

I am using Biopython version 1.44 (I tried with 1.45 too, and the same error
appears) on a Windows system.

Thank you in anticipation,
Prashanth
From bugzilla-daemon at portal.open-bio.org Sat May 24 12:25:59 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 24 May 2008 08:25:59 -0400 Subject: [Biopython-dev] [Bug 2503] An error when parsing NCBIWWW Blast output In-Reply-To: Message-ID: <200805241225.m4OCPxTc018893@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2503 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-24 08:25 EST ------- We need more information. Could you show us the example code that causes this problem? If you are trying to parse a file (e.g. from standalone blast), could attach it to this bug? For the look of the stack trace, you are trying to parse the HTML output from blast (?). We do recommend parsing the XML output if possible (not the plain text or HTML output). Thank you, Peter. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Sat May 24 14:26:27 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 24 May 2008 07:26:27 -0700 (PDT) Subject: [Biopython-dev] Bio.Entrez & Bio.EUtils In-Reply-To: <320fb6e00805210221s93d411cpe7480b01c99540a8@mail.gmail.com> Message-ID: <893127.27535.qm@web62412.mail.re1.yahoo.com> Dear all, I have essentially completed the parser in Bio.Entrez. AFAICT, it works with all kinds of XML files returned by NCBI's Entrez Utilities, except for the Pubmed Central database (Pubmed itself is fine). I am using this module a lot for my own work, so it has received quite a lot of testing. As a case in point, there are 40 unit tests for the Bio.Entrez parser. These, by the way, can show you some examples of how to use this module. The documentation is now also updated. 
This module may at some point replace Bio.EUtils, so if you are using this module you might want to try Bio.Entrez to see if it covers everything Bio.EUtils covers. --Michiel Peter wrote:Bio.Entrez (Michiel). I see you've been very busy with the new simplified XML parsers (see bug 2488). This looks like a big improvement on the rather repetitive coding needed in the first draft. Are you still actively making further refinements? How many Entrez XML file formats are still needed? http://bugzilla.open-bio.org/show_bug.cgi?id=2488 From mjldehoon at yahoo.com Sat May 24 14:16:15 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 24 May 2008 07:16:15 -0700 (PDT) Subject: [Biopython-dev] sequence class proposal In-Reply-To: <320fb6e00805240402t68094be2v4cbad1414f3e21b9@mail.gmail.com> Message-ID: <135625.21242.qm@web62412.mail.re1.yahoo.com> Peter wrote: > Note that at some point we will be moving from Numeric to numpy, so > the exact internals of the current array based MutableSeq will change > slightly then. MutableSeq uses Python's array, not Numeric's array, so it should not be affected by moving from Numeric to numpy. --Michiel. From biopython at maubp.freeserve.co.uk Sun May 25 10:36:14 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 25 May 2008 11:36:14 +0100 Subject: [Biopython-dev] sequence class proposal In-Reply-To: <1211479809.4835b70111c71@webmail.upv.es> References: <200805220930.53004.jblanca@btc.upv.es> <320fb6e00805220847s29cdd37yb0472f4fe5e81818@mail.gmail.com> <1211479809.4835b70111c71@webmail.upv.es> Message-ID: <320fb6e00805250336u251dd2buae72397aa10374b0@mail.gmail.com> On May 22, 2008, Blanca Postigo Jose Miguel wrote: >> If I understood your terminology correctly, "qualities" is a list of >> scores, one for each letter in the sequence. > You're right. 
I'm sorry, I used them a lot and a reserved them a special place > in the API, my fault, I will remove it, only the sequence should have a > relevant place in the API, the rest should be stored as features. I've asked on the BioSQL mailing list about this sort of "per letter" annotation. Currently there is no mechanism to store this sort of thing in the schema. http://lists.open-bio.org/pipermail/biosql-l/2008-May/001269.html However, Hilmar did point out some relevant bits of BioPerl to have a look at: Hilmar Lapp wrote: > In BioPerl we have Bio::Seq::SeqWithQuality and the more generic > Bio::Seq::MetaI. The BioPerl SeqWithQuality sounds like what you were most interested in Jose, although the Meta-Interface may be of relevance too. Peter From biopython at maubp.freeserve.co.uk Sun May 25 12:06:50 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 25 May 2008 13:06:50 +0100 Subject: [Biopython-dev] Bio.PopGen, Bio.EUtils data files In-Reply-To: <157512.3075.qm@web62408.mail.re1.yahoo.com> References: <157512.3075.qm@web62408.mail.re1.yahoo.com> Message-ID: <320fb6e00805250506w1d6fd1bbgf1d364d2ad949376@mail.gmail.com> On Sun, May 18, 2008, Michiel de Hoon wrote: > Hi everybody, > > In setup.py, data files needed by Bio.EUtils and Bio.PopGen.SimCoal are now > installed using a specialized install_data_biopython class. For Bio.Entrez, I am > using the package_data argument to the setup function instead. Does anybody > know why the install_data_biopython class was used? If there's no specific > reason, I'd prefer to use the package_data argument instead. I think I've found one reason not to - it doesn't seem to be supported in Python 2.3 as shown here: C:\TEMP\biopython_cvs\biopython_all\biopython>c:\Python23\python.exe setup.py install c:\Python23\lib\distutils\dist.py:227: UserWarning: Unknown distribution option: 'package_data' warnings.warn(msg) running install ... If I'd known this earlier, I would of course have said something. 
On the other hand, I may be the only person still using Biopython with python 2.3. Peter From tiagoantao at gmail.com Sun May 25 12:48:35 2008 From: tiagoantao at gmail.com (Tiago Antão) Date: Sun, 25 May 2008 13:48:35 +0100 Subject: [Biopython-dev] Bio.PopGen, Bio.EUtils data files In-Reply-To: <320fb6e00805250506w1d6fd1bbgf1d364d2ad949376@mail.gmail.com> References: <157512.3075.qm@web62408.mail.re1.yahoo.com> <320fb6e00805250506w1d6fd1bbgf1d364d2ad949376@mail.gmail.com> Message-ID: <6d941f120805250548t357d6d0fwe36d5d1b39eaaa77@mail.gmail.com> > If I'd known this earlier, I would of course have said something. On > the other hand, I may be the only person still using Biopython with > python 2.3. What about doing a survey (or a web poll on the site) on the main list to find out which Python versions people are using? That would give a sense of what should be supported/deprecated... From jblanca at btc.upv.es Mon May 26 05:24:30 2008 From: jblanca at btc.upv.es (Blanca Postigo Jose Miguel) Date: Mon, 26 May 2008 07:24:30 +0200 Subject: [Biopython-dev] sequence class proposal In-Reply-To: <320fb6e00805251437n34362f0bm2a323cd1194afaa@mail.gmail.com> References: <320fb6e00805251437n34362f0bm2a323cd1194afaa@mail.gmail.com> Message-ID: <1211779470.483a498e18e3e@webmail.upv.es> > One of your points seemed to be that the SeqRecord couldn't have a > __getitem__ and methods like reverse, complement, etc. I don't see > why it couldn't have these. Perhaps rather than introducing a whole > new class, enhancing the SeqRecord would be a better avenue. My main concern with SeqRecord is that it has a Seq. If we want a slice or a reverse we would do:
    my_seq = SeqRecord(Seq('ACTGTGAC'))
    my_seq.seq[1:5]
    my_seq.seq.reverse()
If we add residue annotations (like qualities) to SeqRecord, how could they be reversed when we call .seq.reverse() directly? I don't know how this could work.
    my_seq = SeqRecord(Seq('ACTG'), Qual([10,20,30,40]))
    my_seq.seq.reverse()
This would leave the record in an invalid state, with the letters reversed but the qualities unchanged:
    str(my_seq.seq) -> 'GTCA'
    str(my_seq.qual) -> [10,20,30,40]
One possibility is to have methods like __getitem__ both in SeqRecord and in Seq, so it would be like:
    my_seq.seq[1:3]
    my_seq[1:3]
Just for testing I have written a RichSeq that is compatible with both Seq and SeqRecord, but that's very confusing. Does this SeqRecord HAVE a sequence, or IS it one? It could work, but I feel that it is wrong, and it is easier to explain to the users that a new improved SeqRecord has been created (RichSeq) and that they should migrate to that.
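[Editorial sketch] The synchronisation problem described above (letters reversed, qualities left untouched) can be illustrated with a small stand-alone class. This is only a minimal sketch under stated assumptions: `AnnotatedSeq` and its attributes are hypothetical names invented for illustration; it is neither Biopython's SeqRecord nor the RichSeq prototype discussed in this thread. It shows one way an object that owns both the letters and the per-letter scores can slice and reverse them together.

```python
class AnnotatedSeq:
    """Hypothetical container keeping letters and per-letter scores in step."""

    def __init__(self, letters, qualities):
        if len(letters) != len(qualities):
            raise ValueError("need exactly one quality score per letter")
        self.letters = letters
        self.qualities = list(qualities)

    def __len__(self):
        return len(self.letters)

    def __str__(self):
        return self.letters

    def __getitem__(self, index):
        # Slices and single positions both return an AnnotatedSeq, so the
        # scores always travel with the letters they describe.
        if isinstance(index, slice):
            return AnnotatedSeq(self.letters[index], self.qualities[index])
        return AnnotatedSeq(self.letters[index], [self.qualities[index]])

    def reverse(self):
        # Reversing via the shared __getitem__ keeps both parts consistent.
        return self[::-1]


record = AnnotatedSeq("ACTG", [10, 20, 30, 40])
reversed_record = record.reverse()
print(str(reversed_record), reversed_record.qualities)
```

Because every operation goes through the owning object rather than through an exposed `.seq` attribute, there is no code path that changes the letters without also changing the scores, which is the core of the "IS a sequence" argument made in the message.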
Another problem is difficult to solve. If RichSeq is compatible with Seq, as Michiel wants (and I agree with that), how could it also be compatible with SeqRecord? The parameters in their constructors are not compatible:
    SeqRecord(seq, ...)
    Seq(data, alphabet...)
I would happily improve on RichSeq, but I don't know how to do it in a sane way. What do you think?
> Also, I do think we should bear in mind the BioSQL sequence representation, which we currently expose in a SeqRecord/Seq like way. I wouldn't want to lose this / have to completely re-write the Biopython BioSQL code.
I will look into that. Best regards, Jose Blanca
> Peter
>
> On Sun, May 25, 2008 at 9:12 PM, Blanca Postigo Jose Miguel wrote:
> > Dear biopythonistas:
> > First of all, my apologies for the MutableSeq reimplementation. I did it just for the sake of learning more about Python and Biopython, not to achieve a speedier implementation. It has been a good learning exercise for me, but now let's go for the meat...
> >
> > Everything that follows is just my opinion on the sequence classes. Mine is not a well-informed opinion, and I would just like to show my ideas to you to get some feedback and to learn from you.
> >
> > Since this remodelling of the sequence classes is a complex topic, I would like to explain my ideas about it in some order. I won't go into implementation details; I will just discuss the API of the classes.
> > I think that Seq and MutableSeq are pretty OK, although MutableSeq has some extra methods that depend on the implementation and are not relevant for a sequence class (append, insert, pop, remove). In general Seq and MutableSeq should have the same API; that would make them simpler to use.
> >
> > I think that the main problem is SeqRecord. SeqRecord IS NOT a sequence, it HAS a sequence; that's its main flaw. A more capable Seq class should be a Seq. My proposal is to create a RichSeq that inherits from Seq and a MutableRichSeq that inherits from MutableSeq. I've been doing some coding and some thinking about that. I'm discussing this with you because I would like to improve the design of the API of such a sequence, and I could implement it. Its main design guidelines would be:
> > - Compatible with Seq or with MutableSeq. Every time that you can use a Seq class you can also use a more capable RichSeq without changing anything in your program.
> > - RichSeq IS a Seq, it inherits from Seq.
> > - RichSeq is similar to SeqRecord, but they aren't compatible.
> >   The SeqRecord constructor is:
> >     def __init__(self, seq, id = "", name = "",
> >                  description = "", dbxrefs = None,
> >                  features = None):
> >   and the RichSeq one may be:
> >     def __init__(self, seq=None, alphabet = None,
> >                  id = "", name = "",
> >                  description = "", dbxrefs = None,
> >                  features = None):
> >   RichSeq has a seq (or it could be called data) and an alphabet (like the Seq class), while SeqRecord has a Seq object.
> >   RichSeq would not have a .seq property.
> > - RichSeq has a __getitem__ method capable of things like RichSeq[1:2]. It would also have the methods reverse, complement, etc. That's not possible with SeqRecord.
> > - RichSeq should be a new-style class; what about Seq and MutableSeq?
> > - From one of Michiel's comments:
> >   1) A Seq object is basically a string, so it should behave as if it were subclassed from string.
> >   2) As a result, functions that have a sequence as an argument, but don't need the added features of a Seq object, should work with strings as well as Seq objects.
> >   4) Currently, Seq objects have an associated alphabet; SeqRecord objects have annotations, dbxrefs, a description, features, id, and name. I think a new Seq object should have both, so that we can avoid having both a Seq and a SeqRecord class. Of course, some or all of these fields can remain None. (I would add that even the seq could be None.)
> > If Biopython had a class like RichSeq I wouldn't use SeqRecord. Also, the transition from using SeqRecord to RichSeq would be very easy and both classes could coexist as long as you would like.
> > Also, using the features, per-residue annotation is very easy to implement. In fact I have done it already using a RichFeature class, but I would discuss that in another mail.
> > RichSeq is easier to extend than SeqRecord; that's its main advantage. I have pretty wild plans for a class like RichSeq. A class like SeqWithQuality or the Bio::Seq::MetaI from BioPerl would be very easy to derive from RichSeq. They would be just easier interfaces to the more capable and general RichSeq. Even Alignment would be derived from RichSeq. An Alignment IS a sequence with subsequences in it. I have also implemented a prototype of that and it works quite OK with very little coding.
> > These are the more general remarks about RichSeq. What do you think? Is it a good idea to go beyond SeqRecord for Biopython? Could something like RichSeq be a possible way to do it?
> >
> > Now I would like to list the open discussion points regarding the sequence class APIs.
> > - annotations is not in the constructor of SeqRecord. There are two options: add it to the RichSeq constructor or remove it altogether. In my implementation a feature can span the whole sequence length or can have a range attached. In this way annotations are just a special case of features. We would have to decide between dict and list for the API.
> >
> > - __getitem__ should always return a RichSeq. It's more consistent to return the same type for a_seq[1:2] and a_seq[1]. If someone wants a character they can do str(seq)[1].
> >
> > - no seq property in RichSeq.
> >
> > - __str__ is enough, so tostring() is not necessary; for more complex representations we have __repr__. tostring() could be kept for compatibility with the Seq and MutableSeq API.
> >
> > - What to do with id, name and the str annotations when a slice is requested? If seq.name is 'a_sequence', should seq[1:10].name be 'a_sequence' or 'a_sequence [1:10]' or ''? The same problem arises with __add__ and __radd__. This is a problem, but one of the three alternatives should be chosen and explained in the documentation. A better solution is in my RichFeature class, but I won't discuss it now.
> >
> > - __iter__ iterates over the sequence as a character string.
> >
> > - __add__ and __radd__
> >
> > - .upper(), .count(), .lower()
> >
> > - .data property. I think that this is an implementation detail and it should be deprecated from Seq and MutableSeq.
> >
> > Well, that's all; sorry for the long mail. I'm enjoying working on this problem and learning from you.
> > Best regards,
> > Jose Blanca
--
From bugzilla-daemon at portal.open-bio.org Wed May 28 12:17:25 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 May 2008 08:17:25 -0400 Subject: [Biopython-dev] [Bug 2506] New: SELECT problems on _get_seqfeature_dbxref in Loader.py with postgresql Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2506
    Summary: SELECT problems on _get_seqfeature_dbxref in Loader.py with postgresql
    Product: Biopython
    Version: 1.45
    Platform: PC
    OS/Version: Linux
    Status: NEW
    Severity: blocker
    Priority: P2
    Component: BioSQL
    AssignedTo: biopython-dev at biopython.org
    ReportedBy: andrea at biodec.com
    CC: andrea at biodec.com
Using:
- postgres 8.3 or less # the version is not important
- BioSQL 1.0.0 installed on a postgresql database (on Linux) # the version is not important
- python-psycopg 1.1.21-14 or less
- python-psycopg2 2.0.5.1-6 or less
- python 2.4.4-2 # not important
- Biopython CVS version 28/05/08
- Loader.py version 1.30
- "psycopg" or "psycopg2" as BioSeqDatabase.open_database drivers
When inserting into the BioSQL database a seq_record object derived from a GenBank Iterator, the procedure _get_seqfeature_dbxref fails with the error:
Traceback (most recent call last):
  File "", line 1, in ?
  File "/var/lib/python-support/python2.4/BioSQL/BioSeqDatabase.py", line 420, in load
    db_loader.load_seqrecord(cur_record)
  File "/var/lib/python-support/python2.4/BioSQL/Loader.py", line 50, in load_seqrecord
    self._load_seqfeature(seq_feature, seq_feature_num, bioentry_id)
  File "/var/lib/python-support/python2.4/BioSQL/Loader.py", line 542, in _load_seqfeature
    self._load_seqfeature_qualifiers(feature.qualifiers, seqfeature_id)
  File "/var/lib/python-support/python2.4/BioSQL/Loader.py", line 641, in _load_seqfeature_qualifiers
    seqfeature_id)
  File "/var/lib/python-support/python2.4/BioSQL/Loader.py", line 679, in _load_seqfeature_dbxref
    self._get_seqfeature_dbxref(seqfeature_id, dbxref_id, rank+1)
  File "/var/lib/python-support/python2.4/BioSQL/Loader.py", line 712, in _get_seqfeature_dbxref
    result = self.adaptor.execute_and_fetch_col0(sql, (seqfeature_id,
  File "/var/lib/python-support/python2.4/BioSQL/BioSeqDatabase.py", line 295, in execute_and_fetch_col0
    self.cursor.execute(sql, args or ())
psycopg.ProgrammingError: ERROR: column "195" does not exist
SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref WHERE seqfeature_id = "195" AND dbxref_id = "207739"
The problem is an error in the query format at rows 710 and 711 of Loader.py in Biopython/BioSQL:
    709 # Check for an existing record
    710 sql = r'SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref ' \
    711       r'WHERE seqfeature_id = "%s" AND dbxref_id = "%s"'
The query has double quotes (") around the values, and postgres interprets double-quoted tokens as column names, not as values. Replacing them with single quotes fixes the error:
    709 # Check for an existing record
    710 sql = r"SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref " \
    711       r"WHERE seqfeature_id = '%s' AND dbxref_id = '%s'"
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
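[Editorial sketch] The quoting issue in the report above comes from writing quote characters around the value placeholders by hand. A common alternative, shown here only as a hedged sketch, is to leave the placeholders bare and let the database driver bind the values. The example uses Python's standard-library sqlite3 module so that it runs anywhere; sqlite3 uses `?` placeholders, whereas the psycopg drivers named in the report use `%s`. The helper name `find_seqfeature_dbxref` is hypothetical, the two-column table is a simplified stand-in for the BioSQL schema, and this is not the change that was committed to Loader.py.

```python
import sqlite3


def find_seqfeature_dbxref(conn, seqfeature_id, dbxref_id):
    """Return matching (seqfeature_id, dbxref_id) rows using bound parameters."""
    # No quote characters around the placeholders: the driver passes the
    # values separately, so they can never be mistaken for column names.
    sql = ("SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref "
           "WHERE seqfeature_id = ? AND dbxref_id = ?")
    return conn.execute(sql, (seqfeature_id, dbxref_id)).fetchall()


# Simplified stand-in for the BioSQL table, held in memory for the demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE seqfeature_dbxref (seqfeature_id INTEGER, dbxref_id INTEGER)"
)
conn.execute("INSERT INTO seqfeature_dbxref VALUES (?, ?)", (195, 207739))

rows = find_seqfeature_dbxref(conn, 195, 207739)
print(rows)
```

The values 195 and 207739 are the ones from the failing query in the report. With bound parameters the same SQL text should behave the same way across backends, apart from the placeholder style each driver expects.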
From mjldehoon at yahoo.com Wed May 28 12:31:33 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 28 May 2008 05:31:33 -0700 (PDT) Subject: [Biopython-dev] Bio.PopGen, Bio.EUtils data files In-Reply-To: <320fb6e00805250506w1d6fd1bbgf1d364d2ad949376@mail.gmail.com> Message-ID: <499799.68733.qm@web62408.mail.re1.yahoo.com> That's odd ... I had tried with a Python version 2.3, and it worked there. Maybe this feature was added during the Python 2.3 cycle. Then, I guess we need to use the install_data_biopython class for now, and start using package_data once we stop supporting Python 2.3. --Michiel Peter wrote: On Sun, May 18, 2008, Michiel de Hoon wrote: > Hi everybody, > > In setup.py, data files needed by Bio.EUtils and Bio.PopGen.SimCoal are now > installed using a specialized install_data_biopython class. For Bio.Entrez, I am > using the package_data argument to the setup function instead. Does anybody > know why the install_data_biopython class was used? If there's no specific > reason, I'd prefer to use the package_data argument instead. I think I've found one reason not to - it doesn't seem to be supported in Python 2.3 as shown here: C:\TEMP\biopython_cvs\biopython_all\biopython>c:\Python23\python.exe setup.py install c:\Python23\lib\distutils\dist.py:227: UserWarning: Unknown distribution option: 'package_data' warnings.warn(msg) running install ... If I'd known this earlier, I would of course have said something. On the other hand, I may be the only person still using Biopython with python 2.3. Peter From fkauff at biologie.uni-kl.de Thu May 29 09:20:56 2008 From: fkauff at biologie.uni-kl.de (Frank Kauff) Date: Thu, 29 May 2008 11:20:56 +0200 Subject: [Biopython-dev] CVS access and developers web site Message-ID: <483E7578.50402@biologie.uni-kl.de> Hi folks, although I've been quiet for a while, I'm still doing some changes to the Nexus parser of biopython from time to time.... I totally lost my passwords to access the repository. 
Could someone please send me a new password to get write access to cvs? And I would also like to change the information on the biopython developers web site, as they are somewhat outdated. And is this the right place to ask for such things? Thanks! Frank From bugzilla-daemon at portal.open-bio.org Thu May 29 10:47:29 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 29 May 2008 06:47:29 -0400 Subject: [Biopython-dev] [Bug 2506] SELECT problems on _get_seqfeature_dbxref in Loader.py with postgresql In-Reply-To: Message-ID: <200805291047.m4TAlT18002239@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2506 ------- Comment #1 from andrea at biodec.com 2008-05-29 06:47 EST ------- Created an attachment (id=926) --> (http://bugzilla.open-bio.org/attachment.cgi?id=926&action=view) Proposed patch -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From p.j.a.cock at googlemail.com Thu May 29 21:46:46 2008 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 29 May 2008 22:46:46 +0100 Subject: [Biopython-dev] CVS access and developers web site In-Reply-To: <483E7578.50402@biologie.uni-kl.de> References: <483E7578.50402@biologie.uni-kl.de> Message-ID: <320fb6e00805291446x1cebf67bpe3e0818af5b9a7c5@mail.gmail.com> Hi Frank, I would try emailing support at helpdesk.open-bio.org using the email address associated with your CVS username. If you've changed email address, and you run into problems, I expect Michiel or I could vouch for you. For the website, the wiki usernames are entirely separate and you should be able to create a new account if you don't have one already. If you want to update the tutorial new HTML and PDF files are loaded with each release from the version in CVS. 
Peter On Thu, May 29, 2008 at 10:20 AM, Frank Kauff wrote: > Hi folks, > > although I've been quiet for a while, I'm still doing some changes to the > Nexus parser of biopython from time to time.... I totally lost my passwords > to access the repository. Could someone please send me a new password to get > write access to cvs? And I would also like to change the information on the > biopython developers web site, as they are somewhat outdated. > And is this the right place to ask for such things? > > Thanks! > > Frank From bugzilla-daemon at portal.open-bio.org Fri May 30 11:15:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 30 May 2008 07:15:23 -0400 Subject: [Biopython-dev] [Bug 2506] SELECT problems on _get_seqfeature_dbxref in Loader.py with postgresql In-Reply-To: Message-ID: <200805301115.m4UBFNE3011942@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2506 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-30 07:15 EST ------- Thanks for the report. I've fixed this issue (method _get_seqfeature_dbxref at line 710) and a similar one (in _get_bioentry_dbxref at line 761) in CVS BioSQL/Loader.py revision 1.31 Note that I have only tested this with MySQL under Linux. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Fri May 30 14:17:08 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 30 May 2008 15:17:08 +0100 Subject: [Biopython-dev] Bio.Entrez & Bio.EUtils In-Reply-To: <893127.27535.qm@web62412.mail.re1.yahoo.com> References: <320fb6e00805210221s93d411cpe7480b01c99540a8@mail.gmail.com> <893127.27535.qm@web62412.mail.re1.yahoo.com> Message-ID: <320fb6e00805300717v60f0b153i88b5e9a8aee1744c@mail.gmail.com> On 24 May 2008, Michiel de Hoon wrote: > Dear all, > > I have essentially completed the parser in Bio.Entrez. The internals of the new design look more complicated to start with, but I can see how much more general it is than the older versions :) Should it work starting from an empty DTDs folder, or will we ship Biopython with most of the current files? I've had trouble with Biopython trying to fetch missing DTD files from the internet. I think the problem is that the NCBI uses relative URLs. The following quick hack in Parser.py seems to help, but only in some cases (because, as listed below, the NCBI has two different base paths):
279,280c279,288
<     warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % filename)
<     handle = urllib.urlopen(systemId)
---
>     warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % path)
>     if "/" in systemId :
>         #Assume this is a full path, e.g.
>         #http://www.ncbi.nlm.nih.gov/entrez/query/DTD/nlmmedline_080101.dtd
>         handle = urllib.urlopen(systemId)
>     else :
>         #It's a relative path, and I'm not sure how to best get the base path:
>         handle = urllib.urlopen("http://www.ncbi.nlm.nih.gov/entrez/query/DTD/"+systemId)
(Also note there seem to be some tab/space issues in this file).
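[Editorial sketch] The absolute-versus-relative system ID handling in the hack above can also be expressed with the standard library's URL joining, which leaves a full URL unchanged and resolves a bare file name against a base. This is a minimal sketch in modern Python 3 (the 2008 code used Python 2's urllib); the function name `resolve_dtd_url` and the constant `NCBI_DTD_BASE` are hypothetical, and the default base is simply the first of the two NCBI paths mentioned in this message. It does not address the open question of how to pick the right base for a given DTD.

```python
from urllib.parse import urljoin

# Assumed default: the first of the two NCBI base paths named in the message.
NCBI_DTD_BASE = "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/"


def resolve_dtd_url(system_id, base=NCBI_DTD_BASE):
    """Return a full URL for a DTD system identifier.

    A system_id that is already an absolute URL is returned unchanged;
    a bare file name is resolved relative to base.
    """
    return urljoin(base, system_id)


relative = resolve_dtd_url("eSearch_020511.dtd")
absolute = resolve_dtd_url(
    "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/nlmmedline_080101.dtd"
)
other_base = resolve_dtd_url("NCBI_GBSeq.dtd", base="http://www.ncbi.nlm.nih.gov/dtd/")
print(relative)
print(absolute)
print(other_base)
```

Passing the base explicitly, as in the last call, is one way a caller could cope with the second base path, provided it knows which DTDs live where.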
From http://www.ncbi.nlm.nih.gov/entrez/query/DTD/ I've downloaded the following files using wget:
    egquery.dtd        eSearch_020511.dtd   nlmcommon_080101.dtd            pubmed_080101.dtd
    eInfo_020511.dtd   eSpell.dtd           nlmmedline_080101.dtd           taxon.dtd
    eLink_020511.dtd   eSummary_041029.dtd  nlmmedlinecitation_080101.dtd   uilist.dtd
    ePost_020511.dtd   nlmsharedcatcit_080101.dtd
Additionally http://www.ncbi.nlm.nih.gov/dtd/ provided some further DTD files needed for the test_Entrez.py unit test:
    NCBI_GBSeq.dtd  NCBI_GBSeq.mod.dtd  NCBI_Entity.mod.dtd  NCBI_Mim.dtd  NCBI_Mim.mod.dtd
With all the above files in place, the unit test file test_Entrez.py no longer gives any missing DTD warnings, but it still has a couple of failures. Peter From bugzilla-daemon at portal.open-bio.org Fri May 30 15:15:16 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 30 May 2008 11:15:16 -0400 Subject: [Biopython-dev] [Bug 2502] PSIBlastParser fails with blastpgp 2.2.18 though works with blastpgp 2.2.15 In-Reply-To: Message-ID: <200805301515.m4UFFGhJ024631@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2502 ------- Comment #18 from biopython-bugzilla at maubp.freeserve.co.uk 2008-05-30 11:15 EST ------- The XML parser seems to be fine on your example output. However, the XML output does not appear to list/flag any difference between:
    "Sequences used in model and found again"
    "Sequences not found previously or not previously below threshold"
This means there is no way to populate the .new_seqs and .reused_seqs lists. If you care about this information, then for now using the plain text output might be best. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.