From ytu888 at hotmail.com Mon Oct 1 07:39:50 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Mon, 1 Oct 2007 06:39:50 -0500
Subject: [BioPython] Error for running of ReportLab test on Mac OS X
In-Reply-To: <46FD5927.3000207@maubp.freeserve.co.uk>
References: <46FCF325.4040002@maubp.freeserve.co.uk>
    <46FD2BAC.80401@maubp.freeserve.co.uk>
    <46FD5927.3000207@maubp.freeserve.co.uk>
Message-ID:

Thanks Peter,

However, I still haven't installed mxText module in my Mac yet. Also
could you tell me how to run the test file of ReportLab, when I launch
Python and then import the test file into the python. Thanks.

> Date: Fri, 28 Sep 2007 20:42:31 +0100
> From: biopython at maubp.freeserve.co.uk
> To: ytu888 at hotmail.com
> CC: biopython at lists.open-bio.org
> Subject: Re: [BioPython] Error for running of ReportLab test on Mac OS X
>
> Y Tu wrote:
> > Thank you, Peter for the prompt answer.
> >
> > I did install the PIL already and tested with the commands "from PIL
> > import Image", then "import _imaging". Both commands succeeded.
> > That's why I don't understand why the test won't work. I used the
> > command "python test_pdfgen_general.py" under the shell prompt, which
> > generated the error. Since I installed PIL and succeeded in importing
> > the module of PIL, I thought maybe I can solve the problem by running
> > the test under Python.
>
> Looking in more detail at the original stack trace,
>
> > File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/ImageFile.py", line 180, in load
> > d = Image._getdecoder(self.mode, d, a, self.decoderconfig)
> > File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/Image.py", line 375, in _getdecoder
> > raise IOError("decoder %s not available" % decoder_name)
> > IOError: decoder jpeg not available
>
> It's possible that PIL needs some optional JPEG library, which ReportLab
> wants to use.
> I suggest you search the ReportLab website & user's
> mailing list, and if you can't work out what is wrong sign up to their
> mailing list and ask them, http://www.reportlab.org/
>
> Very little of Biopython needs ReportLab, you should be able to install
> Biopython without it.
>
> Peter

From ytu888 at hotmail.com Mon Oct 1 13:54:00 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Mon, 1 Oct 2007 12:54:00 -0500
Subject: [BioPython] Error for installation of MySALdb on Mac OS X
In-Reply-To: <46FD5927.3000207@maubp.freeserve.co.uk>
References: <46FCF325.4040002@maubp.freeserve.co.uk>
    <46FD2BAC.80401@maubp.freeserve.co.uk>
    <46FD5927.3000207@maubp.freeserve.co.uk>
Message-ID:

I downloaded mysql-5.0.45-osx10.4-i686.dmg from mysql web and installed
it. Then I tried to install MySQL-python-1.2.2 but got the following
error. How to create the mysql_config.path file? Thank you very much.

leesComputer:/applications/Python_Bio/MySQL-python-1.2.2 lee$ python setup.py build
sh: line 1: mysql_config: command not found
Traceback (most recent call last):
  File "setup.py", line 16, in <module>
    metadata, options = get_config()
  File "/Applications/Python_Bio/MySQL-python-1.2.2/setup_posix.py", line 43, in get_config
    libs = mysql_config("libs_r")
  File "/Applications/Python_Bio/MySQL-python-1.2.2/setup_posix.py", line 24, in mysql_config
    raise EnvironmentError, "%s not found" % mysql_config.path
EnvironmentError: mysql_config not found

From lists.steve at arachnedesign.net Mon Oct 1 16:18:04 2007
From: lists.steve at arachnedesign.net (Steve Lianoglou)
Date: Mon, 1 Oct 2007 16:18:04 -0400
Subject: [BioPython] Error for installation of MySALdb on Mac OS X
In-Reply-To:
References: <46FCF325.4040002@maubp.freeserve.co.uk>
    <46FD2BAC.80401@maubp.freeserve.co.uk>
    <46FD5927.3000207@maubp.freeserve.co.uk>
Message-ID: <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>

> I downloaded mysql-5.0.45-osx10.4-i686.dmg from mysql web and
> installed it. Then I tried to install MySQL-python-1.2.2 but got
> the following error. How to create the mysql_config.path file?
> Thank you very much.
>
> leesComputer:/applications/Python_Bio/MySQL-python-1.2.2 lee$
> python setup.py build
> sh: line 1: mysql_config: command not found

It seems as if you need to have the `mysql_config` command in your PATH
variable and it's not there.

Look for where mysql was installed (maybe /usr/local/mysql/...) and add
its bin directory to your PATH environment variable. Or maybe it
installed some binaries/symlinks into your /usr/local/bin directory?

I think that'll do it for you.

-steve

From biopython at maubp.freeserve.co.uk Mon Oct 1 17:06:37 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 1 Oct 2007 22:06:37 +0100
Subject: [BioPython] Error for running of ReportLab test on Mac OS X
In-Reply-To:
References: <46FCF325.4040002@maubp.freeserve.co.uk>
    <46FD2BAC.80401@maubp.freeserve.co.uk>
    <46FD5927.3000207@maubp.freeserve.co.uk>
Message-ID: <320fb6e00710011406o3c4d4049q7b5345d18381362e@mail.gmail.com>

On 10/1/07, Y Tu wrote:
>
> Thanks Peter,
>
> However, I still haven't installed mxText module in my Mac yet.

I see you've signed up to the eGenix mailing list - I hope they can
solve your mxTextTools installation problems.

> Also could you tell me how to run the test file of ReportLab, when I
> launch Python and then import the test file into the python. Thanks.

In general I think most tests are designed to be run from the command
line, not by running python, typing an import statement, and typing
another command. You should check the ReportLab documentation to see
what they recommend.

To run a specific Biopython unit test, such as the general graphics
unit test, you would do this:

python run_tests.py test_GraphicsGeneral.py

That would run the test, and check the output matched the expected
results. Alternatively, you can do:

python test_GraphicsGeneral.py

I hope that helps.

Peter

From ULNJUJERYDIX at spammotel.com Tue Oct 2 02:52:53 2007
From: ULNJUJERYDIX at spammotel.com (Kevin Lam)
Date: Tue, 2 Oct 2007 14:52:53 +0800
Subject: [BioPython] Fwd: **Fwd: [Bioperl-l] divide and blast blastunsplit
    blast subsequence
In-Reply-To: <5b6410e0710012321h4320d804p6c6262860eff2463@mail.gmail.com>
References: <5b6410e0710012321h4320d804p6c6262860eff2463@mail.gmail.com>
Message-ID: <5b6410e0710012352s520b537bj7374dd874dc93104@mail.gmail.com>

Hi!

I am trying to annotate a 200kb sequence by doing blastx to find the
protein seq locations. I need to split the sequence up so that I get
the best hits for each region (the top blast hits will mask the smaller
proteins if I do it as a whole sequence).

If I were to do it manually, I can set the subsequence in the web GUI
for NCBI's blast. This way, the blast hit coords are based on the whole
200kb. But I can't find this option in blast or a straightforward way
to do it in bioperl.

I found similar solutions like
http://www.bio.davidson.edu/projects/DAB/DAB.html
divide and blast (but I want to specify coords rather than fixed
intervals).

There is also this from the bioperl archives:
http://bioinformatics.org/pipermail/bioclusters/2002-August/000375.html

But isn't there an easier way, where I can specify blast subsequence
200-900 of the fasta file and it will return the blastx hits in coords
in terms of the whole 200kb?
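
One generic way to get the behaviour asked for above is to cut the
region out yourself, BLAST only that piece, and then add the region's
offset back onto every reported query coordinate. The sketch below
shows just the coordinate arithmetic in plain Python (written for
Python 3). The function names extract_region and to_full_coords are
made up for this illustration and are not part of BLAST, BioPerl or
Biopython, and the sequence is a stand-in rather than real data.

```python
def extract_region(sequence, start, end):
    """Return the subsequence start..end, using 1-based inclusive coordinates."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("region %d-%d is outside the sequence" % (start, end))
    return sequence[start - 1:end]


def to_full_coords(hit_start, hit_end, region_start):
    """Map hit coordinates on the subsequence back onto the full sequence.

    All coordinates are 1-based and inclusive, so position 1 of the
    subsequence corresponds to position region_start of the full sequence.
    """
    offset = region_start - 1
    return hit_start + offset, hit_end + offset


# Stand-in for a long query; a real run would read it from the FASTA file.
full_sequence = "ACGT" * 250

# Cut out positions 200-900, as in the question.
region = extract_region(full_sequence, 200, 900)
print(len(region))

# A hit reported at 15-120 on the subsequence, expressed on the full sequence.
print(to_full_coords(15, 120, 200))
```

The same offset applies to blastx hits, since the query coordinates
there are still nucleotide positions on the query.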

From mdehoon at c2b2.columbia.edu Tue Oct 2 05:06:54 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 2 Oct 2007 05:06:54 -0400
Subject: [BioPython] Bio.MultiProc
References: <46E6A845.3030601@c2b2.columbia.edu>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>

Hi everybody,

Since no users of Bio.MultiProc came forward, I deprecated it for the
upcoming release.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

-----Original Message-----
From: biopython-bounces at lists.open-bio.org on behalf of Michiel De Hoon
Sent: Tue 9/11/2007 10:37 AM
To: BioPython Developers List; biopython at biopython.org
Subject: [BioPython] Bio.MultiProc

Hi everybody,

In preparation for the upcoming release, I was running the Biopython
test suite and found that test_copen.py hangs on Cygwin. It doesn't
fail, it just sits there forever. This may be related to the use of
fork() instead of select() in Bio/MultiProc/copen.py. Anyway, while it
is probably possible to fix this, I'd have to dig fairly deep into the
code, and I am not sure if it is worth it. It looks like the copen
functions are used only in Bio/config, which is needed for Bio.db. A
description of the functionality of this module can be found in the
tutorial section 4.7.2.

Now, I don't remember users asking about this module on the mailing
list. From the tutorial documentation, it seems to be a nice piece of
code, but I doubt that it is being used often in practice.

So I was wondering:
1) Is anybody on this list using this code?
2) If not, can I mark it as deprecated for the upcoming release?
Hopefully, people who are using this code will notice, and let us know
that they need it.

--Michiel.
_______________________________________________
BioPython mailing list - BioPython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython

From ytu888 at hotmail.com Tue Oct 2 07:36:58 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Tue, 2 Oct 2007 06:36:58 -0500
Subject: [BioPython] Error for running of ReportLab test on Mac OS X
In-Reply-To: <320fb6e00710011406o3c4d4049q7b5345d18381362e@mail.gmail.com>
References: <46FCF325.4040002@maubp.freeserve.co.uk>
    <46FD2BAC.80401@maubp.freeserve.co.uk>
    <46FD5927.3000207@maubp.freeserve.co.uk>
    <320fb6e00710011406o3c4d4049q7b5345d18381362e@mail.gmail.com>
Message-ID:

Thank you very much, Peter.

> Date: Mon, 1 Oct 2007 22:06:37 +0100
> From: biopython at maubp.freeserve.co.uk
> To: ytu888 at hotmail.com
> Subject: Re: [BioPython] Error for running of ReportLab test on Mac OS X
> CC: biopython at lists.open-bio.org
>
> On 10/1/07, Y Tu wrote:
> >
> > Thanks Peter,
> >
> > However, I still haven't installed mxText module in my Mac yet.
>
> I see you've signed up to the eGenix mailing list - I hope they can
> solve your mxTextTools installation problems.
>
> > Also could you tell me how to run the test file of ReportLab, when I
> > launch Python and then import the test file into the python. Thanks.
>
> In general I think most tests are designed to be run from the command
> line, not by running python, typing an import statement, and typing
> another command. You should check the ReportLab documentation to see
> what they recommend.
>
> To run a specific Biopython unit test, such as the general graphics
> unit test, you would do this:
>
> python run_tests.py test_GraphicsGeneral.py
>
> That would run the test, and check the output matched the expected
> results. Alternatively, you can do:
>
> python test_GraphicsGeneral.py
>
> I hope that helps.
>
> Peter

From ytu888 at hotmail.com Tue Oct 2 08:29:46 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Tue, 2 Oct 2007 07:29:46 -0500
Subject: [BioPython] Error for installation of MySALdb on Mac OS X
In-Reply-To: <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
References: <46FCF325.4040002@maubp.freeserve.co.uk>
    <46FD2BAC.80401@maubp.freeserve.co.uk>
    <46FD5927.3000207@maubp.freeserve.co.uk>
    <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
Message-ID:

Hi Steve,

I checked the PATH and added /usr/local/mysql/bin into it. But I still
got the same error message when running the setup.py. Thanks.

> CC: biopython at lists.open-bio.org
> From: lists.steve at arachnedesign.net
> Subject: Re: [BioPython] Error for installation of MySALdb on Mac OS X
> Date: Mon, 1 Oct 2007 16:18:04 -0400
> To: ytu888 at hotmail.com
>
> > I downloaded mysql-5.0.45-osx10.4-i686.dmg from mysql web and
> > installed it. Then I tried to install MySQL-python-1.2.2 but got
> > the following error. How to create the mysql_config.path file?
> > Thank you very much.
> >
> > leesComputer:/applications/Python_Bio/MySQL-python-1.2.2 lee$
> > python setup.py build
> > sh: line 1: mysql_config: command not found
>
> It seems as if you need to have the `mysql_config` command in your
> PATH variable and it's not there.
>
> Look for where mysql was installed (maybe /usr/local/mysql/...) and
> add its bin directory to your PATH environment variable. Or maybe it
> installed some binaries/symlinks into your /usr/local/bin directory?
>
> I think that'll do it for you.
>
> -steve
>

From idoerg at gmail.com Tue Oct 2 12:00:41 2007
From: idoerg at gmail.com (Iddo Friedberg)
Date: Tue, 2 Oct 2007 09:00:41 -0700
Subject: [BioPython] [Biopython-dev] Bio.MultiProc
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>
References: <46E6A845.3030601@c2b2.columbia.edu>
    <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>
Message-ID:

Would it be possible to include the module, comment out the unworkable
source code and print a deprecation warning when it is imported? That
way we:

1) Don't have a clunky module BUT
2) we warn anyone who uses it (but didn't happen to read your post) that
it is deprecated when they install a new biopython version AND
3) Leave an option of fixing and commenting the code back in (i.e. it is
not lost forever).

Also, is it possible to track down the original author?

./I

On 10/2/07, Michiel De Hoon wrote:
>
> Hi everybody,
>
> Since no users of Bio.MultiProc came forward, I deprecated it for the
> upcoming release.
>
> --Michiel.
>
> Michiel de Hoon
> Center for Computational Biology and Bioinformatics
> Columbia University
> 1150 St Nicholas Avenue
> New York, NY 10032
>
>
>
> -----Original Message-----
> From: biopython-bounces at lists.open-bio.org on behalf of Michiel De Hoon
> Sent: Tue 9/11/2007 10:37 AM
> To: BioPython Developers List; biopython at biopython.org
> Subject: [BioPython] Bio.MultiProc
>
> Hi everybody,
>
> In preparation for the upcoming release, I was running the Biopython
> test suite and found that test_copen.py hangs on Cygwin. It doesn't
> fail, it just sits there forever. This may be related to the use of
> fork() instead of select() in Bio/MultiProc/copen.py. Anyway, while it
> is probably possible to fix this, I'd have to dig fairly deep into the
> code, and I am not sure if it is worth it. It looks like the copen
> functions are used only in Bio/config, which is needed for Bio.db.
> A description of the functionality of this module can be found in the
> tutorial section 4.7.2.
>
> Now, I don't remember users asking about this module on the mailing
> list. From the tutorial documentation, it seems to be a nice piece of
> code, but I doubt that it is being used often in practice.
>
> So I was wondering:
> 1) Is anybody on this list using this code?
> 2) If not, can I mark it as deprecated for the upcoming release?
> Hopefully, people who are using this code will notice, and let us know
> that they need it.
>
> --Michiel.
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>

--
I. Friedberg
"The only problem with troubleshooting is that sometimes trouble shoots
back."

From mdehoon at c2b2.columbia.edu Tue Oct 2 20:18:59 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 2 Oct 2007 20:18:59 -0400
Subject: [BioPython] [Biopython-dev] Bio.MultiProc
References: <46E6A845.3030601@c2b2.columbia.edu>
    <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B62D@mail2.exch.c2b2.columbia.edu>

> Would it be possible to include the module, comment out the unworkable
> source code and print a deprecation warning when it is imported?

That is what I did.

> 3) Leave an option of fixing and commenting the code back in (i.e. it is not
> lost forever).

Even after removing the code in some future release, the code will not
be lost forever. It can always be retrieved from CVS and from older
Biopython releases.

> Also, is it possible to track down the original author?

That would be Jeff Chang.

--Michiel.
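
For readers unfamiliar with the idea discussed above, a deprecation
warning at import time is usually issued with Python's standard
warnings module. The following is only a minimal sketch of that
pattern, written for Python 3; it is not the code that went into
Bio.MultiProc, and both the helper name warn_deprecated and the wording
of the message are made up for illustration. In a real module the
warnings.warn call would simply sit at the top level of the module so
that it runs on import.

```python
import warnings


def warn_deprecated(module_name):
    """Emit a DeprecationWarning naming the module (hypothetical helper)."""
    warnings.warn(
        "%s is deprecated and may be removed in a future release" % module_name,
        DeprecationWarning,
        stacklevel=2,
    )


# DeprecationWarning is hidden by default in many contexts, so capture it
# explicitly here to show what a caller would see.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_deprecated("Bio.MultiProc")

print(caught[0].category.__name__, "-", caught[0].message)
```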

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

-----Original Message-----
From: Iddo Friedberg [mailto:idoerg at gmail.com]
Sent: Tue 10/2/2007 12:00 PM
To: Michiel De Hoon
Cc: BioPython Developers List; biopython at biopython.org
Subject: Re: [Biopython-dev] [BioPython] Bio.MultiProc

Would it be possible to include the module, comment out the unworkable
source code and print a deprecation warning when it is imported? That
way we:

1) Don't have a clunky module BUT
2) we warn anyone who uses it (but didn't happen to read your post) that
it is deprecated when they install a new biopython version AND
3) Leave an option of fixing and commenting the code back in (i.e. it is
not lost forever).

Also, is it possible to track down the original author?

./I

On 10/2/07, Michiel De Hoon wrote:
>
> Hi everybody,
>
> Since no users of Bio.MultiProc came forward, I deprecated it for the
> upcoming release.
>
> --Michiel.
>
> Michiel de Hoon
> Center for Computational Biology and Bioinformatics
> Columbia University
> 1150 St Nicholas Avenue
> New York, NY 10032
>
>
>
> -----Original Message-----
> From: biopython-bounces at lists.open-bio.org on behalf of Michiel De Hoon
> Sent: Tue 9/11/2007 10:37 AM
> To: BioPython Developers List; biopython at biopython.org
> Subject: [BioPython] Bio.MultiProc
>
> Hi everybody,
>
> In preparation for the upcoming release, I was running the Biopython
> test suite and found that test_copen.py hangs on Cygwin. It doesn't
> fail, it just sits there forever. This may be related to the use of
> fork() instead of select() in Bio/MultiProc/copen.py. Anyway, while it
> is probably possible to fix this, I'd have to dig fairly deep into the
> code, and I am not sure if it is worth it. It looks like the copen
> functions are used only in Bio/config, which is needed for Bio.db. A
> description of the functionality of this module can be found in the
> tutorial section 4.7.2.
>
> Now, I don't remember users asking about this module on the mailing
> list. From the tutorial documentation, it seems to be a nice piece of
> code, but I doubt that it is being used often in practice.
>
> So I was wondering:
> 1) Is anybody on this list using this code?
> 2) If not, can I mark it as deprecated for the upcoming release?
> Hopefully, people who are using this code will notice, and let us know
> that they need it.
>
> --Michiel.
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>

--
I. Friedberg
"The only problem with troubleshooting is that sometimes trouble shoots
back."

From ytu888 at hotmail.com Wed Oct 3 08:44:32 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Wed, 3 Oct 2007 07:44:32 -0500
Subject: [BioPython] Error for installation of MySALdb on Mac OS X
In-Reply-To: <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
References: <46FCF325.4040002@maubp.freeserve.co.uk>
    <46FD2BAC.80401@maubp.freeserve.co.uk>
    <46FD5927.3000207@maubp.freeserve.co.uk>
    <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
Message-ID:

Here is the copy of the output in the Terminal. Please help me to find
out what's wrong. Thanks.

Last login: Wed Oct 3 08:28:38 on ttyp4
Welcome to Darwin!
LeesComputer:~ Lee$ echo $PATH
/Library/Frameworks/Python.framework/Versions/Current/bin:/usr/local/bin:.:/usr/local/mysql:/bin:/sbin:/usr/bin:/usr/sbin
LeesComputer:~ Lee$ cd /applications/python_bio/MySQL-python-1.2.2
LeesComputer:/applications/python_bio/MySQL-python-1.2.2 Lee$ python setup.py build
sh: line 1: mysql_config: command not found
Traceback (most recent call last):
  File "setup.py", line 16, in <module>
    metadata, options = get_config()
  File "/Applications/Python_Bio/MySQL-python-1.2.2/setup_posix.py", line 43, in get_config
    libs = mysql_config("libs_r")
  File "/Applications/Python_Bio/MySQL-python-1.2.2/setup_posix.py", line 24, in mysql_config
    raise EnvironmentError, "%s not found" % mysql_config.path
EnvironmentError: mysql_config not found
LeesComputer:/applications/python_bio/MySQL-python-1.2.2 Lee$ cd /usr/local
LeesComputer:/usr/local Lee$ ls -al
total 8
drwxr-xr-x   8 root wheel 272 Oct  1 13:02 .
drwxr-xr-x  10 root wheel 340 Sep 26 11:30 ..
drwxr-xr-x   8 root admin 272 Aug  6 04:00 ActivePerl-5.8
drwxr-xr-x  15 root wheel 510 Oct  2 03:52 bin
drwxr-xr-x   6 root wheel 204 Sep 27 05:22 include
drwxr-xr-x  12 root wheel 408 Sep 27 05:21 lib
lrwxr-xr-x   1 root wheel  25 Oct  1 13:02 mysql -> mysql-5.0.45-osx10.4-i686
drwxr-xr-x  19 root wheel 646 Jul  4 13:54 mysql-5.0.45-osx10.4-i686

> CC: biopython at lists.open-bio.org
> From: lists.steve at arachnedesign.net
> Subject: Re: [BioPython] Error for installation of MySALdb on Mac OS X
> Date: Mon, 1 Oct 2007 16:18:04 -0400
> To: ytu888 at hotmail.com
>
> > I downloaded mysql-5.0.45-osx10.4-i686.dmg from mysql web and
> > installed it. Then I tried to install MySQL-python-1.2.2 but got
> > the following error. How to create the mysql_config.path file?
> > Thank you very much.
> >
> > leesComputer:/applications/Python_Bio/MySQL-python-1.2.2 lee$
> > python setup.py build
> > sh: line 1: mysql_config: command not found
>
> It seems as if you need to have the `mysql_config` command in your
> PATH variable and it's not there.
>
> Look for where mysql was installed (maybe /usr/local/mysql/...) and
> add its bin directory to your PATH environment variable. Or maybe it
> installed some binaries/symlinks into your /usr/local/bin directory?
>
> I think that'll do it for you.
>
> -steve
>

From lists.steve at arachnedesign.net Wed Oct 3 09:01:09 2007
From: lists.steve at arachnedesign.net (Steve Lianoglou)
Date: Wed, 3 Oct 2007 09:01:09 -0400
Subject: [BioPython] Error for installation of MySALdb on Mac OS X
In-Reply-To:
References: <46FCF325.4040002@maubp.freeserve.co.uk>
    <46FD2BAC.80401@maubp.freeserve.co.uk>
    <46FD5927.3000207@maubp.freeserve.co.uk>
    <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
Message-ID: <38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net>

Hi,

On Oct 3, 2007, at 8:44 AM, Y Tu wrote:

> Here is the copy of the output in the Terminal. Please help me to
> find out what's wrong. Thanks.
>
> Last login: Wed Oct 3 08:28:38 on ttyp4
> Welcome to Darwin!
> LeesComputer:~ Lee$ echo $PATH
> /Library/Frameworks/Python.framework/Versions/Current/bin:/usr/local/bin:.:/usr/local/mysql:/bin:/sbin:/usr/bin:/usr/sbin

It still looks like your PATH is screwed up, /usr/local/mysql/bin isn't
in there, you have:

/usr/local/mysql:/bin

Here's a test. Open up a terminal and type:

$ which mysql_config

If you don't get an answer back that indicates that the system can find
the binary, then your script won't either. For instance, this is how it
looks for me:

$ which mysql_config
/Library/MySQL/bin/mysql_config

(I have an older version of mysql which was installed into
/Library/MySQL)

Yours should say:

$ which mysql_config
/usr/local/mysql/bin/mysql_config

Or something like that. Try that and see ...
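
The same check can be made from Python, which is roughly what the
build script depends on when it shells out to mysql_config. The sketch
below assumes a modern Python 3 (shutil.which does not exist in Python
2.5) and a POSIX-style, colon-separated PATH; the helper names
path_entries and find_command are made up for this illustration. The
PATH string is the one reported in the terminal output quoted above.

```python
import shutil


def path_entries(path_value):
    """Split a POSIX-style PATH string into its directory entries."""
    return [entry for entry in path_value.split(":") if entry]


def find_command(command, path_value=None):
    """Return the full path of `command`, or None if it cannot be found.

    With path_value=None the current process's PATH is searched.
    """
    return shutil.which(command, path=path_value)


reported = ("/Library/Frameworks/Python.framework/Versions/Current/bin:"
            "/usr/local/bin:.:/usr/local/mysql:/bin:/sbin:/usr/bin:/usr/sbin")

# The colon in "/usr/local/mysql:/bin" splits it into two separate entries,
# so the directory that actually holds mysql_config is never searched.
print("/usr/local/mysql/bin" in path_entries(reported))

# What the current environment resolves; the result depends on the machine.
print(find_command("mysql_config"))
```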

-steve

From lists.steve at arachnedesign.net Wed Oct 3 10:47:41 2007
From: lists.steve at arachnedesign.net (Steve Lianoglou)
Date: Wed, 3 Oct 2007 10:47:41 -0400
Subject: [BioPython] Error for installation of MySALdb on Mac OS X
In-Reply-To:
References: <46FCF325.4040002@maubp.freeserve.co.uk>
    <46FD2BAC.80401@maubp.freeserve.co.uk>
    <46FD5927.3000207@maubp.freeserve.co.uk>
    <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
    <38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net>
Message-ID: <14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net>

> Steve, thank you very much. It fixed the problem and I got through
> the build and install step. But when I tested inside the python for
> the installation I got the following error. Please help me with it.
> Thanks.
>
> >>> import MySQLdb
> /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.py:3: UserWarning: Module _mysql was already imported from /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.pyc, but /Applications/Python_Bio/MySQL-python-1.2.2 is being added to sys.path
>   import sys, pkg_resources, imp
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "MySQLdb/__init__.py", line 19, in <module>
>     import _mysql
>   File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 7, in <module>
>   File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 6, in __bootstrap__
> ImportError: dlopen(/Users/lizhexu/.python-eggs/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg-tmp/_mysql.so, 2): Library not loaded: /usr/local/mysql/lib/mysql/libmysqlclient_r.15.dylib
>   Referenced from: /Users/lizhexu/.python-eggs/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg-tmp/_mysql.so
>   Reason: image not found

Sorry, don't know exactly what's happening here. Is this from a "fresh"
python prompt?

How did you install MySQLdb, did you use easy_install?
If so, try to install from the sourceforge download.

Try to remove it, remove the "build" directory from your mysqldb
download and redo the whole python setup.py build / python setup.py
install process.

To remove it, nuke this:

/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg

And try to reinstall?

Perhaps someone who knows what the problem is here can give you a better
idea on what to do.

-steve

From sbassi at gmail.com Thu Oct 4 02:47:44 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Thu, 4 Oct 2007 03:47:44 -0300
Subject: [BioPython] Problem with blast xml
Message-ID:

I am having a problem that does not originate in Biopython, but it is
affecting the Biopython (1.43) xml blast parser.
I have two xml files, one can be parsed and the other can't. Here are
the commands I run to get the xml files:

sbassi at xubuntu:~/blast-2.2.16/bin$ ./blastall -p blastn -d /media/vic300/BLASTdb/ecoli.nt -i /media/vic300/INTA/mitofragsB2-TAB.fasta -e 0.0001 -m 7 -o TABB2.xml

sbassi at xubuntu:~/blast-2.2.16/bin$ ./blastall -p blastn -d /media/vic300/BLASTdb/ecoli.nt -i /media/vic300/INTA/mitofragsB2-TABv2.fasta -e 0.0001 -m 7 -o TABB2v2.xml

The relevant difference is the input file, the sequences are different,
but the output file should have the same format (shouldn't it?).
When I am parsing the files, I find that this is not true.

This is the file that can be parsed without problem:

>>> bout=open('bioinfo/INTA/TABB2.xml')
>>> b_records=NCBIXML.parse(bout)
>>> x=b_records.next()
>>> y=b_records.next()
>>> x.query
u'fragment 31'
>>> y.query
u'fragment 67'
>>> x.alignments
[]
>>> y.alignments
[, , , , , , ]

Let's see what seems to be a malformed?
xml file:

>>> bout=open('bioinfo/INTA/TABB2v2.xml')
>>> b_records=NCBIXML.parse(bout)
>>> x=b_records.next()
>>> y=b_records.next()
>>> x.query
u'fragment 1'
>>> y.query
u'fragment 57'
>>> x.alignments
[]
>>> y.alignments
[]

There is a record with an empty list.

Here is a fragment of the "normal" one (TABB2.xml):

2 F 31 lcl|31_0 fragment 31 1174 1 gi|1788520|gb|AE000309.1|AE000309 Escherichia coli K-12 MG1655 section 199 of 400 of the complete genome AE000309 13453 1

Here is a fragment of the "malformed" one (TABB2v2.xml):

2 F 1 400 4662239 0 0 0.710603 1.37406 1.30725 57

Why is this happening? Is this an expected behavior?

I uploaded the xml files here:

http://www.bioinformatica.info/TABB2.xml
http://www.bioinformatica.info/TABB2v2.xml

--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From ytu888 at hotmail.com Thu Oct 4 08:24:18 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Thu, 4 Oct 2007 07:24:18 -0500
Subject: [BioPython] Error generated by Clustalw example in Tutorial
Message-ID:

Hi,

I'm reading the Biopython tutorial and running the example of clustalw.
But it generates the following error. What's wrong? Thanks.

>>> from Bio import Clustalw
>>> cline = Clustalw.MultipleAlignCL(os.path.join(os.curdir, "opuntia.fasta"))
>>> cline.set_output("result.aln")
>>> print cline
clustalw .\opuntia.fasta -OUTFILE=result.aln
>>> alignment = Clustalw.do_alignment(cline)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python25\Lib\site-packages\Bio\Clustalw\__init__.py", line 117, in do_alignment
    % (out_file, command_line))
IOError: Output .aln file result.aln not produced, commandline: clustalw .\opuntia.fasta -OUTFILE=result.aln

From sbassi at gmail.com Thu Oct 4 12:19:22 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Thu, 4 Oct 2007 13:19:22 -0300
Subject: [BioPython] Error generated by Clustalw example in Tutorial
In-Reply-To:
References:
Message-ID:

On 10/4/07, Y Tu wrote:
> >>> print cline
> clustalw .\opuntia.fasta -OUTFILE=result.aln

I am not sure if this command is properly formatted. The slash should
not be there, but I don't have a windows box to try this.

--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From mdehoon at c2b2.columbia.edu Thu Oct 4 21:01:59 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Thu, 4 Oct 2007 21:01:59 -0400
Subject: [BioPython] Problem with blast xml
References:
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B62F@mail2.exch.c2b2.columbia.edu>

Can you create two minimal XML files that demonstrate the problem?
For example, by removing records from the two files you have and
checking if parsing still works for one and fails for the other.
By doing so, you may be able to identify exactly what the essential
difference between the two files is.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

-----Original Message-----
From: biopython-bounces at lists.open-bio.org on behalf of Sebastian Bassi
Sent: Thu 10/4/2007 2:47 AM
To: biopython at biopython.org
Subject: [BioPython] Problem with blast xml

I am having a problem that does not originate in Biopython, but it is
affecting the Biopython (1.43) xml blast parser.
I have two xml files, one can be parsed and the other can't.
Here are the commands I run to get the xml files: sbassi at xubuntu:~/blast-2.2.16/bin$ ./blastall -p blastn -d /media/vic300/BLASTdb/ecoli.nt -i /media/vic300/INTA/mitofragsB2-TAB.fasta -e 0.0001 -m 7 -o TABB2.xml sbassi at xubuntu:~/blast-2.2.16/bin$ ./blastall -p blastn -d /media/vic300/BLASTdb/ecoli.nt -i /media/vic300/INTA/mitofragsB2-TABv2.fasta -e 0.0001 -m 7 -o TABB2v2.xml The relevant difference is the input file, the sequences are different, but the output file should have the same format (shouldn't it?). When I am parsing the files, I find that this is not true. This is the file that can be parsed without problem: >>> bout=open('bioinfo/INTA/TABB2.xml') >>> b_records=NCBIXML.parse(bout) >>> x=b_records.next() >>> y=b_records.next() >>> x.query u'fragment 31' >>> y.query u'fragment 67' >>> x.alignments [] >>> y.alignments [, , , , , , ] Let's see what seems to be a malformed? xml file: >>> bout=open('bioinfo/INTA/TABB2v2.xml') >>> b_records=NCBIXML.parse(bout) >>> x=b_records.next() >>> y=b_records.next() >>> x.query u'fragment 1' >>> y.query u'fragment 57' >>> x.alignments [] >>> y.alignments [] There is a record with an empty list. Here is a fragment of the "normal" one (TABB2.xml): 2 F 31 lcl|31_0 fragment 31 1174 1 gi|1788520|gb|AE000309.1|AE000309 Escherichia coli K-12 MG1655 section 199 of 400 of the complete genome AE000309 13453 1 Here is a fragment of the "malformed" one (TABB2v2.xml): 2 F 1 400 4662239 0 0 0.710603 1.37406 1.30725 57 Why is this happening? Is this a expected behavior? 
I uploaded the xml files here:
http://www.bioinformatica.info/TABB2.xml
http://www.bioinformatica.info/TABB2v2.xml

--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318
_______________________________________________
BioPython mailing list - BioPython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython

From sbassi at gmail.com Fri Oct 5 01:39:44 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Fri, 5 Oct 2007 02:39:44 -0300
Subject: [BioPython] Problem with blast xml
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B62F@mail2.exch.c2b2.columbia.edu>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B62F@mail2.exch.c2b2.columbia.edu>
Message-ID:

On 10/4/07, Michiel De Hoon wrote:
> Can you create two minimal XML files that demonstrate the problem?
> For example, by removing records from the two files you have and checking if
> parsing still works for one and fails for the other.
> By doing so, you may be able to identify exactly what the essential
> difference between the two files is.

After some tests, I found two minimal XML files with this issue:
http://www.bioinformatica.info/mitoA.xml
http://www.bioinformatica.info/mitoB.xml
(only 3.5 kb each).

--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From mdehoon at c2b2.columbia.edu Fri Oct 5 02:34:56 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Fri, 5 Oct 2007 02:34:56 -0400
Subject: [BioPython] Problem with blast xml
References: <6243BAA9F5E0D24DA41B27997D1FD14402B62F@mail2.exch.c2b2.columbia.edu>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B631@mail2.exch.c2b2.columbia.edu>

From looking at the XML files, it seems that the Biopython Blast XML parser is doing the right thing. Isn't it?

--Michiel.
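[Editor's sketch] One way to check whether an empty alignments list simply reflects a query without hits is to count the <Hit> elements under each <Iteration> in the XML itself. The following is a minimal sketch, assuming Python 3 and only the standard library. The sample XML is hand-written and heavily simplified (it is not taken from any of the files in this thread), and the function name hits_per_query is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample. The contents are invented for illustration
# and are NOT taken from TABB2.xml, TABB2v2.xml, mitoA.xml or mitoB.xml.
SAMPLE = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_query-def>query A</Iteration_query-def>
      <Iteration_hits></Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>2</Iteration_iter-num>
      <Iteration_query-def>query B</Iteration_query-def>
      <Iteration_hits>
        <Hit><Hit_def>hypothetical subject sequence</Hit_def></Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def hits_per_query(xml_text):
    """Map each query definition to the number of <Hit> elements it has."""
    root = ET.fromstring(xml_text)
    counts = {}
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def", default="(no query-def)")
        counts[query] = len(iteration.findall("Iteration_hits/Hit"))
    return counts


for query, n_hits in hits_per_query(SAMPLE).items():
    print(query, n_hits)
```

A query that maps to zero here has no hits in the file, so an empty list from the parser would be the expected result for it rather than a parsing failure.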
Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: Sebastian Bassi [mailto:sbassi at gmail.com] Sent: Fri 10/5/2007 1:39 AM To: Michiel De Hoon Cc: biopython at biopython.org Subject: Re: [BioPython] Problem with blast xml On 10/4/07, Michiel De Hoon wrote: > Can you create two minimal XML files that demonstrate the problem? > For example, by removing records from the two files you have and checking if > parsing still works for one and fails for the other. > By doing so, you may be able to identify exactly what the essential > difference between the two files is. After some tests, I found two minimal XML files with this issue: http://www.bioinformatica.info/mitoA.xml http://www.bioinformatica.info/mitoB.xml (only 3.5 kb each). -- Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6 Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From biopython at maubp.freeserve.co.uk Fri Oct 5 05:26:06 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 05 Oct 2007 10:26:06 +0100 Subject: [BioPython] Error generated by Clustalw example in Tutorial In-Reply-To: References: Message-ID: <4706032E.1020703@maubp.freeserve.co.uk> Y Tu wrote: > Hi, > > I'm reading the Biopython tutorial and running the example of clustalw. But it generate the following error. What's wrong? Thanks. > >>>> from Bio import Clustalw >>>> cline = Clustalw.MultipleAlignCL(os.path.join(os.curdir, "opuntia.fasta")) >>>> cline.set_output("result.aln") >>>> print cline > clustalw .\opuntia.fasta -OUTFILE=result.aln The Windows version of ClustalW is very fussy. 
To experiment try running this by hand at the windows command prompt - note that I'm not at my Windows machine so I haven't double checked this: clustalw .\opuntia.fasta -OUTFILE=result.aln or, clustalw opuntia.fasta -OUTFILE=result.aln Any error messages would be helpful. I suggest you try this in Biopython: from Bio import Clustalw cline = Clustalw.MultipleAlignCL("opuntia.fasta") cline.set_output("result.aln") print cline Also, we have made a few tweaks to this code since Biopython 1.43 was released (see emails with Emanuel Hey in July 2007). If you like, you can try updating this module to the CVS version. Simply backup the existing C:\Python25\Lib\site-packages\Bio\Clustalw\__init__.py and replace it with the latest code from here: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/*checkout*/biopython/Bio/Clustalw/__init__.py?rev=HEAD&cvsroot=biopython&content-type=text/x-python Peter From ytu888 at hotmail.com Fri Oct 5 12:32:05 2007 From: ytu888 at hotmail.com (Y Tu) Date: Fri, 5 Oct 2007 11:32:05 -0500 Subject: [BioPython] Error generated by Clustalw example in Tutorial In-Reply-To: <4706032E.1020703@maubp.freeserve.co.uk> References: <4706032E.1020703@maubp.freeserve.co.uk> Message-ID: I tested both commands under window prompt, initially both generated error because window don't know clustalw. Once I give the correct path of the clustalw, both generated alignment results without any error. BTW, I used the one inside BioEdit, I did not find clustalw coming with Biopython. It looks like python use online program at ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalW/. Am I right? 
Then I replace the old _ini_with the new one, but there is a new error message similar to the old one: >>> alignment = Clustalw.do_alignment(cline) Traceback (most recent call last): File "", line 1, in File "C:\Python25\Lib\site-packages\Bio\Clustalw\__init__.py", line 117, in do_alignment # check if the outfile exists before parsing IOError: Output .aln file result1.aln not produced, commandline: clustalw opuntia.fasta -OUTFILE=result1.aln Also I tested the example on OS X, the same error was generated: >>> alignment = Clustalw.do_alignment(cline) sh: line 1: clustalw: command not found Traceback (most recent call last): File "", line 1, in File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/Clustalw/__init__.py", line 117, in do_alignment % (out_file, command_line)) IOError: Output .aln file result1.aln not produced, commandline: clustalw ./opuntia.fasta -OUTFILE=result1.aln It seems like the problem is not linked to OS. What other things could be wrong? Thanks. > Date: Fri, 5 Oct 2007 10:26:06 +0100 > From: biopython at maubp.freeserve.co.uk > To: ytu888 at hotmail.com > CC: biopython at lists.open-bio.org > Subject: Re: [BioPython] Error generated by Clustalw example in Tutorial > > Y Tu wrote: > > Hi, > > > > I'm reading the Biopython tutorial and running the example of clustalw. But it generate the following error. What's wrong? Thanks. > > > >>>> from Bio import Clustalw > >>>> cline = Clustalw.MultipleAlignCL(os.path.join(os.curdir, "opuntia.fasta")) > >>>> cline.set_output("result.aln") > >>>> print cline > > clustalw .\opuntia.fasta -OUTFILE=result.aln > > The Windows version of ClustalW is very fussy. To experiment try > running this by hand at the windows command prompt - note that I'm not > at my Windows machine so I haven't double checked this: > > clustalw .\opuntia.fasta -OUTFILE=result.aln > > or, > > clustalw opuntia.fasta -OUTFILE=result.aln > > Any error messages would be helpful. 
>
> I suggest you try this in Biopython:
>
> from Bio import Clustalw
> cline = Clustalw.MultipleAlignCL("opuntia.fasta")
> cline.set_output("result.aln")
> print cline
>
> Also, we have made a few tweaks to this code since Biopython 1.43 was
> released (see emails with Emanuel Hey in July 2007). If you like, you
> can try updating this module to the CVS version. Simply backup the
> existing C:\Python25\Lib\site-packages\Bio\Clustalw\__init__.py and
> replace it with the latest code from here:
>
> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/*checkout*/biopython/Bio/Clustalw/__init__.py?rev=HEAD&cvsroot=biopython&content-type=text/x-python
>
> Peter
>

From biopython at maubp.freeserve.co.uk Fri Oct 5 14:35:05 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 05 Oct 2007 19:35:05 +0100
Subject: [BioPython] Error generated by Clustalw example in Tutorial
In-Reply-To:
References: <4706032E.1020703@maubp.freeserve.co.uk>
Message-ID: <470683D9.90808@maubp.freeserve.co.uk>

Y Tu wrote:
> I tested both commands under window prompt, initially both generated
> error because window don't know clustalw.

This is expected. You must either supply the full path of the clustalw executable, or have it on the system path. Otherwise Windows doesn't know how to find the clustalw program.

> Once I give the correct path of the clustalw, both generated
> alignment results without any error. BTW, I used the one inside
> BioEdit, I did not find clustalw coming with Biopython. It looks like
> python use online program at
> ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalW/. Am I right?

Clustalw is a standalone program (completely separate from Biopython) which you must install separately if you want to use it. It is available from several servers - the one you chose looks fine.
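[Editor's sketch] The point about the system path can be checked from Python before trying to run Clustalw. This is a minimal sketch, assuming Python 3 (where shutil.which is part of the standard library). The helper name describe_executable is hypothetical and not part of Biopython.

```python
import shutil


def describe_executable(name):
    """Return a short message saying whether `name` is on the system path."""
    path = shutil.which(name)  # None when no matching executable is found
    if path is None:
        return "%s not found on PATH; give the full path instead" % name
    return "%s found at %s" % (name, path)


# Prints one of the two messages, depending on the local installation.
print(describe_executable("clustalw"))
```

If the program is reported as not found, either add its folder to the system path or pass the full path of the executable, as described above.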
> Then I replace the old _ini_with the new one, but there is a new > error message similar to the old one: > >>>> alignment = Clustalw.do_alignment(cline) > Traceback (most recent call last): File "", line > 1, in File > "C:\Python25\Lib\site-packages\Bio\Clustalw\__init__.py", line 117, > in do_alignment # check if the outfile exists before parsing IOError: > Output .aln file result1.aln not produced, commandline: clustalw > opuntia.fasta -OUTFILE=result1.aln > > Also I tested the example on OS X, the same error was generated: > >>>> alignment = Clustalw.do_alignment(cline) > sh: line 1: clustalw: command not found Traceback (most recent call > last): File "", line 1, in File > "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/Clustalw/__init__.py", > line 117, in do_alignment % (out_file, command_line)) IOError: > Output .aln file result1.aln not produced, commandline: clustalw > ./opuntia.fasta -OUTFILE=result1.aln > > It seems like the problem is not linked to OS. What other things > could be wrong? Thanks. In both cases, you are not explicitly providing the path to clustalw - so for this to work the clustalw executable must be on the system path. The other obvious thing to check is the location of the files versus the working directory. Is your python script in the same folder as the opuntia.fasta file? What happens if you try those exact command lines (which Biopython says it is trying to run) at the command prompt in directory where your python script is located? i.e. Windows: clustalw opuntia.fasta -OUTFILE=result1.aln Mac: clustalw ./opuntia.fasta -OUTFILE=result1.aln Peter From meesters at uni-mainz.de Mon Oct 8 11:07:54 2007 From: meesters at uni-mainz.de (Christian Meesters) Date: Mon, 8 Oct 2007 17:07:54 +0200 Subject: [BioPython] Reassigning parent ids in Bio.PDB-structures? Message-ID: <1191856074.5425.24.camel@cmeesters> Hi, I'm trying to 'split' a structure in several pieces, e.g. 
a former chain 'A' should be splitted in 'A' and 'B', 'B' in 'C' and 'D' and so on. Now, whatever I do I only get chains 'C', 'F', 'H', 'I', 'K', 'L' ... Perhaps some code explains better what I'm trying to achieve: breakpoints = [1254, 5444, 6690, 10888, 10889, 16332, 16333, 21776, 21776, 27220, 27221, 32665] def split_chain(structure, breakpoints, outname = 'split.pdb'): chains = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'] chain = chains.pop(0) for atom in structure.get_atoms(): number = atom.get_serial_number() if breaks and number == breaks[0]: breaks.pop(0) chain = chains.pop(0) atom.parent.parent.id = chain # assign new chain iostream = PDBIO() try: outfile = open(outname, 'w') iostream.set_structure(structure.structure) iostream.save(outfile) except IOError, msg: raise IOError(msg) So, chain 'A' should stay 'A' from atom 1 to 1254 and 'B' from 1254 to 5444. Instead the written pdb-file contains all atoms, but with the wrong chain ids (see above). (Please don't tell my how unpythonic the code reads, point is that I've tried so many different things that I first need to understand my logic mistake.) Any ideas, where my mistake is? Thanks, Christian From meesters at uni-mainz.de Mon Oct 8 11:54:32 2007 From: meesters at uni-mainz.de (Christian Meesters) Date: Mon, 8 Oct 2007 17:54:32 +0200 Subject: [BioPython] Reassigning parent ids in Bio.PDB-structures? In-Reply-To: <470A508C.4060803@maubp.freeserve.co.uk> References: <1191856074.5425.24.camel@cmeesters> <470A508C.4060803@maubp.freeserve.co.uk> Message-ID: <1191858872.5425.32.camel@cmeesters> > > breakpoints = [1254, 5444, > > 6690, 10888, > > 10889, 16332, > > 16333, 21776, > > 21776, 27220, > > 27221, 32665] > > I'm assuming this is "breaks" later on. Absolutely - that's the pain with copy & paste for demos ... sorry. 
> As the reason, I think this is what is happening: Given an atom, then > atom.parent will be a residue object, and atom.parent.parent will be a > chain object. Note all the atoms in a single amino acid residue will > share share the same .parent, and all the atoms in a single chain will > share the same .parent.parent > > i.e. You have renamed Chain "A" to "A", and then later renamed this > chain to "B", and then again to "C". You didn't ever split up the chain > into sub chains. Mh, makes sense. > > To be honest, I would be tempted to write a quick and dirty script which > parsed the raw PDB file, and rewrote the chain field based on the atom > sequence number - without the overhead of the PDB parser. Yes, would have been too easy ;-). Only wanted to add this functionality to a larger application and make it easy to use. There is no strict need to do so, but it would have been nice. However, thanks for the input. Christian From biopython at maubp.freeserve.co.uk Mon Oct 8 11:45:16 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 08 Oct 2007 16:45:16 +0100 Subject: [BioPython] Reassigning parent ids in Bio.PDB-structures? In-Reply-To: <1191856074.5425.24.camel@cmeesters> References: <1191856074.5425.24.camel@cmeesters> Message-ID: <470A508C.4060803@maubp.freeserve.co.uk> Christian Meesters wrote: > Hi, > > I'm trying to 'split' a structure in several pieces, e.g. a former chain > 'A' should be splitted in 'A' and 'B', 'B' in 'C' and 'D' and so on. > Now, whatever I do I only get chains 'C', 'F', 'H', 'I', 'K', 'L' ... > > Perhaps some code explains better what I'm trying to achieve: > > breakpoints = [1254, 5444, > 6690, 10888, > 10889, 16332, > 16333, 21776, > 21776, 27220, > 27221, 32665] I'm assuming this is "breaks" later on. 
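[Editor's sketch] The "quick and dirty" approach mentioned in this thread (rewriting the chain field of the raw PDB text from the atom serial number, without Bio.PDB) could look roughly like this. Assumptions, none of which come from the original posts: Python 3, fixed-column PDB records with the serial number in columns 7-11 and the chain ID in column 22, and each breakpoint read as the serial number of the last atom in a chain. The function name rechain and the demo records are hypothetical.

```python
import bisect


def rechain(pdb_lines, breakpoints, chain_ids="ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
    """Yield PDB lines with the chain ID column reassigned by atom serial number.

    Assumption: each breakpoint is the serial number of the last atom in a chain.
    """
    ordered = sorted(set(breakpoints))  # also drops duplicated breakpoints
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            serial = int(line[6:11])  # columns 7-11: atom serial number
            # The count of breakpoints strictly below this serial picks the chain.
            index = bisect.bisect_left(ordered, serial)
            line = line[:21] + chain_ids[index] + line[22:]  # column 22: chain ID
        yield line


# Made-up records with just enough columns to carry a serial and a chain ID.
demo = ["ATOM  %5d  CA  ALA A%4d" % (n, n) for n in (1, 2, 3, 4)]
for new_line in rechain(demo, breakpoints=[2]):
    print(new_line)
```

Records other than ATOM and HETATM (TER, ANISOU and so on) pass through unchanged in this sketch, and more than 26 chains would raise an IndexError, so a real script would need to handle both cases.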
> def split_chain(structure, breakpoints, outname = 'split.pdb'): > chains = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', > 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', > 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', > 'X', 'Y', 'Z'] > > chain = chains.pop(0) > for atom in structure.get_atoms(): > number = atom.get_serial_number() > if breaks and number == breaks[0]: > breaks.pop(0) > chain = chains.pop(0) > atom.parent.parent.id = chain # assign new chain > > iostream = PDBIO() > try: > outfile = open(outname, 'w') > iostream.set_structure(structure.structure) > iostream.save(outfile) > except IOError, msg: > raise IOError(msg) > > So, chain 'A' should stay 'A' from atom 1 to 1254 and 'B' from 1254 to > 5444. Instead the written pdb-file contains all atoms, but with the > wrong chain ids (see above). (Please don't tell my how unpythonic the > code reads, point is that I've tried so many different things that I > first need to understand my logic mistake.) > > Any ideas, where my mistake is? As the reason, I think this is what is happening: Given an atom, then atom.parent will be a residue object, and atom.parent.parent will be a chain object. Note all the atoms in a single amino acid residue will share share the same .parent, and all the atoms in a single chain will share the same .parent.parent i.e. You have renamed Chain "A" to "A", and then later renamed this chain to "B", and then again to "C". You didn't ever split up the chain into sub chains. I think you need to create a new chain objects instead... but I'm not sure off hand how best to do this with Bio.PDB To be honest, I would be tempted to write a quick and dirty script which parsed the raw PDB file, and rewrote the chain field based on the atom sequence number - without the overhead of the PDB parser. Peter From bbrazelton at gmail.com Mon Oct 8 20:33:03 2007 From: bbrazelton at gmail.com (B. 
Brazelton)
Date: Mon, 8 Oct 2007 17:33:03 -0700
Subject: [BioPython] BLAST XML parser trouble
Message-ID:

I tried to follow the BLAST XML parser example in the tutorial, but I always get the following error when attempting to iterate through the records:

Traceback (most recent call last):
  File "BlastXML_Parser.py", line 10, in ?
    for blast_record in blast_records:
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/Bio/Blast/NCBIXML.py", line 572, in parse
    expat_parser.Parse(text, False)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/Bio/Blast/NCBIXML.py", line 98, in endElement
    eval("self.%s()" % method)
  File "<string>", line 0, in ?
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/Bio/Blast/NCBIXML.py", line 215, in _end_BlastOutput_version
    self._header.version = self._value.split()[1]
IndexError: list index out of range

All I did was:

result_handle = open('NifH_Blast.xml')
from Bio.Blast import NCBIXML
blast_records = NCBIXML.parse(result_handle)
for blast_record in blast_records:
    ... etc

I put my script and xml file here:
http://www.staff.washington.edu/braz/files

I'm using biopython 1.43, and I get the same error on both Python 2.3.5 and Python 2.5.

It seems like my commands are exactly what is in the tutorial, so I'm confused. My best guess is that there is a difference in the XML format, but it's NCBI XML. Thanks for any help,

Bill Brazelton

From sbassi at gmail.com Mon Oct 8 20:48:50 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Mon, 8 Oct 2007 21:48:50 -0300
Subject: [BioPython] BLAST XML parser trouble
In-Reply-To:
References:
Message-ID:

On 10/8/07, B. Brazelton wrote:
> I tried to follow the BLAST XML parser example in the tutorial, but I
> always get the following error when attempting to iterate through the
> records:

Got the same result as you. Could you please tell me the URL of the tutorial where you saw this?
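[Editor's sketch] The last frame of the traceback above takes the second whitespace-separated token of the version text, so any value made of a single word raises IndexError. A minimal sketch of that behaviour and of a more tolerant alternative, assuming Python 3. The two sample strings are the values quoted later in this thread. The helper name version_number is hypothetical and is not the Biopython implementation.

```python
def version_number(version_text):
    """Return the second whitespace-separated token, or None if there is none."""
    tokens = version_text.split()
    return tokens[1] if len(tokens) > 1 else None


# Sample values as they appear in this thread: a "PROGRAM x.y.z [date]" string
# and a one-word placeholder.
for text in ("BLASTP 2.2.12 [Aug-07-2005]", "unspecified"):
    print(repr(text), "->", version_number(text))

# The unguarded form used in the traceback fails on the one-word value.
try:
    "unspecified".split()[1]
except IndexError as error:
    print("IndexError:", error)
```

Returning None for an unparseable version is only one possible design choice; failing loudly with a clearer message would be another.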
--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From mdehoon at c2b2.columbia.edu Mon Oct 8 22:55:21 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Mon, 8 Oct 2007 22:55:21 -0400
Subject: [BioPython] BLAST XML parser trouble
References:
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B633@mail2.exch.c2b2.columbia.edu>

How did you produce the XML file? In particular, which Blast version did you use?
The Blast XML parser trips over the following line in your XML file:

<BlastOutput_version>unspecified</BlastOutput_version>

This is supposed to be:

<BlastOutput_version>BLASTP 2.2.12 [Aug-07-2005]</BlastOutput_version>

, of course depending on which Blast version you are using.

--Michiel

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

-----Original Message-----
From: biopython-bounces at lists.open-bio.org on behalf of B. Brazelton
Sent: Mon 10/8/2007 8:33 PM
To: biopython at biopython.org
Subject: [BioPython] BLAST XML parser trouble

I tried to follow the BLAST XML parser example in the tutorial, but I always get the following error when attempting to iterate through the records:

Traceback (most recent call last):
  File "BlastXML_Parser.py", line 10, in ?
    for blast_record in blast_records:
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/Bio/Blast/NCBIXML.py", line 572, in parse
    expat_parser.Parse(text, False)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/Bio/Blast/NCBIXML.py", line 98, in endElement
    eval("self.%s()" % method)
  File "<string>", line 0, in ?
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/Bio/Blast/NCBIXML.py", line 215, in _end_BlastOutput_version
    self._header.version = self._value.split()[1]
IndexError: list index out of range

All I did was:

result_handle = open('NifH_Blast.xml')
from Bio.Blast import NCBIXML
blast_records = NCBIXML.parse(result_handle)
for blast_record in blast_records:
    ... etc

I put my script and xml file here:
http://www.staff.washington.edu/braz/files

I'm using biopython 1.43, and I get the same error on both Python 2.3.5 and Python 2.5.

It seems like my commands are exactly what is in the tutorial, so I'm confused. My best guess is that there is a difference in the XML format, but it's NCBI XML. Thanks for any help,

Bill Brazelton
_______________________________________________
BioPython mailing list - BioPython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython

From kbaa at novonordisk.com Tue Oct 9 08:26:14 2007
From: kbaa at novonordisk.com (KBAA (Kent Bondensgaard))
Date: Tue, 9 Oct 2007 14:26:14 +0200
Subject: [BioPython] FW: Parsing sequence information in patents
Message-ID: <48A8D64F1030744983C6747C790164BD05E322FC@EXDKBA023.corp.novocorp.net>

Does anyone know how to parse protein sequence information in patents with Biopython?

BR,
Kent Bondensgaards

__________________________________
Kent Bondensgaard
Research Scientist
Protein Structure and Biophysics

Novo Nordisk A/S
Novo Nordisk Park
DK-2760 Måløv
Denmark
+45 4443 4510 (direct)
+45 3075 4510 (mobile)
+45 4466 3450 (fax)
kbaa at novonordisk.com

From sbassi at gmail.com Tue Oct 9 09:04:51 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Tue, 9 Oct 2007 10:04:51 -0300
Subject: [BioPython] FW: Parsing sequence information in patents
In-Reply-To: <48A8D64F1030744983C6747C790164BD05E322FC@EXDKBA023.corp.novocorp.net>
References: <48A8D64F1030744983C6747C790164BD05E322FC@EXDKBA023.corp.novocorp.net>
Message-ID:

On 10/9/07, KBAA (Kent Bondensgaard) wrote:
>
> Does anyone know how to parse protein sequence information in patents with Biopython?

What about using patAA and patNT from NCBI? Both are available as BLAST-ready databases, so you could retrieve the FASTA file using fastacmd.

--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From bbrazelton at gmail.com Tue Oct 9 16:24:58 2007
From: bbrazelton at gmail.com (B. Brazelton)
Date: Tue, 9 Oct 2007 13:24:58 -0700
Subject: [BioPython] BLAST XML parser trouble
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B633@mail2.exch.c2b2.columbia.edu>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B633@mail2.exch.c2b2.columbia.edu>
Message-ID:

I put in 'tblastx 2.2.15 [Oct-15-2006]' and it worked fine. Thanks for your help, sorry for the newbie question.

(FYI, I was using results generated from the CAMERA database (http://camera.calit2.net/), and I was using the main Biopython tutorial and cookbook from biopython.org.)

thanks again,
BB

On 10/8/07, Michiel De Hoon wrote:
> How did you produce the XML file? In particular, which Blast version did you
> use?
> The Blast XML parser trips over the following line in your XML file:
>
> <BlastOutput_version>unspecified</BlastOutput_version>
>
> This is supposed to be:
>
> <BlastOutput_version>BLASTP 2.2.12 [Aug-07-2005]</BlastOutput_version>
>
> , of course depending on which Blast version you are using.
>
> --Michiel
>
> Michiel de Hoon
> Center for Computational Biology and Bioinformatics
> Columbia University
> 1150 St Nicholas Avenue
> New York, NY 10032
>
> -----Original Message-----
> From: biopython-bounces at lists.open-bio.org on behalf of B. Brazelton
> Sent: Mon 10/8/2007 8:33 PM
> To: biopython at biopython.org
> Subject: [BioPython] BLAST XML parser trouble
>
> I tried to follow the BLAST XML parser example in the tutorial, but I
> always get the following error when attempting to iterate through the
> records:
>
> Traceback (most recent call last):
>   File "BlastXML_Parser.py", line 10, in ?
>     for blast_record in blast_records:
>   File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/Bio/Blast/NCBIXML.py", line 572, in parse
>     expat_parser.Parse(text, False)
>   File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/Bio/Blast/NCBIXML.py", line 98, in endElement
>     eval("self.%s()" % method)
>   File "<string>", line 0, in ?
>   File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/Bio/Blast/NCBIXML.py", line 215, in _end_BlastOutput_version
>     self._header.version = self._value.split()[1]
> IndexError: list index out of range
>
> All I did was:
>
> result_handle = open('NifH_Blast.xml')
> from Bio.Blast import NCBIXML
> blast_records = NCBIXML.parse(result_handle)
> for blast_record in blast_records:
>     ... etc
>
> I put my script and xml file here:
> http://www.staff.washington.edu/braz/files
>
> I'm using biopython 1.43, and I get the same error on both Python
> 2.3.5 and Python 2.5.
>
> It seems like my commands are exactly what is in the tutorial, so I'm
> confused.
My best guess is that there is a difference in the XML > format, but it's NCBI XML. Thanks for any help, > > Bill Brazelton > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > > From sbassi at gmail.com Tue Oct 9 17:09:09 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Tue, 9 Oct 2007 18:09:09 -0300 Subject: [BioPython] Getting Qv using Python? Message-ID: Is there an automated way to get Quality Values (QV) from a ab1 file? I wrap Abiview [1] to get the sequence, but now I need the Qv. [1] http://bioweb.pasteur.fr/docs/EMBOSS/abiview.html -- Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6 Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From prashanth at ibioinformatics.org Wed Oct 10 08:17:26 2007 From: prashanth at ibioinformatics.org (Prashantha Hebbar Kiradi) Date: Wed, 10 Oct 2007 17:47:26 +0530 Subject: [BioPython] where is SeqIO.parse()? Message-ID: <470CC2D6.1090504@ibioinformatics.org> Hi everybody, While trying the example of 'Parsing sequence file formats' from section 2.4 of Biopython tutorial: ------------------------------------------------- from Bio import SeqIO handle = open("ls_orchid.fasta") for seq_record in SeqIO.parse(handle, "fasta") : print seq_record.id print seq_record.seq print len(seq_record.seq) handle.close() ------------------------------------------------- I get this error: ------------------------------------------------- Traceback (most recent call last): File "fastEx.py", line 5, in for seq_record in SeqIO.parse(handle, "fasta") : AttributeError: 'module' object has no attribute 'parse' ------------------------------------------------- Importing SeqIO doesn't raise any error and the ls_orchid.fasta file I'm using is opening correctly. API documentation reports that the 'parse' function is there. What am I doing wrong? 
I'm using biopython 1.42 installed from Ubuntu repository and python 2.5.1. Thanks in advance, Prashantha Hebbar Institute of Bioinformatics ITPL, Bangalore, INDIA From fennan at gmail.com Wed Oct 10 08:20:56 2007 From: fennan at gmail.com (Fernando) Date: Wed, 10 Oct 2007 14:20:56 +0200 Subject: [BioPython] Code publications Message-ID: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com> Hi everybody, This might be off-topic, or maybe not: I've been working with biopython for a while and I am curious about what the authors get from all the exceptional work they are doing... I know it won't have to do anything with money, but in terms of publication / copyrihts etc, what are the adventages of having your code in biopython? Is there a journey / conference where the author publish their works and likewise they can be referenced or something like that? Thanks, Fernando From mdehoon at c2b2.columbia.edu Wed Oct 10 08:24:33 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Wed, 10 Oct 2007 08:24:33 -0400 Subject: [BioPython] where is SeqIO.parse()? References: <470CC2D6.1090504@ibioinformatics.org> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B635@mail2.exch.c2b2.columbia.edu> > I'm using biopython 1.42 installed from Ubuntu repository and python 2.5.1. Use Biopython 1.43. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: biopython-bounces at lists.open-bio.org on behalf of Prashantha Hebbar Kiradi Sent: Wed 10/10/2007 8:17 AM To: biopython at biopython.org Subject: [BioPython] where is SeqIO.parse()? 
Hi everybody, While trying the example of 'Parsing sequence file formats' from section 2.4 of Biopython tutorial: ------------------------------------------------- from Bio import SeqIO handle = open("ls_orchid.fasta") for seq_record in SeqIO.parse(handle, "fasta") : print seq_record.id print seq_record.seq print len(seq_record.seq) handle.close() ------------------------------------------------- I get this error: ------------------------------------------------- Traceback (most recent call last): File "fastEx.py", line 5, in for seq_record in SeqIO.parse(handle, "fasta") : AttributeError: 'module' object has no attribute 'parse' ------------------------------------------------- Importing SeqIO doesn't raise any error and the ls_orchid.fasta file I'm using is opening correctly. API documentation reports that the 'parse' function is there. What am I doing wrong? I'm using biopython 1.42 installed from Ubuntu repository and python 2.5.1. Thanks in advance, Prashantha Hebbar Institute of Bioinformatics ITPL, Bangalore, INDIA _______________________________________________ BioPython mailing list - BioPython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From cjfields at uiuc.edu Wed Oct 10 10:14:48 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 10 Oct 2007 09:14:48 -0500 Subject: [BioPython] Code publications In-Reply-To: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com> References: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com> Message-ID: <865EDEE7-08D4-4058-9DD9-C4E790AFD327@uiuc.edu> This is a question that could be posed for any open-source project. It differs per person in my opinion. For instance, I donate time and code to BioPerl based on several factors. Not reinventing the wheel, giving back to the community, access to the code base, and the joy of programming (believe it or not) are among them, but they aren't the only ones. Publications don't hurt but they aren't my primary motivation. 
It generally isn't the focus of my research, only a means to an end (to parse or generate data). I don't see anything wrong with it being someone else's primary drive to donate as long as they continue support their code post-publication, an issue that unfortunately pops up quite frequently. chris On Oct 10, 2007, at 7:20 AM, Fernando wrote: > Hi everybody, > > This might be off-topic, or maybe not: > > I've been working with biopython for a while and I am curious about > what the > authors get from all the exceptional work they are doing... I know > it won't > have to do anything with money, but in terms of publication / > copyrihts etc, > what are the adventages of having your code in biopython? Is there > a journey > / conference where the author publish their works and likewise they > can be > referenced or something like that? > > Thanks, > Fernando > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From biopython at maubp.freeserve.co.uk Wed Oct 10 08:42:01 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 10 Oct 2007 13:42:01 +0100 Subject: [BioPython] Code publications In-Reply-To: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com> References: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com> Message-ID: <470CC899.6080802@maubp.freeserve.co.uk> Fernando wrote: > Hi everybody, > > This might be off-topic, or maybe not: > > I've been working with biopython for a while and I am curious about what the > authors get from all the exceptional work they are doing... I know it won't > have to do anything with money, but in terms of publication / copyrihts etc, > what are the adventages of having your code in biopython? 
Is there a journey
> / conference where the author publish their works and likewise they can be
> referenced or something like that?

Pride? Looks good on a CV? Although I must say working on BioPerl would have been a better choice from the point of view of job hunting ;)

Some of the specific modules have associated publications which get cited (e.g. Bio.PDB and Bio.Cluster - although the latter is also available independently of Biopython). The closest to a general Biopython paper is currently Chapman and Chang 2000.

In terms of talks, most recently I gave a talk at BOSC 2007 in July, the "Biopython Project Update". Which reminds me, I have a few photos and the slides (sadly in PowerPoint - my initial attempt to convert them into PDF wasn't great, font issues leading to content getting cropped).

Peter

From tiagoantao at gmail.com Wed Oct 10 12:59:56 2007
From: tiagoantao at gmail.com (Tiago Antao)
Date: Wed, 10 Oct 2007 17:59:56 +0100
Subject: [BioPython] Code publications
In-Reply-To: <865EDEE7-08D4-4058-9DD9-C4E790AFD327@uiuc.edu>
References: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com> <865EDEE7-08D4-4058-9DD9-C4E790AFD327@uiuc.edu>
Message-ID: <470D050C.7060500@gmail.com>

I am currently submitting my population genetics code to biopython and I can talk about my motivations.

Most of the code that I am submitting was used in something that I have done in the past (sometimes published). I figured that if I have the code sitting here, I might as well donate it.

This has one interesting advantage for me: all the code that I know I will try to submit to biopython is designed with care, while all the code that is a one-off is really a big mess. For me, making code public is a motivator to maintain clean code.

It is also a way to get to know people who are interested in this type of problem, and I think that, as with all things in life, knowing more people is a good thing.
Maybe, in 12/18 months time I might think in suggesting to other people writing an article on the popgen work in biopython. Lets face it, that is also a good motivator. But, if it is the only one, I would agree that is not good (as Chris says, maintenance after publication...) Last, but not least: ethical and moral issues. Having spent some time outside of science I do think most scientific work is done in a very closed fashion (it was a shock to me, really). From my personal point of view open science and free software are arguments to which I connect moral value. Tiago Chris Fields wrote: > This is a question that could be posed for any open-source project. > > It differs per person in my opinion. For instance, I donate time and > code to BioPerl based on several factors. Not reinventing the wheel, > giving back to the community, access to the code base, and the joy of > programming (believe it or not) are among them, but they aren't the > only ones. > > Publications don't hurt but they aren't my primary motivation. It > generally isn't the focus of my research, only a means to an end (to > parse or generate data). I don't see anything wrong with it being > someone else's primary drive to donate as long as they continue > support their code post-publication, an issue that unfortunately pops > up quite frequently. > > chris > > On Oct 10, 2007, at 7:20 AM, Fernando wrote: > > >> Hi everybody, >> >> This might be off-topic, or maybe not: >> >> I've been working with biopython for a while and I am curious about >> what the >> authors get from all the exceptional work they are doing... I know >> it won't >> have to do anything with money, but in terms of publication / >> copyrihts etc, >> what are the adventages of having your code in biopython? Is there >> a journey >> / conference where the author publish their works and likewise they >> can be >> referenced or something like that? 
>>
>> Thanks,
>> Fernando
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>

From rebekah.rogers at gmail.com Thu Oct 11 14:57:21 2007
From: rebekah.rogers at gmail.com (Rebekah Rogers)
Date: Thu, 11 Oct 2007 14:57:21 -0400
Subject: [BioPython] running PAML in python
Message-ID: <79def59f0710111157h7483d5b5m6e6cdb3b86266750@mail.gmail.com>

Hello:

Does anyone know of an existing library that can run aligned sequences in PAML and then pull out the dN/dS values?

Thanks!
-Rebekah

From The_Polymorph at rocketmail.com Sun Oct 14 13:04:48 2007
From: The_Polymorph at rocketmail.com (Caitlin)
Date: Sun, 14 Oct 2007 10:04:48 -0700 (PDT)
Subject: [BioPython] Performing sequence alignments, etc.
Message-ID: <311410.84366.qm@web50801.mail.re2.yahoo.com>

Hi all.

I'm relatively new to the field of bioinformatics and I'm trying to perform a multiple sequence alignment on 5-6 sequences (fasta format - dna sequences). I'd like the output to be formatted in the following manner (clustalw standalone output):

accession_number1: atctcgatatcgggcgctcta...
accession_number2: atctctattctctggatctct...
...

When one or more nucleotide columns are identical, clustalw displays an asterisk. If not, a blank space is displayed. Is this a standard feature of BioPython?

Also, I'm evaluating several sequences but I'd like to obtain the most recent complete genomes possible from various countries. Is there a convenient source to use (GenBank?) if I don't know the accession numbers?

Thanks,

~Caitlin

From biopython at maubp.freeserve.co.uk Sun Oct 14 13:38:32 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 14 Oct 2007 18:38:32 +0100
Subject: [BioPython] Performing sequence alignments, etc.
In-Reply-To: <311410.84366.qm@web50801.mail.re2.yahoo.com>
References: <311410.84366.qm@web50801.mail.re2.yahoo.com>
Message-ID: <47125418.5020009@maubp.freeserve.co.uk>

Caitlin wrote:
> Hi all.
>
> I'm relatively new to the field of bioinformatics and I'm trying to
> perform a multiple sequence alignment on 5-6 sequences (fasta format -
> dna sequences). I'd like the output to be formatted in the following
> manner (clustalw standalone output):

For reading and writing Clustalw alignment files, you could either use Bio.SeqIO (format name "clustal") or the Bio.Clustalw module.
http://biopython.org/wiki/SeqIO

> When one more more nucleotides columns are identical, clustalw displays
> an asterisk. If not, a blank space is displayed. Is this a standard
> feature of BioPython?

There is an example of Clustalw output online here - note there can also be a column of numbers on the right hand side (not shown here):
http://www.bioperl.org/wiki/ClustalW_multiple_alignment_format

It sounds like you are describing the simple consensus string which clustalw outputs under the alignment (using *:. and space).

Biopython has a SummaryInfo object which can calculate simple consensus sequences (see the tutorial). Perhaps this would be close to what you want to do.

> Also, I'm evaluating several sequences but I'd like to obtain the most
> recent complete genomes possible from various countries. Is there a
> convenient source to use (GenBank?) if I don't know the accession
> numbers?

What sort of Genomes? Bacteria? Vertebrates?
You could start by having a look at any of the EMBL, NCBI/GenBank or the Japanese DDBJ (these three are kept in sync with each other). Biopython has quite a nice interface for searching and downloading sequences from GenBank (again, see the tutorial) so that would be my first suggestion. Peter From The_Polymorph at rocketmail.com Sun Oct 14 22:13:24 2007 From: The_Polymorph at rocketmail.com (Caitlin) Date: Sun, 14 Oct 2007 19:13:24 -0700 (PDT) Subject: [BioPython] Performing sequence alignments, etc. In-Reply-To: <47125418.5020009@maubp.freeserve.co.uk> Message-ID: <129586.66498.qm@web50807.mail.re2.yahoo.com> Thanks Peter. The genomes are viral. I'll definitely read that tutorial. Your help is very appreciated. ~Caitlin --- Peter wrote: > Caitlin wrote: > > Hi all. > > > > I'm relatively new to the field of bioinformatics and I'm trying to > > perform a multiple sequence alignment on 5-6 sequences (fasta > format - > > dna sequences). I'd like the output to be formatted in the > following > > manner (clustalw standalone output): > > For reading and writing Clustalw alignment files, you could either > use > Bio.SeqIO (format name "clustal") or the Bio.Clustalw module. > http://biopython.org/wiki/SeqIO > > > When one more more nucleotides columns are identical, clustalw > displays > > an asterisk. If not, a blank space is displayed. Is this a standard > > feature of BioPython? > > There is an example of Clustalw output online here - note there can > also > be a column of numbers on the right hand side (not shown here): > http://www.bioperl.org/wiki/ClustalW_multiple_alignment_format > > It sounds like you are describing the simple consensus string which > clustalw outputs under the alignment (using *:. and space). > > Biopython has a SummaryInfo object which can calculate simple > consensus > sequences (see the tutorial). Perhaps this would be close to what you > > want to do. 
> > > Also, I'm evaluating several sequences but I'd like to obtain the > most > > recent complete genomes possible from various countries. Is there a > > convenient source to use (GenBank?) if I don't know the accession > > numbers? > > What sort of Genomes? Bacteria? Vertebrates? You could start by > having > a look at any of the EMBL, NCBI/GenBank or the Japanese DDBJ (these > three are kept in sync with each other). > > Biopython has quite a nice interface for searching and downloading > sequences from GenBank (again, see the tutorial) so that would be my > first suggestion. > > Peter > > > > "Be who you are and say what you feel because those who mind don't matter and those who matter don't mind." - Dr. Seuss, "Oh the Places You'll Go" ____________________________________________________________________________________ Don't let your dream ride pass you by. Make it a reality with Yahoo! Autos. http://autos.yahoo.com/index.html From fredgca at hotmail.com Mon Oct 15 09:02:27 2007 From: fredgca at hotmail.com (Frederico Arnoldi) Date: Mon, 15 Oct 2007 13:02:27 +0000 Subject: [BioPython] where is SeqIO.parse()? In-Reply-To: References: Message-ID: Dear Kiradi, Concerning your subject question: where is SeqIO.parse()? >>> from Bio import SeqIO >>> SeqIO So, in my system, it is at /usr/lib/python2.4/site-packages/Bio/SeqIO/__init__.py. Try the same command in your python console and see where it is in yours. Concerning your problem: Try >>> from Bio import SeqIO >>> dir() ['SeqIO', '__builtins__', '__doc__', '__name__'] >>> dir(SeqIO) ['Alignment', 'ClustalIO', 'FastaIO', 'InsdcIO', 'Interfaces', 'NexusIO', 'PhylipIO', 'Seq', 'SeqRecord', 'StockholmIO', 'StringIO', 'SwissIO', '_FormatToIterator', '_FormatToWriter', '__builtins__', '__doc__', '__file__', '__name__', '__path__', 'generic_alphabet', 'generic_protein', 'os', 'parse', 'to_alignment', 'to_dict', 'write'] Do you get the same result? See that "parse" is in my SeqIO. Is it in yours? 
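
The same check can be run as a small script. This is only a minimal sketch: describe_module is a hypothetical helper name (it is not part of Biopython), and it uses importlib, so it assumes a current Python rather than the 2.4/2.5 interpreters discussed in this thread. It reports where a module was loaded from and whether it exposes a given attribute:

```python
import importlib


def describe_module(module_name, attribute):
    """Return (path, has_attribute) for an importable module.

    describe_module is a hypothetical helper for illustration only.
    """
    module = importlib.import_module(module_name)
    # Built-in modules have no __file__, so fall back to None.
    path = getattr(module, "__file__", None)
    return path, hasattr(module, attribute)


if __name__ == "__main__":
    try:
        path, has_parse = describe_module("Bio.SeqIO", "parse")
    except ImportError:
        print("Biopython does not appear to be installed")
    else:
        print("Bio.SeqIO loaded from:", path)
        print("has parse():", has_parse)
```

If has parse() comes back False, the path printed above it should point at the installed file worth inspecting.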

I noticed that after installing Biopython via apt on Ubuntu, the __init__.py in Bio/SeqIO was empty. Maybe that is the source of your problem. If I am right, when you type dir(SeqIO) on your system you will get ['__builtins__', '__doc__', '__file__', '__name__', '__path__'], confirming that your __init__.py is empty. Check it.

If this is your problem, try installing Biopython from the tar.gz file available on the Biopython home page.

Good luck,
Fred

----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 10 Oct 2007 17:47:26 +0530
> From: Prashantha Hebbar Kiradi
> Subject: [BioPython] where is SeqIO.parse()?
> To: biopython at biopython.org
> Message-ID:
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hi everybody,
>
> While trying the example of 'Parsing sequence file formats' from section
> 2.4 of Biopython tutorial:
> -------------------------------------------------
> from Bio import SeqIO
> handle = open("ls_orchid.fasta")
> for seq_record in SeqIO.parse(handle, "fasta") :
>     print seq_record.id
>     print seq_record.seq
>     print len(seq_record.seq)
> handle.close()
> -------------------------------------------------
>
> I get this error:
> -------------------------------------------------
> Traceback (most recent call last):
>   File "fastEx.py", line 5, in <module>
>     for seq_record in SeqIO.parse(handle, "fasta") :
> AttributeError: 'module' object has no attribute 'parse'
> -------------------------------------------------
>
> Importing SeqIO doesn't raise any error and the ls_orchid.fasta file I'm
> using is opening correctly.
>
> API documentation reports that the 'parse' function is there. What am I
> doing wrong?
>
> I'm using biopython 1.42 installed from Ubuntu repository and python 2.5.1.
>
> Thanks in advance,
>
> Prashantha Hebbar
> Institute of Bioinformatics
> ITPL,
http://alertas.br.msn.com/ From ytu888 at hotmail.com Mon Oct 15 12:19:47 2007 From: ytu888 at hotmail.com (Y Tu) Date: Mon, 15 Oct 2007 11:19:47 -0500 Subject: [BioPython] Error for installation of MySALdb on Mac OS X In-Reply-To: <14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net> References: <46FCF325.4040002@maubp.freeserve.co.uk> <46FD2BAC.80401@maubp.freeserve.co.uk> <46FD5927.3000207@maubp.freeserve.co.uk> <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net> <38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net> <14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net> Message-ID: Hi Steve, Thank you for your email. I was away for a week. What do you mean "fresh" python prompt? I installed MySQL by using MYSQL-5.0.45-osx10.4-i686.dmg downloaded online. I guess you want me to reinstall MySQL_python_1.2.2, not MySQLdb, am I right? Once again, thank you very much for your help.. > CC: biopython at lists.open-bio.org > From: lists.steve at arachnedesign.net > Subject: Re: [BioPython] Error for installation of MySALdb on Mac OS X > Date: Wed, 3 Oct 2007 10:47:41 -0400 > To: ytu888 at hotmail.com > > > Steve, thank you very much. It fixed the problem and I got through > > the build and install step. But when I tested inside the python for > > the installation I got following error. Please help me about it. > > Thanks. 
> > > > >>> import MySQLdb > > /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/ > > site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/ > > _mysql.py:3: UserWarning: Module _mysql was already imported from / > > Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site- > > packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.pyc, > > but /Applications/Python_Bio/MySQL-python-1.2.2 is being added to > > sys.path > > import sys, pkg_resources, imp > > Traceback (most recent call last): > > File "", line 1, in > > File "MySQLdb/__init__.py", line 19, in > > import _mysql > > File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 7, in > > > > File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 6, in > > __bootstrap__ > > ImportError: dlopen(/Users/lizhexu/.python-eggs/MySQL_python-1.2.2- > > py2.5-macosx-10.3-fat.egg-tmp/_mysql.so, 2): Library not loaded: / > > usr/local/mysql/lib/mysql/libmysqlclient_r.15.dylib > > Referenced from: /Users/lizhexu/.python-eggs/MySQL_python-1.2.2- > > py2.5-macosx-10.3-fat.egg-tmp/_mysql.so > > Reason: image not found > > > Sorry, don't know exactly what's happening here. Is this from a > "fresh" python prompt? > > How did you install MySQLdb, did you use easy_install? If so, try to > install from the sourceforge download. > > Try to remove it, remove the "build" directory from your mysqldb > download and redo the whole > python setup.py build / python setup.py install process > > To remove it, nuke this: > /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site- > packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg > > And try to reinstall? > > Perhaps someone who knows what the problem is here can give you a > better idea on what to do. > > -steve _________________________________________________________________ Windows Live Hotmail and Microsoft Office Outlook ? together at last. ?Get it now. 

From lists.steve at arachnedesign.net Mon Oct 15 12:30:21 2007
From: lists.steve at arachnedesign.net (Steve Lianoglou)
Date: Mon, 15 Oct 2007 12:30:21 -0400
Subject: [BioPython] Error for installation of MySALdb on Mac OS X
In-Reply-To:
References: <46FCF325.4040002@maubp.freeserve.co.uk> <46FD2BAC.80401@maubp.freeserve.co.uk> <46FD5927.3000207@maubp.freeserve.co.uk> <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net> <38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net> <14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net>
Message-ID: <908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net>

Hi,

> Thank you for your email. I was away for a week.
> What do you mean "fresh" python prompt?
> I installed MySQL by using MYSQL-5.0.45-osx10.4-i686.dmg downloaded
> online.
> I guess you want me to reinstall MySQL_python_1.2.2, not MySQLdb,
> am I right?

I'm not sure, exactly.

Last time I checked, the only things you needed to use mysql from python were:

(a) A working mysql install (the client/server)
(b) The mysqldb package from: http://sourceforge.net/projects/mysql-python

I'm assuming (a) is installed correctly since you are using the .mpkg from mysql.org, so I'd just try to fix (b).

You can try to do so as follows:

(1) Remove your original attempt at installing the python mysqldb library. From the looks of your error messages, it seems to be installed here:

/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/

(2) Remove the build directory in your mysqldb directory (the one you are installing from) by cd-ing into your mysqldb download, and removing the build directory you find there.

(3) Reinstall mysqldb by doing the usual `python setup.py build` and `sudo python setup.py install` dance

For the record, I'm not sure what you are talking about when you are distinguishing between "MySQL_python_1.2.2, not MySQLdb"

Are you trying to install two python libraries to access mysql?

-steve

From ytu888 at hotmail.com Mon Oct 15 13:18:42 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Mon, 15 Oct 2007 12:18:42 -0500
Subject: [BioPython] Error for installation of MySALdb on Mac OS X
In-Reply-To: <908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net>
References: <46FCF325.4040002@maubp.freeserve.co.uk> <46FD2BAC.80401@maubp.freeserve.co.uk> <46FD5927.3000207@maubp.freeserve.co.uk> <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net> <38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net> <14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net> <908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net>
Message-ID:

What I meant by "MySQL_python_1.2.2, not MySQLdb" was uninstalling MySQL_python, not the mysql client/server installed with the mpkg.

I just deleted the MYSQL....fat.egg file and downloaded the MySQL-python-1.2.2.tar. I repeated the installation process. However, when I ran import MySQLdb, I got the same error message. Is there anything else I should take a look at?

Thank you very much.

> CC: biopython at lists.open-bio.org
> From: lists.steve at arachnedesign.net
> Subject: Re: [BioPython] Error for installation of MySALdb on Mac OS X
> Date: Mon, 15 Oct 2007 12:30:21 -0400
> To: ytu888 at hotmail.com
>
> Hi,
>
> > Thank you for your email. I was away for a week.
> > What do you mean "fresh" python prompt?
> > I installed MySQL by using MYSQL-5.0.45-osx10.4-i686.dmg downloaded
> > online.
> > I guess you want me to reinstall MySQL_python_1.2.2, not MySQLdb,
> > am I right?
>
> I'm not sure, exactly.
> > Last time I checked, the only thing you needed to use mysql from > python was: > > (a) A working mysql install (the client/server) > (b) The mysqldb package from: http://sourceforge.net/projects/mysql- > python > > I'm assuming (a) is installed correctly since you are using the .mpkg > from mysql.org, so I'd just try to fix (b). > > You try do so by doing the following: > > (1) Remove your original attempt at installing the python mysqldb > library. From the looks of your error messages, it seems to be > installed here: > > Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site- > packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/ > > (2) remove the build directory in your mysqldb directory (the one you > are installing from) by cd-ing into your mysqldb download, and > removing the build directory you find there. > > (3) reinstall mysqldb by doing the usual `pythong setup.py build` and > `sudo python setup.py install` dance > > For the record, I'm not sure what you are talking about when you are > distinguishing between "MySQL_python_1.2.2, not MySQLdb" > > are you trying to install two python libraries to access mysql? > > -steve > _________________________________________________________________ Boo!?Scare away worms, viruses and so much more! Try Windows Live OneCare! 

From ytu888 at hotmail.com Tue Oct 16 13:06:36 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Tue, 16 Oct 2007 12:06:36 -0500
Subject: [BioPython] Error for installation of MySALdb on Mac OS X
In-Reply-To: <908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net>
References: <46FCF325.4040002@maubp.freeserve.co.uk> <46FD2BAC.80401@maubp.freeserve.co.uk> <46FD5927.3000207@maubp.freeserve.co.uk> <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net> <38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net> <14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net> <908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net>
Message-ID:

Hi,

I reinstalled everything and checked every step. I found that there were some warnings in the "build" step (the /usr/bin/ld warnings below). I wonder if they are the reason why I get the error messages when running "import MySQLdb" at the python prompt, and how to fix the problem. Thank you very much.

LeesComputer:/Applications/Python_Bio/MySQL-python-1.2.2 Lee$ python setup.py build
running build
running build_py
...
...
/usr/bin/ld: for architecture ppc
/usr/bin/ld: warning build/temp.macosx-10.3-fat-2.5/_mysql.o cputype (7, architecture i386) does not match cputype (18) for specified -arch flag: ppc (file not loaded)
/usr/bin/ld: warning /usr/local/mysql/lib/libmysqlclient_r.dylib cputype (7, architecture i386) does not match cputype (18) for specified -arch flag: ppc (file not loaded)

LeesComputer:/Applications/Python_Bio/MySQL-python-1.2.2 Lee$ sudo python setup.py install
Password:
running install
...
...
Adding MySQL-python 1.2.2 to easy-install.pth file Installed /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg Processing dependencies for MySQL-python==1.2.2 LeesComputer:/Applications/Python_Bio/MySQL-python-1.2.2 Lee$ python Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04) [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import MySQLdb /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.py:3: UserWarning: Module _mysql was already imported from /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.pyc, but /Applications/Python_Bio/MySQL-python-1.2.2 is being added to sys.path import sys, pkg_resources, imp Traceback (most recent call last): File "", line 1, in File "MySQLdb/__init__.py", line 19, in import _mysql File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 7, in File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 6, in __bootstrap__ ImportError: dlopen(/Users/Lee/.python-eggs/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg-tmp/_mysql.so, 2): Library not loaded: /usr/local/mysql/lib/mysql/libmysqlclient_r.15.dylib Referenced from: /Users/Lee/.python-eggs/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg-tmp/_mysql.so Reason: image not found > CC: biopython at lists.open-bio.org > From: lists.steve at arachnedesign.net > Subject: Re: [BioPython] Error for installation of MySALdb on Mac OS X > Date: Mon, 15 Oct 2007 12:30:21 -0400 > To: ytu888 at hotmail.com > > Hi, > > > Thank you for your email. I was away for a week. > > What do you mean "fresh" python prompt? > > I installed MySQL by using MYSQL-5.0.45-osx10.4-i686.dmg downloaded > > online. > > I guess you want me to reinstall MySQL_python_1.2.2, not MySQLdb, > > am I right? > > I'm not sure, exactly. 
> > Last time I checked, the only thing you needed to use mysql from > python was: > > (a) A working mysql install (the client/server) > (b) The mysqldb package from: http://sourceforge.net/projects/mysql- > python > > I'm assuming (a) is installed correctly since you are using the .mpkg > from mysql.org, so I'd just try to fix (b). > > You try do so by doing the following: > > (1) Remove your original attempt at installing the python mysqldb > library. From the looks of your error messages, it seems to be > installed here: > > Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site- > packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/ > > (2) remove the build directory in your mysqldb directory (the one you > are installing from) by cd-ing into your mysqldb download, and > removing the build directory you find there. > > (3) reinstall mysqldb by doing the usual `pythong setup.py build` and > `sudo python setup.py install` dance > > For the record, I'm not sure what you are talking about when you are > distinguishing between "MySQL_python_1.2.2, not MySQLdb" > > are you trying to install two python libraries to access mysql? > > -steve > _________________________________________________________________ Peek-a-boo FREE Tricks & Treats for You! http://www.reallivemoms.com?ocid=TXT_TAGHM&loc=us From fennan at gmail.com Tue Oct 16 13:51:30 2007 From: fennan at gmail.com (Fernando) Date: Tue, 16 Oct 2007 19:51:30 +0200 Subject: [BioPython] Precompute database information Message-ID: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com> Hi everybody, I am thinking in including some algorithms that I work with into biopython. My first concern is that I'm using a local image of the Gene Ontology database to perform several operations. In order to avoid such database accesses I could precompute the information I need and load it once the module is called. How should I do it? 
Is there a guideline style to load external variables or something like that? Any other ideas/suggestions? Thanks From fennan at gmail.com Tue Oct 16 14:55:54 2007 From: fennan at gmail.com (Fernando) Date: Tue, 16 Oct 2007 20:55:54 +0200 Subject: [BioPython] Precompute database information In-Reply-To: <4714FD13.2020708@maubp.freeserve.co.uk> References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com> <4714FD13.2020708@maubp.freeserve.co.uk> Message-ID: <7b13e61d0710161155o2e933f13jf448fe2097f6a184@mail.gmail.com> Hi Peter, >How big would your pre-computed data be? If its some sort of table or >other simple data you could perhaps use a simple text file; Another idea > for complicated objects is to use python's pickle module. It would be big... I an dealing with pairwise terms comparisons and I want to consider different species as well. >How often would the pre-computed data need to be updated? Every time >there is a new Gene Ontology release? It might be better have the >module download and cache the latest version on request (rather than >shipping an out of date dataset with Biopython). Yes, I could do that... It would be OK in Biopython to use mysql? If so the module could download the last GO version on request, install it and work with that version until the users decides to update it. On 10/16/07, Peter wrote: > > Fernando wrote: > > Hi everybody, > > > > I am thinking in including some algorithms that I work with into > biopython. > > My first concern is that I'm using a local image of the Gene Ontology > > database to perform several operations. In order to avoid such database > > accesses I could precompute the information I need and load it once the > > module is called. How should I do it? Is there a guideline style to load > > external variables or something like that? Any other ideas/suggestions? > > I think you need to go into more detail. > > How big would your pre-computed data be? 
If its some sort of table or > other simple data you could perhaps use a simple text file; Another idea > for complicated objects is to use python's pickle module. > > How often would the pre-computed data need to be updated? Every time > there is a new Gene Ontology release? It might be better have the > module download and cache the latest version on request (rather than > shipping an out of date dataset with Biopython). > > I don't think we have anything in Biopython that requires regular > updates. Things like genomes and sequence databases are left up to the > user. > > Peter > > From sdavis2 at mail.nih.gov Tue Oct 16 15:26:18 2007 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue, 16 Oct 2007 15:26:18 -0400 Subject: [BioPython] Precompute database information In-Reply-To: <7b13e61d0710161155o2e933f13jf448fe2097f6a184@mail.gmail.com> References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com> <4714FD13.2020708@maubp.freeserve.co.uk> <7b13e61d0710161155o2e933f13jf448fe2097f6a184@mail.gmail.com> Message-ID: <4715105A.30705@mail.nih.gov> Fernando wrote: > Hi Peter, > >> How big would your pre-computed data be? If its some sort of table or >> other simple data you could perhaps use a simple text file; Another idea >> for complicated objects is to use python's pickle module. > > It would be big... I an dealing with pairwise terms comparisons and I want > to consider different species as well. > >> How often would the pre-computed data need to be updated? Every time >> there is a new Gene Ontology release? It might be better have the >> module download and cache the latest version on request (rather than >> shipping an out of date dataset with Biopython). > > Yes, I could do that... It would be OK in Biopython to use mysql? If so the > module could download the last GO version on request, install it and work > with that version until the users decides to update it. Asking users to use MySQL to do updates might be a bit much. 
Could this be done from the .obo files? Sean From biopython at maubp.freeserve.co.uk Tue Oct 16 14:04:03 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 16 Oct 2007 19:04:03 +0100 Subject: [BioPython] Precompute database information In-Reply-To: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com> References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com> Message-ID: <4714FD13.2020708@maubp.freeserve.co.uk> Fernando wrote: > Hi everybody, > > I am thinking in including some algorithms that I work with into biopython. > My first concern is that I'm using a local image of the Gene Ontology > database to perform several operations. In order to avoid such database > accesses I could precompute the information I need and load it once the > module is called. How should I do it? Is there a guideline style to load > external variables or something like that? Any other ideas/suggestions? I think you need to go into more detail. How big would your pre-computed data be? If its some sort of table or other simple data you could perhaps use a simple text file; Another idea for complicated objects is to use python's pickle module. How often would the pre-computed data need to be updated? Every time there is a new Gene Ontology release? It might be better have the module download and cache the latest version on request (rather than shipping an out of date dataset with Biopython). I don't think we have anything in Biopython that requires regular updates. Things like genomes and sequence databases are left up to the user. 
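
The pickle-and-cache idea in the reply above could look roughly like the following. This is only a sketch in modern Python 3, not anything that exists in Biopython: the helper name load_or_build, the file name go_pairs.pickle and the single made-up term pair are all assumptions chosen for illustration.

```python
import os
import pickle
import tempfile


def load_or_build(cache_path, build):
    """Return the cached object at cache_path; build and save it if absent."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as handle:
            return pickle.load(handle)
    data = build()
    with open(cache_path, "wb") as handle:
        pickle.dump(data, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return data


calls = []  # records how often the expensive step really runs


def build_table():
    """Stand-in for the expensive precomputation (values are made up)."""
    calls.append(1)
    return {("GO:0000001", "GO:0000002"): 0.5}


cache_file = os.path.join(tempfile.mkdtemp(), "go_pairs.pickle")
first = load_or_build(cache_file, build_table)   # computes and writes the cache
second = load_or_build(cache_file, build_table)  # reads the cache instead
print(first == second, len(calls))
```

Deleting the cache file is then enough to force a rebuild, for example after a new Gene Ontology release. Note that pickle files should only be loaded from trusted locations.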
Peter From fennan at gmail.com Wed Oct 17 07:12:36 2007 From: fennan at gmail.com (Fernando) Date: Wed, 17 Oct 2007 07:12:36 -0400 Subject: [BioPython] Precompute database information In-Reply-To: <4715105A.30705@mail.nih.gov> References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com> <4714FD13.2020708@maubp.freeserve.co.uk> <7b13e61d0710161155o2e933f13jf448fe2097f6a184@mail.gmail.com> <4715105A.30705@mail.nih.gov> Message-ID: <7b13e61d0710170412t76f92271h99834607dc9c0063@mail.gmail.com> >Asking users to use MySQL to do updates might be a bit much. Could this >be done from the .obo files? I think that's probably the best solution... Is there any python module for working with OBO / OWL formats? I've been searching but people seem to use BioPerl for this matter On 10/16/07, Sean Davis wrote: > > Fernando wrote: > > Hi Peter, > > > >> How big would your pre-computed data be? If its some sort of table or > >> other simple data you could perhaps use a simple text file; Another > idea > >> for complicated objects is to use python's pickle module. > > > > It would be big... I an dealing with pairwise terms comparisons and I > want > > to consider different species as well. > > > >> How often would the pre-computed data need to be updated? Every time > >> there is a new Gene Ontology release? It might be better have the > >> module download and cache the latest version on request (rather than > >> shipping an out of date dataset with Biopython). > > > > Yes, I could do that... It would be OK in Biopython to use mysql? If so > the > > module could download the last GO version on request, install it and > work > > with that version until the users decides to update it. > > Asking users to use MySQL to do updates might be a bit much. Could this > be done from the .obo files? 
> > Sean > From sdavis2 at mail.nih.gov Wed Oct 17 11:34:17 2007 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Wed, 17 Oct 2007 11:34:17 -0400 Subject: [BioPython] Precompute database information In-Reply-To: <7b13e61d0710170412t76f92271h99834607dc9c0063@mail.gmail.com> References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com> <4714FD13.2020708@maubp.freeserve.co.uk> <7b13e61d0710161155o2e933f13jf448fe2097f6a184@mail.gmail.com> <4715105A.30705@mail.nih.gov> <7b13e61d0710170412t76f92271h99834607dc9c0063@mail.gmail.com> Message-ID: <47162B79.8080204@mail.nih.gov> Fernando wrote: >>Asking users to use MySQL to do updates might be a bit much. Could this >>be done from the .obo files? > > I think that's probably the best solution... Is there any python module > for working with OBO / OWL formats? I've been searching but people seem > to use BioPerl for this matter In a way, it seems silly to reimplement the Bio::OntologyIO stuff in python, but I (and others, after a quick google search) would probably benefit from such a thing. I'm not able to devote much time right this minute to the project, but I think that, given the huge number of particularly obo format files available, there would be use for such parsers and tools in biopython. How much interest/need is there for a Bio.OntologyIO like thing? Has anyone made any attempts at creating one? 
For a list of available biologic ontologies (to see what we are missing), see here: http://obofoundry.org/ Sean From luca.beltrame at unimi.it Wed Oct 17 11:59:47 2007 From: luca.beltrame at unimi.it (Luca Beltrame) Date: Wed, 17 Oct 2007 17:59:47 +0200 Subject: [BioPython] Precompute database information In-Reply-To: <47162B79.8080204@mail.nih.gov> References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com> <7b13e61d0710170412t76f92271h99834607dc9c0063@mail.gmail.com> <47162B79.8080204@mail.nih.gov> Message-ID: <200710171759.48595.luca.beltrame@unimi.it> On Wednesday 17 October 2007 17:34:17 Sean Davis wrote: > In a way, it seems silly to reimplement the Bio::OntologyIO stuff in It depends on the perspective, since for some people learning yet another programming language would be a drawback. > parsers and tools in biopython. How much interest/need is there for a > Bio.OntologyIO like thing? Has anyone made any attempts at creating one? Personally speaking, I would love it. I have no time (or skill) to even think about doing something like that, though. -- Luca Beltrame, MSc.
- Molecular Medicine PhD Student Dipartimento di Scienze e Tecnologie Biomediche - UniMI CNR - Institute of Biomedical Technologies Research Fellow E-mail: luca dot beltrame [at] unimi dot it - Phone: +39-02-50320924 From jimmy.musselwhite at gmail.com Wed Oct 17 17:20:41 2007 From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite) Date: Wed, 17 Oct 2007 17:20:41 -0400 Subject: [BioPython] Question about Seq.count() Message-ID: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com> Hello all, I have a script that runs through a list of about 250,000 sequence records and counts how many times substrings of 3-5 nucleotides in length occur. Here is some example code: search = 'ATTCG' #use SeqIO to get a big list of records sequences = list(SeqIO.parse(file, "fasta")) for record in sequences : Now the code I want to run is record.seq.count(search) but what I am forced to do is record.seq.tostring().count(search) The problem here is that when I am forced to use .tostring() on every single seq object it devastates my memory usage in a BIG way. It eats up about 1.2 gigs and then crashes. If I remove the .tostring() and just tell it to search for 'A', it will run fine and use memory at about 1/100th the rate. So my question boils down to: is there any way to make .count() be able to search for strings and not just characters? Otherwise my work is going to grind to a halt here. Thanks!
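
One thing that may be worth trying, independent of how count() behaves, is to avoid holding all 250,000 records in a list and to count on a plain string one record at a time. Whether the list is what exhausts memory here is not established by the message above, so treat this as a sketch under that assumption. It is written in modern Python 3 and deliberately avoids Biopython: iter_fasta is a hypothetical stand-in for a record iterator such as SeqIO.parse, and the two records are made up.

```python
import io


def iter_fasta(handle):
    """Yield (title, sequence) pairs one record at a time from a FASTA handle."""
    title, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif line and title is not None:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


def count_in_records(handle, search):
    """Map each record title to its non-overlapping count of `search`."""
    # Only the current record's sequence is held in memory at any time.
    return {title: seq.count(search) for title, seq in iter_fasta(handle)}


example = io.StringIO(">r1\nATTCGATTCG\nATT\n>r2\nCGCGCG\n")
counts = count_in_records(example, "ATTCG")
print(counts)
```

With a real file the io.StringIO object would be replaced by an open file handle. If only totals are needed, summing the counts instead of building a dictionary keeps memory use flatter still.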
From biopython at maubp.freeserve.co.uk Wed Oct 17 18:03:51 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 17 Oct 2007 23:03:51 +0100 Subject: [BioPython] Question about Seq.count() In-Reply-To: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com> References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com> Message-ID: <471686C7.6050305@maubp.freeserve.co.uk> Jimmy Musselwhite wrote: > Now the code I want to do is > record.seq.count(search) > > but what I am forced to do is > record.seq.tostring().count(search) > > The problem here is that when I am forced to use .tostring() on every single > seq object it devastates my memory usage in a BIG way. It eats up about > 1.2gigs and then crashes. If I remove the .tostring() and just tell if to > search for 'A', it will run fine and use memory at about 1/100th the rate In the short term, try record.seq.data.count(search), which is what the tostring() method is doing anyway (the Seq object stores the sequence internally as a string). Does that help? We might be tweaking the Seq object after the next release to act a bit more like a string - at which point the .data property might go away. > So my question sums down to, is there any way to make .count() be able to > search for strings and not just characters? I'd never noticed that - I would call it a bug...
>>> from Bio.Seq import Seq >>> my_seq = Seq("AAACACACGGTTTT") >>> my_seq.data.count("GG") 1 >>> my_seq.data.count("G") 2 >>> my_seq.tostring().count("G") 2 >>> my_seq.tostring().count("GG") 1 >>> my_seq.count("G") 2 >>> my_seq.count("GG") 0 Peter From jimmy.musselwhite at gmail.com Wed Oct 17 18:48:09 2007 From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite) Date: Wed, 17 Oct 2007 18:48:09 -0400 Subject: [BioPython] Question about Seq.count() In-Reply-To: <471686C7.6050305@maubp.freeserve.co.uk> References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com> <471686C7.6050305@maubp.freeserve.co.uk> Message-ID: <86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com> Thanks guys! That worked great. On 10/17/07, Peter wrote: > > Jimmy Musselwhite wrote: > > Now the code I want to do is > > record.seq.count(search) > > > > but what I am forced to do is > > record.seq.tostring().count(search) > > > > The problem here is that when I am forced to use .tostring() on every > single > > seq object it devastates my memory usage in a BIG way. It eats up about > > 1.2gigs and then crashes. If I remove the .tostring() and just tell if > to > > search for 'A', it will run fine and use memory at about 1/100th the > rate > > In the short term, try record.seq.data.count(search) which is what the > tostring() method is doing anyway (the Seq object stores the sequence > internally as a string). Does that help? > > We might be tweaking the Seq object after the next release to act a bit > more like a string - at which point the .data property might go away. > > > So my question sums down to, is there any way to make .count() be able > to > > search for strings and not just characters? > > You I'd never noticed that - I would call it a bug... 
> > >>> from Bio.Seq import Seq > >>> my_seq = Seq("AAACACACGGTTTT") > >>> my_seq.data.count("GG") > 1 > >>> my_seq.data.count("G") > 2 > >>> my_seq.tostring().count("G") > 2 > >>> my_seq.tostring().count("GG") > 1 > >>> my_seq.count("G") > 2 > >>> my_seq.count("GG") > 0 > > Peter > > From jimmy.musselwhite at gmail.com Wed Oct 17 18:52:07 2007 From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite) Date: Wed, 17 Oct 2007 18:52:07 -0400 Subject: [BioPython] Question about Seq.count() In-Reply-To: <86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com> References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com> <471686C7.6050305@maubp.freeserve.co.uk> <86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com> Message-ID: <86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com> Just kidding, it didn't work great. It only "fixed" it because I was printing out the output of count() and so it was just executing 100 times slower and thus eating RAM 100 times slower :( It doesn't seem like there is a good way for me to fix this. On 10/17/07, Jimmy Musselwhite wrote: > > Thanks guys! That worked great. > > On 10/17/07, Peter wrote: > > > > Jimmy Musselwhite wrote: > > > Now the code I want to do is > > > record.seq.count(search) > > > > > > but what I am forced to do is > > > record.seq.tostring().count(search) > > > > > > The problem here is that when I am forced to use .tostring() on every > > single > > > seq object it devastates my memory usage in a BIG way. It eats up > > about > > > 1.2gigs and then crashes. If I remove the .tostring() and just tell if > > to > > > search for 'A', it will run fine and use memory at about 1/100th the > > rate > > > > In the short term, try record.seq.data.count (search) which is what the > > tostring() method is doing anyway (the Seq object stores the sequence > > internally as a string). Does that help? 
> > > > We might be tweaking the Seq object after the next release to act a bit > > more like a string - at which point the .data property might go away. > > > > > So my question sums down to, is there any way to make .count() be able > > to > > > search for strings and not just characters? > > > > You I'd never noticed that - I would call it a bug... > > > > >>> from Bio.Seq import Seq > > >>> my_seq = Seq("AAACACACGGTTTT") > > >>> my_seq.data.count("GG") > > 1 > > >>> my_seq.data.count("G") > > 2 > > >>> my_seq.tostring().count("G") > > 2 > > >>> my_seq.tostring().count("GG") > > 1 > > >>> my_seq.count("G") > > 2 > > >>> my_seq.count("GG") > > 0 > > > > Peter > > > > > From jimmy.musselwhite at gmail.com Wed Oct 17 19:04:26 2007 From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite) Date: Wed, 17 Oct 2007 19:04:26 -0400 Subject: [BioPython] Question about Seq.count() In-Reply-To: <86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com> References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com> <471686C7.6050305@maubp.freeserve.co.uk> <86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com> <86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com> Message-ID: <86e5e8970710171604p612f5583v6ef32f90eca86861@mail.gmail.com> In response to the first reply you gave me, where you said this You I'd never noticed that - I would call it a bug... >>> from Bio.Seq import Seq >>> my_seq = Seq("AAACACACGGTTTT") >>> my_seq.data.count("GG") 1 >>> my_seq.data.count("G") 2 >>> my_seq.tostring().count("G") 2 >>> my_seq.tostring().count("GG") 1 >>> my_seq.count("G") 2 >>> my_seq.count("GG") 0 I've tried that many many times and I always get 0 when I do my_seq.count("GG") I just rebuilt biopython from the latest CVS tarball and it still does not work. I have no idea why yours works and mine doesn't. On 10/17/07, Jimmy Musselwhite wrote: > > Just kidding, it didn't work great. 
It only "fixed" it because I was > printing out the output of count() and so it was just executing 100 times > slower and thus eating RAM 100 times slower :( > > It doesn't seem like there is a good way for me to fix this. > > On 10/17/07, Jimmy Musselwhite wrote: > > > > Thanks guys! That worked great. > > > > On 10/17/07, Peter < biopython at maubp.freeserve.co.uk> wrote: > > > > > > Jimmy Musselwhite wrote: > > > > Now the code I want to do is > > > > record.seq.count(search) > > > > > > > > but what I am forced to do is > > > > record.seq.tostring().count(search) > > > > > > > > The problem here is that when I am forced to use .tostring() on > > > every single > > > > seq object it devastates my memory usage in a BIG way. It eats up > > > about > > > > 1.2gigs and then crashes. If I remove the .tostring() and just tell > > > if to > > > > search for 'A', it will run fine and use memory at about 1/100th the > > > rate > > > > > > In the short term, try record.seq.data.count (search) which is what > > > the > > > tostring() method is doing anyway (the Seq object stores the sequence > > > internally as a string). Does that help? > > > > > > We might be tweaking the Seq object after the next release to act a > > > bit > > > more like a string - at which point the .data property might go away. > > > > > > > So my question sums down to, is there any way to make .count() be > > > able to > > > > search for strings and not just characters? > > > > > > You I'd never noticed that - I would call it a bug... 
> > > > > > >>> from Bio.Seq import Seq > > > >>> my_seq = Seq("AAACACACGGTTTT") > > > >>> my_seq.data.count("GG") > > > 1 > > > >>> my_seq.data.count("G") > > > 2 > > > >>> my_seq.tostring().count("G") > > > 2 > > > >>> my_seq.tostring().count("GG") > > > 1 > > > >>> my_seq.count("G") > > > 2 > > > >>> my_seq.count("GG") > > > 0 > > > > > > Peter > > > > > > > > > From jimmy.musselwhite at gmail.com Wed Oct 17 19:06:03 2007 From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite) Date: Wed, 17 Oct 2007 19:06:03 -0400 Subject: [BioPython] Question about Seq.count() In-Reply-To: <86e5e8970710171604p612f5583v6ef32f90eca86861@mail.gmail.com> References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com> <471686C7.6050305@maubp.freeserve.co.uk> <86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com> <86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com> <86e5e8970710171604p612f5583v6ef32f90eca86861@mail.gmail.com> Message-ID: <86e5e8970710171606x4ac9b3feg23f2409a4385d237@mail.gmail.com> Man I"m sorry, I didn't read that well enough. It doesn't work for you either. I'm gonna stop responding to this e-mail now :) I'm clearly tired or something. On 10/17/07, Jimmy Musselwhite wrote: > > In response to the first reply you gave me, where you said this > > You I'd never noticed that - I would call it a bug... > > >>> from Bio.Seq import Seq > >>> my_seq = Seq("AAACACACGGTTTT") > >>> my_seq.data.count("GG") > 1 > >>> my_seq.data.count("G") > 2 > >>> my_seq.tostring().count("G") > 2 > >>> my_seq.tostring().count("GG") > 1 > >>> my_seq.count("G") > 2 > >>> my_seq.count("GG") > 0 > > > I've tried that many many times and I always get 0 when I do > my_seq.count("GG") > I just rebuilt biopython from the latest CVS tarball and it still does not > work. I have no idea why yours works and mine doesn't. > > On 10/17/07, Jimmy Musselwhite wrote: > > > > Just kidding, it didn't work great. 
It only "fixed" it because I was > > printing out the output of count() and so it was just executing 100 times > > slower and thus eating RAM 100 times slower :( > > > > It doesn't seem like there is a good way for me to fix this. > > > > On 10/17/07, Jimmy Musselwhite < jimmy.musselwhite at gmail.com> wrote: > > > > > > Thanks guys! That worked great. > > > > > > On 10/17/07, Peter < biopython at maubp.freeserve.co.uk> wrote: > > > > > > > > Jimmy Musselwhite wrote: > > > > > Now the code I want to do is > > > > > record.seq.count(search) > > > > > > > > > > but what I am forced to do is > > > > > record.seq.tostring().count(search) > > > > > > > > > > The problem here is that when I am forced to use .tostring() on > > > > every single > > > > > seq object it devastates my memory usage in a BIG way. It eats up > > > > about > > > > > 1.2gigs and then crashes. If I remove the .tostring() and just > > > > tell if to > > > > > search for 'A', it will run fine and use memory at about 1/100th > > > > the rate > > > > > > > > In the short term, try record.seq.data.count (search) which is what > > > > the > > > > tostring() method is doing anyway (the Seq object stores the > > > > sequence > > > > internally as a string). Does that help? > > > > > > > > We might be tweaking the Seq object after the next release to act a > > > > bit > > > > more like a string - at which point the .data property might go > > > > away. > > > > > > > > > So my question sums down to, is there any way to make .count() be > > > > able to > > > > > search for strings and not just characters? > > > > > > > > You I'd never noticed that - I would call it a bug... 
> > > > > > > > >>> from Bio.Seq import Seq > > > > >>> my_seq = Seq("AAACACACGGTTTT") > > > > >>> my_seq.data.count("GG") > > > > 1 > > > > >>> my_seq.data.count("G") > > > > 2 > > > > >>> my_seq.tostring().count("G") > > > > 2 > > > > >>> my_seq.tostring().count("GG") > > > > 1 > > > > >>> my_seq.count("G") > > > > 2 > > > > >>> my_seq.count("GG") > > > > 0 > > > > > > > > Peter > > > > > > > > > > > > > > From jimmy.musselwhite at gmail.com Thu Oct 18 08:48:41 2007 From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite) Date: Thu, 18 Oct 2007 08:48:41 -0400 Subject: [BioPython] Question about Seq.count() In-Reply-To: <471733DE.6050803@maubp.freeserve.co.uk> References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com> <471686C7.6050305@maubp.freeserve.co.uk> <86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com> <86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com> <471733DE.6050803@maubp.freeserve.co.uk> Message-ID: <86e5e8970710180548u48e5780crc8d5178401d116d5@mail.gmail.com> Peter Well after a day of not thinking very hard I found my problem and it didn't have anything to do with strings at all. That was just my best guess at the time of writing this e-mail. Sorry about that =( On 10/18/07, Peter wrote: > > Jimmy Musselwhite wrote: > > Just kidding, it didn't work great. It only "fixed" it because I was > > printing out the output of count() and so it was just executing 100 > times > > slower and thus eating RAM 100 times slower :( > > > > It doesn't seem like there is a good way for me to fix this. 
> > Both of these are using the python string method to count "GG", the only > difference is the tostring() method has the additional small overhead of > an extra function call: > > my_seq.data.count("GG") > my_seq.tostring().count("GG") > > However, comparing these: > > my_seq.data.count("G") # using python's string count method > my_seq.tostring().count("G") # using python's string count method > my_seq.count("G") # using an iterator internally > > It could be that the Seq record's current single letter search is simply > very memory efficient compared than the python string's more flexible > multi-letter search. > > How are you measuring the RAM? If like to see memory usage figures for > the five simple examples above on a large sequence - plus doing this > directly on the equivalent string. > > Are you using Linux or Windows or Mac OS, and what version of python? I > know there have been some string optimisations in Python 2.5 (although I > don't know if any are relevant to the count method). > > Peter > > From ytu888 at hotmail.com Thu Oct 18 13:35:15 2007 From: ytu888 at hotmail.com (Y Tu) Date: Thu, 18 Oct 2007 12:35:15 -0500 Subject: [BioPython] Error for running the test code in BioSQL with Biopython manual In-Reply-To: <908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net> References: <46FCF325.4040002@maubp.freeserve.co.uk> <46FD2BAC.80401@maubp.freeserve.co.uk> <46FD5927.3000207@maubp.freeserve.co.uk> <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net> <38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net> <14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net> <908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net> Message-ID: I am still waiting for help to fix the problem on Mac (attached at the bottom). However, to make the project going I found a old PC and installed Python, MySQL, BioSql and Bio-python on it. 
However, when I tested the code that comes with Basic BioSQL with Biopython, I got the following error: =======================================my PC problem=============================== >>> from BioSQL import BioSeqDatabase >>> server=BioSeqDatabase.open_database(driver="MySQLdb", user="root", ... passwd="MySQLdb", host="localhost", db="bioseqdb") >>> db=server.new_database("Viral") >>> from Bio import GenBank >>> parser=GenBank.FeatureParser() >>> iterator = GenBank.Iterator(open("gbvrl.gb"), parser) >>> db.load(iterator) Traceback (most recent call last): File "", line 1, in File "C:\Python25\lib\site-packages\BioSQL\BioSeqDatabase.py", line 414, in load db_loader.load_seqrecord(cur_record) File "C:\Python25\lib\site-packages\BioSQL\Loader.py", line 37, in load_seqrecord bioentry_id = self._load_bioentry_table(record) File "C:\Python25\lib\site-packages\BioSQL\Loader.py", line 260, in _load_bioentry_table bioentry_id = self.adaptor.last_id('bioentry') File "C:\Python25\lib\site-packages\BioSQL\BioSeqDatabase.py", line 148, in last_id return self.dbutils.last_id(self.cursor, table) File "C:\Python25\Lib\site-packages\BioSQL\DBUtils.py", line 34, in last_id return cursor.insert_id() AttributeError: 'Cursor' object has no attribute 'insert_id' +++++++++++++++++++++++++++++++++++++++++++++++++ Please help me to fix the problem, thanks. ========================================my old Mac problem======================== Date: Tue, 16 Oct 2007 12:06:36 -0500 From: Y Tu Subject: Re: [BioPython] Error for installation of MySALdb on Mac OS X To: Steve Lianoglou Cc: biopython at lists.open-bio.org Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hi, I reinstalled everything and checked every step. I found that there were some warnings in the "build" step (underlined). I wonder if they are the reason why I got the error messages when running "import MySQLdb" under the python prompt, and how to fix the problem. Thank you very much.
LeesComputer:/Applications/Python_Bio/MySQL-python-1.2.2 Lee$ python setup.py build running build running build_py ... ... /usr/bin/ld: for architecture ppc /usr/bin/ld: warning build/temp.macosx-10.3-fat-2.5/_mysql.o cputype (7, architecture i386) does not match cputype (18) for specified -arch flag: ppc (file not loaded) /usr/bin/ld: warning /usr/local/mysql/lib/libmysqlclient_r.dylib cputype (7, architecture i386) does not match cputype (18) for specified -arch flag: ppc (file not loaded) LeesComputer:/Applications/Python_Bio/MySQL-python-1.2.2 Lee$ sudo python setup.py install Password: running install ... ... Adding MySQL-python 1.2.2 to easy-install.pth file Installed /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg Processing dependencies for MySQL-python==1.2.2 LeesComputer:/Applications/Python_Bio/MySQL-python-1.2.2 Lee$ python Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04) [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin Type "help", "copyright", "credits" or "license" for more information. 
>>> import MySQLdb /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.py:3: UserWarning: Module _mysql was already imported from /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.pyc, but /Applications/Python_Bio/MySQL-python-1.2.2 is being added to sys.path import sys, pkg_resources, imp Traceback (most recent call last): File "", line 1, in File "MySQLdb/__init__.py", line 19, in import _mysql File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 7, in File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 6, in __bootstrap__ ImportError: dlopen(/Users/Lee/.python-eggs/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg-tmp/_mysql.so, 2): Library not loaded: /usr/local/mysql/lib/mysql/libmysqlclient_r.15.dylib Referenced from: /Users/Lee/.python-eggs/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg-tmp/_mysql.so Reason: image not found _________________________________________________________________ Climb to the top of the charts!? Play Star Shuffle:? the word scramble challenge with star power. http://club.live.com/star_shuffle.aspx?icid=starshuffle_wlmailtextlink_oct From biopython at maubp.freeserve.co.uk Thu Oct 18 06:22:22 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 18 Oct 2007 11:22:22 +0100 Subject: [BioPython] Question about Seq.count() In-Reply-To: <86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com> References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com> <471686C7.6050305@maubp.freeserve.co.uk> <86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com> <86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com> Message-ID: <471733DE.6050803@maubp.freeserve.co.uk> Jimmy Musselwhite wrote: > Just kidding, it didn't work great. 
It only "fixed" it because I was > printing out the output of count() and so it was just executing 100 times > slower and thus eating RAM 100 times slower :( > > It doesn't seem like there is a good way for me to fix this. Both of these are using the python string method to count "GG"; the only difference is that the tostring() method has the additional small overhead of an extra function call: my_seq.data.count("GG") my_seq.tostring().count("GG") However, comparing these: my_seq.data.count("G") # using python's string count method my_seq.tostring().count("G") # using python's string count method my_seq.count("G") # using an iterator internally It could be that the Seq record's current single letter search is simply very memory efficient compared to the python string's more flexible multi-letter search. How are you measuring the RAM? I'd like to see memory usage figures for the five simple examples above on a large sequence - plus doing this directly on the equivalent string. Are you using Linux or Windows or Mac OS, and what version of python? I know there have been some string optimisations in Python 2.5 (although I don't know if any are relevant to the count method). Peter From dalloliogm at gmail.com Fri Oct 19 09:38:50 2007 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Fri, 19 Oct 2007 15:38:50 +0200 Subject: [BioPython] Question about Seq.count() In-Reply-To: <471686C7.6050305@maubp.freeserve.co.uk> References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com> <471686C7.6050305@maubp.freeserve.co.uk> Message-ID: <5aa3b3570710190638h23665c4cpb8d53a8cb64c7322@mail.gmail.com> 2007/10/18, Peter : > >>> from Bio.Seq import Seq > >>> my_seq = Seq("AAACACACGGTTTT") > >>> my_seq.count("G") > 2 > >>> my_seq.count("GG") > 0 I've found the bug!
The code for Bio.Seq.count is: def count(self, item): return len([x for x in self.data if x == item]) It does not work for patterns of two nucleotides, because '[x for x in self.data]' iterates over the string one letter at a time: >>> s = Seq( 'ACTTgGCATYCGgtGACGACTGGGcATCGGTCAGTCGGTTT') >>> [x for x in s.data] ['A', 'C', 'T', 'T', 'g', 'G', 'C', 'A', 'T', 'Y', 'C', 'G', 'g', 't', 'G', 'A', 'C', 'G', 'A', 'C', 'T', 'G', 'G', 'G', 'c', 'A', 'T', 'C', 'G', 'G', 'T', 'C', 'A', 'G', 'T', 'C', 'G', 'G', 'T', 'T', 'T'] >>> for x in s.data: ... print x, 'GG', x == 'GG' (always false) Something like len('GG' in s.data) also won't work, because "'GG' in s.data" returns a Boolean value: >>> 'GG' in s.data True What about using regular expressions instead? >>> import re >>> r = re.compile('GG') >>> count = len(r.findall(my_seq.data)) The two approaches don't seem to differ much in execution time: # for i in $( seq 10); do time python -m re -c '"cdasd".count("cc")'; done 2>&1| grep real real 0m0.091s real 0m0.106s real 0m0.081s real 0m0.110s real 0m0.076s real 0m0.109s real 0m0.109s real 0m0.062s real 0m0.110s real 0m0.062s # for i in $(seq 10); do time python -m re -c 'len(re.findall("cc", "cdasd"))'; done 2>&1|grep real real 0m0.065s real 0m0.108s real 0m0.079s real 0m0.082s real 0m0.111s real 0m0.113s real 0m0.110s real 0m0.112s real 0m0.112s real 0m0.111s Compiling a short pattern with the re module shouldn't take too much time, and in future implementations it might allow us to do more interesting things: for example, we would be able to add an 'ignorecase' parameter to Seq.count: >>> Bio.Seq('ACAGtcAGgCATGCGG').count('GG', 'ignorecase') 2 >>> Bio.Seq('ACAGtcAGgCATGCGG').count('GG') 1 What do you think?
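
The behaviour described above can be reproduced on a plain string, without Biopython. The following is a sketch in modern Python 3; count_by_letter is a hypothetical name that only mirrors the list-comprehension approach quoted above, and the lookahead pattern at the end is an extra illustration that is not part of the original proposal.

```python
import re

data = "AAACACACGGTTTT"


def count_by_letter(text, item):
    """Mirror the comprehension above: each x is a single character."""
    return len([x for x in text if x == item])


# A one-character x can equal "G" but can never equal the two-character "GG".
single = count_by_letter(data, "G")
double = count_by_letter(data, "GG")

# str.count and re.findall both count non-overlapping occurrences.
str_count = data.count("GG")
regex_count = len(re.findall("GG", data))

# Case-insensitive counting, as in the proposed 'ignorecase' option.
ignorecase_count = len(re.findall("GG", "ACAGtcAGgCATGCGG", re.IGNORECASE))

# A zero-width lookahead counts overlapping matches too, unlike str.count.
overlapping = len(re.findall("(?=GG)", "GGG"))
plain = "GGG".count("GG")

print(single, double, str_count, regex_count, ignorecase_count, overlapping, plain)
```

The last two values differ because str.count skips past each match before looking for the next one, which may or may not be what a motif count should do.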
Cheers, Giovanni -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com From biopython at maubp.freeserve.co.uk Fri Oct 19 10:50:56 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 19 Oct 2007 15:50:56 +0100 Subject: [BioPython] Question about Seq.count() In-Reply-To: <5aa3b3570710190638h23665c4cpb8d53a8cb64c7322@mail.gmail.com> References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com> <471686C7.6050305@maubp.freeserve.co.uk> <5aa3b3570710190638h23665c4cpb8d53a8cb64c7322@mail.gmail.com> Message-ID: <320fb6e00710190750n6b1752bcga0846159e32cf02c@mail.gmail.com> > I've found the bug! > > The code for Bio.Seq.count is: > > def count(self, item): > return len([x for x in self.data if x == item]) Yeah - by design this (and the functionally similar version for the MutableSeq) both expect the count argument to be a single letter. The simple fix for the Seq object is to use the string method internally: def count(self, item): return self.data.count(item) For the MutableSeq things are not so straight forward, but supporting multiple character arguments can be done. > What about using regular expressions instead? > ... > What do you think? I think the Seq object's count method should act just like a normal python string's count method. If anyone wants to get fancy with regular expressions, they can do so. Peter From anaryin at gmail.com Mon Oct 22 08:21:49 2007 From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=) Date: Mon, 22 Oct 2007 13:21:49 +0100 Subject: [BioPython] Scripts cannot connect Message-ID: Hello all! I solved my problem a few weeks ago on Windows but now that I've changed to Linux, it is back again. 
I have this script:

#!/usr/bin/env python
from SOAPpy import WSDL

wsdl = 'http://soap.genome.jp/KEGG.wsdl'
serv = WSDL.Proxy(wsdl)
genes = ["eco:b1002", "eco:b2388"]
results = serv.mark_pathway_by_objects("path:eco00010", genes)
print results

Every time I try to run it, I get a timeout. I solved the problem in Windows by setting up env_variables. Here, the bash can access the web (it has its env_var http_proxy set) but my scripts can't.. any help?

Thanks in advance!

João Rodrigues

From biopython at maubp.freeserve.co.uk Mon Oct 22 08:48:52 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 22 Oct 2007 13:48:52 +0100
Subject: [BioPython] Scripts cannot connect
In-Reply-To:
References:
Message-ID: <471C9C34.7000006@maubp.freeserve.co.uk>

João Rodrigues wrote:
> Every time I try to run it, I get a timeout. I solved the problem in
> Windows by setting up env_variables. Here, the bash can access the web (it
> has its env_var http_proxy set) but my scripts can't.. any help?

What does this do if you add it to your script?

import os
print os.environ.keys()
try :
    print os.environ["http_proxy"]
except KeyError :
    print "http_proxy environment variable not setup"

How have you set up the environment variables in Linux? Via your .bashrc file?

Peter

From anaryin at gmail.com Mon Oct 22 09:11:46 2007
From: anaryin at gmail.com (João Rodrigues)
Date: Mon, 22 Oct 2007 14:11:46 +0100
Subject: [BioPython] Scripts cannot connect
In-Reply-To: <471C9C34.7000006@maubp.freeserve.co.uk>
References: <471C9C34.7000006@maubp.freeserve.co.uk>
Message-ID:

Hello again!

It says that the proxy isn't set.. I've added the line to my .bashrc ( I had to create it). Yet, it doesn't work.

What am I doing wrong?
(or not doing)

From tiagoantao at gmail.com Mon Oct 22 10:01:53 2007
From: tiagoantao at gmail.com (Tiago Antao)
Date: Mon, 22 Oct 2007 15:01:53 +0100
Subject: [BioPython] Scripts cannot connect
In-Reply-To:
References: <471C9C34.7000006@maubp.freeserve.co.uk>
Message-ID: <471CAD51.101@gmail.com>

João Rodrigues wrote:
> It says that the proxy isn't set.. I've added the line to my .bashrc ( I had
> to create it). Yet, it doesn't work.
>
> What am I doing wrong? (or not doing)

Are you doing an export of the variable? Try running env at the prompt and check whether http_proxy is defined (you will get a big list of environment variables, just search or grep for the proxy one). Like:

$ env | grep http_proxy

On another front, your .bash_profile should exist and be sourcing .bashrc (either that, or you put http_proxy in .bash_profile).

Regards,
Tiago
--
tiagoantao at gmail.com
http://tiago.org/ps

From anaryin at gmail.com Mon Oct 22 11:38:19 2007
From: anaryin at gmail.com (João Rodrigues)
Date: Mon, 22 Oct 2007 16:38:19 +0100
Subject: [BioPython] Scripts cannot connect
In-Reply-To: <320fb6e00710220658x12866cb6w63f7ff96f5bcd2b0@mail.gmail.com>
References: <471C9C34.7000006@maubp.freeserve.co.uk> <320fb6e00710220658x12866cb6w63f7ff96f5bcd2b0@mail.gmail.com>
Message-ID:

Well, the problem is something else then.. I've set the environment variables by hand and it worked. It detects the proxy and works through it. However, it still doesn't connect to the web. I'm using the example they gave in the KEGG API reference manual so it *should* work..

I've used a test script to check whether other scripts could connect, and they do. I've tried using urllib to retrieve the KEGG page, and it does. I guess the problem is with the webservice... I'll try to figure it out.

Thanks for your help!
(Again :) )

From bsantos at biocant.pt Tue Oct 23 11:57:58 2007
From: bsantos at biocant.pt (Bruno Santos)
Date: Tue, 23 Oct 2007 16:57:58 +0100
Subject: [BioPython] Problems with NCBIXML.py
Message-ID: <001101c8158d$7d146600$2300a8c0@bsantos>

I am trying to build a simple script that, given a multi-FASTA sequence file, performs a web BLAST and replaces the name of each sequence with the hit that has the lowest E-value.

But now I'm getting an exception and I don't know why it's happening:

Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript
    exec codeObject in __main__.__dict__
  File "C:\Documents and Settings\POSTO_21\Os meus documentos\Meta Genómica\BLAST.py", line 16, in <module>
    for blast_record in blast_records:
  File "C:\Python25\lib\site-packages\Bio\Blast\NCBIXML.py", line 592, in parse
    expat_parser.Parse(text, False)
ExpatError: mismatched tag: line 2823, column 362

And here is my script:

from Bio import SeqIO
from Bio.Blast import NCBIWWW
import cStringIO
from Bio.Blast import NCBIXML
#for file in dir
file_handle = open(r'C:/FASTASeq/Results/Well9/assembled_file_well9_Dt_DIST.fna') #Open file to an handler
records = SeqIO.parse(file_handle, format="fasta") #Store the file in a Seq Object
save_file = open(r'C:/FASTASeq/Results/Well9/D1_Blast.xml', "w")
for record in records:
    sequence = record.seq.data #Converts record to Plain Text
    result_handle = NCBIWWW.qblast("blastn", "nr", sequence) #Performs a Blastn against the database nr
    blast_results = result_handle.read() #Catch the results
    save_file.write(blast_results) #Write all the information to an XML file
    result_handle = open(r'C:/FASTASeq/Results/Well9/D1_Blast.xml')
    blast_records = NCBIXML.parse(result_handle)
    for blast_record in blast_records:
        alignment = blast_record.alignments
        nIdent = (alignment[0].hsps[0].positives/float(alignment[0].hsps[0].align_length))*100.0
        if nIdent >= 97:
            record.name = alignment[0].hit_def
for record in records:
    print('>description_%s length_%d\n' % (record.name, len(record.seq)))
    print('%s\n' % record.seq)
save_file.close()
file_handle.close()

Thank you,
Bruno Santos

From bsantos at biocant.pt Tue Oct 23 13:17:24 2007
From: bsantos at biocant.pt (Bruno Santos)
Date: Tue, 23 Oct 2007 18:17:24 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <471E1CBC.30601@maubp.freeserve.co.uk>
References: <001101c8158d$7d146600$2300a8c0@bsantos> <471E1CBC.30601@maubp.freeserve.co.uk>
Message-ID: <001b01c81598$95f7b3b0$2300a8c0@bsantos>

I have manually checked the file and I didn't found any problem. Sorry about the three times it was my mistake because I send the message before register and then I thought I had to send it again. This is getting stranger every time I ran the script it gave me a different error.
Now I get this one at the first run:

Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript
    exec codeObject in __main__.__dict__
  File "C:\Documents and Settings\POSTO_21\Os meus documentos\Meta Genómica\BLAST.py", line 17, in <module>
    for blast_record in blast_records:
  File "C:\Python25\lib\site-packages\Bio\Blast\NCBIXML.py", line 583, in parse
    expat_parser.Parse("", True) # End of XML record
ExpatError: unclosed token: line 2826, column 8

Now if I run the script again without first closing the files, I get the following error:

Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript
    exec codeObject in __main__.__dict__
  File "C:\Documents and Settings\POSTO_21\Os meus documentos\Meta Genómica\BLAST.py", line 17, in <module>
    for blast_record in blast_records:
  File "C:\Python25\lib\site-packages\Bio\Blast\NCBIXML.py", line 583, in parse
    expat_parser.Parse("", True) # End of XML record
ExpatError: no element found: line 2823, column 81

Now if I execute the close operation on both files in the interactive window and run the script again, I get:

Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript
    exec codeObject in __main__.__dict__
  File "C:\Documents and Settings\POSTO_21\Os meus documentos\Meta Genómica\BLAST.py", line 17, in <module>
    for blast_record in blast_records:
  File "C:\Python25\lib\site-packages\Bio\Blast\NCBIXML.py", line 583, in parse
    expat_parser.Parse("", True) # End of XML record
ExpatError: no element found: line 2827, column 0

I have uploaded my script, the FASTA file I'm using and the XML; can anyone take a look?
XML File: http://www.drivehq.com/folder/p2731454.aspx
Script: http://www.drivehq.com/folder/p2731447.aspx
FASTA File: http://www.drivehq.com/folder/p2731426.aspx

Unidade de Bioinformática
3060-197 Cantanhede
Tel: 231 410 892
http://bioinformatics.biocant.pt

-----Original Message-----
From: Peter [mailto:biopython at maubp.freeserve.co.uk]
Sent: Tuesday, 23 October 2007 17:10
To: Bruno Santos
Cc: biopython at biopython.org
Subject: Re: [BioPython] Problems with NCBIXML.py

Bruno Santos wrote:
> I am trying to build a simple script that given a multi FASTA sequence file
> perform a web BLAST and replace the name of the sequence by the hit with the
> lowest E-Value.
>
> But now I'm getting an exception that I don't know why it's happening:
>
> Traceback (most recent call last):
> ...
> for blast_record in blast_records:
> File "C:\Python25\lib\site-packages\Bio\Blast\NCBIXML.py", line 592, in parse
> expat_parser.Parse(text, False)
> ExpatError: mismatched tag: line 2823, column 362

That sounds like an error in the XML file - have a look at this particular XML file by hand in a text editor; maybe it's only a partial download, or an HTML error page or something.

Peter

From biopython at maubp.freeserve.co.uk Tue Oct 23 14:14:43 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Oct 2007 19:14:43 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <001b01c81598$95f7b3b0$2300a8c0@bsantos>
References: <001101c8158d$7d146600$2300a8c0@bsantos> <471E1CBC.30601@maubp.freeserve.co.uk> <001b01c81598$95f7b3b0$2300a8c0@bsantos>
Message-ID: <471E3A13.5080505@maubp.freeserve.co.uk>

Bruno Santos wrote:
> I have manually checked the file and I didn't found any problem.
> Sorry about the three times it was my mistake because I send the message
> before register and then I thought I had to send it again.
> This is getting stranger every time I ran the script it gave me a different
> error. Now I get this one at the first run:
>
> ...
> > Now if I run the script without first close it I get the following error: > Traceback (most recent call last): > Without seeing the XML file I'm having to guess - but this could be something to do with trying to read files from disk before the OS has finished flushing the data out. Mismatched tags could certainly be explained if the parser was only getting part of the data. You could try inserting a sleep of a few seconds after writing and closing the XML file. Also try handle.flush() before the handle.close() when you save the XML file to disk. > I have upload my script, the FASTA file I'm using and the XML can anyone > give a look? > > XML File: http://www.drivehq.com/folder/p2731454.aspx > Script: http://www.drivehq.com/folder/p2731447.aspx > FASTA File: http://www.drivehq.com/folder/p2731426.aspx That didn't work - the easy solution is to file a bug, and then attach the three files: http://bugzilla.open-bio.org/enter_bug.cgi?product=Biopython Peter From dag23 at duke.edu Tue Oct 23 17:06:53 2007 From: dag23 at duke.edu (David Garfield) Date: Tue, 23 Oct 2007 17:06:53 -0400 Subject: [BioPython] Syntax error while parsing Blast output Message-ID: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu> Hey list, I'm having an issue with the BlastParser and Iterator from NCBIStandalone. I assume its because NCBI has gone and changed the output file (again)...or I'm an idiot....but maybe there's a real problem here. 
I'm trying to parse a blast result using the following code:

def filter_blast_results(blast_results, blast_cut_off):
    b_parser = NCBIStandalone.BlastParser()
    b_iterator = NCBIStandalone.Iterator(blast_results, b_parser)
    hit_results = {}
    while 1:
        b_record = b_iterator.next()
        if b_record is None: break
        header = b_record.Header.query
        temp = []
        for alignment in b_record.alignments:
            for hsp in alignment.hsps:
                if hsp.expect < blast_cut_off:
                    temp.append(alignment.title)
        #we now remove duplicates from the temp list and add that to the hit_results
        hit_results[header] = remove_duplicates(temp)
    return hit_results

And I get the error I've included at the bottom of this message, something about "SyntaxError: Line does not start with 'Reference':"

I know that blast is working because I can print out what appears to my untrained eye to be a perfectly good XML of the results I see when I run blast manually.

Any help would be very much appreciated,

David

Traceback (most recent call last):
  File "test_scripts.py", line 7, in <module>
    single_blast_sequence.run_2way_blast('single_test_in.fasta','/Users/dagarfield/urchins/blastdbs/urchin_2.0','/Users/dagarfield/urchins/blastdbs/urchin_2.0','NA',.001,'/Users/dagarfield/urchins/urchin_bin/blastall')
  File "/private/var/automount/Network/Share2/genomeScans/urchins/alignment_methods/blast/single_blast_sequence.py", line 57, in run_2way_blast
    input_to_other_blast_matches = filter_blast_results (blast_results, blast_cut_off)
  File "/private/var/automount/Network/Share2/genomeScans/urchins/alignment_methods/blast/single_blast_sequence.py", line 39, in filter_blast_results
    b_record = b_iterator.next()
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line 1403, in next
    return self._parser.parse(File.StringHandle(data))
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line 616, in parse
    self._scanner.feed(handle, self._consumer)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line 96, in feed
    self._scan_header(uhandle, consumer)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line 125, in _scan_header
    read_and_call(uhandle, consumer.reference, start='Reference')
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/ParserSupport.py", line 300, in read_and_call
    raise SyntaxError, errmsg
SyntaxError: Line does not start with 'Reference':
/Users/dagarfield/urchins/blastdbs/urchin_2.0

From biopython at maubp.freeserve.co.uk Tue Oct 23 17:45:38 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Oct 2007 22:45:38 +0100
Subject: [BioPython] Syntax error while parsing Blast output
In-Reply-To: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu>
References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu>
Message-ID: <471E6B82.5010700@maubp.freeserve.co.uk>

David Garfield wrote:
> Hey list,
>
> I'm having an issue with the BlastParser and Iterator from
> NCBIStandalone. I assume its because NCBI has gone and changed the
> output file (again)...or I'm an idiot....but maybe there's a real
> problem here.

The code you gave uses the NCBIStandalone parser/iterator, which expects plain text output - yet you say later the raw file looks like a perfectly good XML file. If you have an XML file (which we recommend over the plain text) then you should use the NCBIXML module instead.
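
A mix-up like this one (XML output handed to the plain-text parser) can be spotted before any parsing by looking at the first characters of the output. The following is just a rough sketch: `looks_like_blast_xml` is a hypothetical helper, not part of Biopython, and the two sample strings are made-up stand-ins for real BLAST output.

```python
def looks_like_blast_xml(text):
    """Guess whether BLAST output is XML (hypothetical helper, not a Biopython API).

    Only the first 200 non-blank characters are inspected.
    """
    head = text.lstrip()[:200]
    return head.startswith("<?xml") or "<BlastOutput" in head


# Made-up sample strings, for illustration only.
xml_output = '<?xml version="1.0"?>\n<BlastOutput>\n</BlastOutput>'
text_output = "BLASTN\n\nReference: (plain text report follows)"

print(looks_like_blast_xml(xml_output))
print(looks_like_blast_xml(text_output))
```

A script could use such a check to decide between the NCBIXML parser and the plain-text parser, or simply to fail early with a clearer message.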
Also, a style point - I personally much prefer this:

b_iterator = NCBIStandalone.Iterator(blast_results, b_parser)
for b_record in b_iterator :
    #etc

over this:

b_iterator = NCBIStandalone.Iterator(blast_results, b_parser)
while 1:
    b_record = b_iterator.next()
    if b_record is None: break
    #etc

Peter

From dag23 at duke.edu Tue Oct 23 17:59:33 2007
From: dag23 at duke.edu (David Garfield)
Date: Tue, 23 Oct 2007 17:59:33 -0400
Subject: [BioPython] Syntax error while parsing Blast output
In-Reply-To: <471E6B82.5010700@maubp.freeserve.co.uk>
References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu> <471E6B82.5010700@maubp.freeserve.co.uk>
Message-ID:

Thanks, Peter. You've found the problem exactly.

Interestingly, the code I presented was taken directly from the BioPython cookbook (including the "while 1" bit). Somewhere in the subsequent versions since that document was released, the output of NCBIStandalone has changed from text to XML and the NCBIStandalone Iterators and Parser either no longer seem to work with the output of NCBIStandalone.blastall or there is an option not mentioned in the Cookbook to ensure that the output is in text rather than XML.

In any event, the problem is now fixed. Thanks!

--DG

On Oct 23, 2007, at 5:45 PM, Peter wrote:
> David Garfield wrote:
>> Hey list,
>> I'm having an issue with the BlastParser and Iterator from
>> NCBIStandalone. I assume its because NCBI has gone and changed
>> the output file (again)...or I'm an idiot....but maybe there's a
>> real problem here.
>
> The code you gave uses the NCBIStandalone parser/iterator, which
> expects plain text output - yet you say later the raw file looks
> like a perfectly good XML file. If you have an XML file (which we
> recommend over the plain text) then you should use the NCBIXML
> module instead.
> > Also, a style point - I personally much prefer this: > > b_iterator = NCBIStandalone.Iterator(blast_results, b_parser) > for b_record in b_iterator : > #etc > > over this: > > b_iterator = NCBIStandalone.Iterator(blast_results, b_parser) > while 1: > b_record = b_iterator.next() > if b_record is None: break > #etc > > Peter > From biopython at maubp.freeserve.co.uk Tue Oct 23 18:48:28 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 23 Oct 2007 23:48:28 +0100 Subject: [BioPython] Syntax error while parsing Blast output In-Reply-To: References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu> <471E6B82.5010700@maubp.freeserve.co.uk> Message-ID: <471E7A3C.5010301@maubp.freeserve.co.uk> David Garfield wrote: > Thanks, Peter. You've found the problem exactly. > > Interestingly, the code I presented was taken directly from the > BioPython cookbook (including the "while 1" bit). So it is. Michiel - do you fancy tweaking that section of the tutorial? > Somewhere in the subsequent versions since that document was released, > the output of NCBIStandalone has changed from text to XML and the > NCBIStandalone Iterators and Parser either no longer seem to work with > the output of NCBIStandalone.blastall or there is an option not > mentioned in the Cookbook to ensure that the output is in text rather > than XML. Biopython 1.43 switched the default from text to XML, because we really wanted to encourage people to use the XML output by default as maintaining the text format parser is such an ongoing maintainance effort. The release notes did mention this, but it was bound to catch someone out. There is an option to override this... from Bio.Blast import NCBIStandalone help(NCBIStandalone.blastall) You need the align_view option (what the NCBI refers to as the alignment view), corresponding to the -m command line option of the NCBI blastall tool. Biopython currently defaults to seven to get XML output. 
alignment view options:
0 = pairwise,
1 = query-anchored showing identities,
2 = query-anchored no identities,
3 = flat query-anchored, show identities,
4 = flat query-anchored, no identities,
5 = query-anchored no identities and blunt ends,
6 = flat query-anchored, no identities and blunt ends,
7 = XML Blast output,
8 = tabular,
9 tabular with comment lines
10 ASN, text
11 ASN, binary [Integer]

Peter

From biopython at maubp.freeserve.co.uk Tue Oct 23 12:09:32 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Oct 2007 17:09:32 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <001101c8158d$7d146600$2300a8c0@bsantos>
References: <001101c8158d$7d146600$2300a8c0@bsantos>
Message-ID: <471E1CBC.30601@maubp.freeserve.co.uk>

Bruno Santos wrote:
> I am trying to build a simple script that given a multi FASTA sequence file
> perform a web BLAST and replace the name of the sequence by the hit with the
> lowest E-Value.
>
> But now I'm getting an exception that I don't know why it's happening:
>
> Traceback (most recent call last):
> ...
> for blast_record in blast_records:
> File "C:\Python25\lib\site-packages\Bio\Blast\NCBIXML.py", line 592, in parse
> expat_parser.Parse(text, False)
> ExpatError: mismatched tag: line 2823, column 362

That sounds like an error in the XML file - have a look at this particular XML file by hand in a text editor; maybe it's only a partial download, or an HTML error page or something.
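
Besides opening the file in a text editor, the same check can be scripted with the standard library's expat bindings. This is only a sketch under stated assumptions: `xml_problem` is a hypothetical helper name, and the short strings below are made-up stand-ins for a complete file, a truncated file and an HTML error page.

```python
import xml.parsers.expat


def xml_problem(text):
    """Return None if text is well-formed XML, else a short error description."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)  # True marks the end of the document
    except xml.parsers.expat.ExpatError as err:
        return "line %d, column %d: %s" % (
            err.lineno, err.offset, xml.parsers.expat.ErrorString(err.code))
    return None


# Made-up sample documents, for illustration only.
complete = "<BlastOutput><Hit>example</Hit></BlastOutput>"
truncated = complete[:20]  # simulates a partially written or downloaded file
html_page = "<html><body><p>error<br></body></html>"  # not well-formed XML

print(xml_problem(complete))
print(xml_problem(truncated))
print(xml_problem(html_page))
```

A non-None result for a saved BLAST file would point to the file itself (partial write, partial download, error page) rather than to the parser.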
Peter From mdehoon at c2b2.columbia.edu Tue Oct 23 20:19:47 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Tue, 23 Oct 2007 20:19:47 -0400 Subject: [BioPython] Syntax error while parsing Blast output References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu><471E6B82.5010700@maubp.freeserve.co.uk> <471E7A3C.5010301@maubp.freeserve.co.uk> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu> > > Interestingly, the code I presented was taken directly from the > > BioPython cookbook (including the "while 1" bit). > > So it is. Michiel - do you fancy tweaking that section of the tutorial? That part of the tutorial is in the section "Deprecated BLAST parsers", which will be removed once the plain-text Blast parser is removed from Biopython. The description of NCBIStandalone.blastall says "This command will generate BLAST output in XML format, ..." So this is being described correctly in the documentation. Nevertheless, it may be a good idea to remove the plain text Blast parser completely from Biopython in the upcoming release (which will probably be done this week), to avoid further confusion. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: biopython-bounces at lists.open-bio.org on behalf of Peter Sent: Tue 10/23/2007 6:48 PM To: David Garfield; biopython at lists.open-bio.org Subject: Re: [BioPython] Syntax error while parsing Blast output David Garfield wrote: > Thanks, Peter. You've found the problem exactly. > > Somewhere in the subsequent versions since that document was released, > the output of NCBIStandalone has changed from text to XML and the > NCBIStandalone Iterators and Parser either no longer seem to work with > the output of NCBIStandalone.blastall or there is an option not > mentioned in the Cookbook to ensure that the output is in text rather > than XML. 
Biopython 1.43 switched the default from text to XML, because we really wanted to encourage people to use the XML output by default as maintaining the text format parser is such an ongoing maintainance effort. The release notes did mention this, but it was bound to catch someone out. There is an option to override this... from Bio.Blast import NCBIStandalone help(NCBIStandalone.blastall) You need the align_view option (what the NCBI refers to as the alignment view), corresponding to the -m command line option of the NCBI blastall tool. Biopython currently defaults to seven to get XML output. alignment view options: 0 = pairwise, 1 = query-anchored showing identities, 2 = query-anchored no identities, 3 = flat query-anchored, show identities, 4 = flat query-anchored, no identities, 5 = query-anchored no identities and blunt ends, 6 = flat query-anchored, no identities and blunt ends, 7 = XML Blast output, 8 = tabular, 9 tabular with comment lines 10 ASN, text 11 ASN, binary [Integer] Peter _______________________________________________ BioPython mailing list - BioPython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From biopython at maubp.freeserve.co.uk Wed Oct 24 04:22:45 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Oct 2007 09:22:45 +0100 Subject: [BioPython] Syntax error while parsing Blast output In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu> References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu> <471E6B82.5010700@maubp.freeserve.co.uk> <471E7A3C.5010301@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu> Message-ID: <320fb6e00710240122q53d099ax6b295f0f7d6f9174@mail.gmail.com> [Sorry you got this twice Michiel, I forgot to set the from/to fields] > That part of the tutorial is in the section "Deprecated BLAST parsers", which > will be removed once the plain-text Blast parser is removed from Biopython. > ... 
> Nevertheless, it may be a good idea to remove the plain text Blast parser > completely from Biopython in the upcoming release (which will probably be > done this week), to avoid further confusion. Removing it sounds too drastic - especially as we have had people on the mailing list using it deliberately fairly recently. If you really do want to remove this code, then adding a deprecation warning to the plain text parser for the next release would be a more gentle route. I think there is still some benefit in having the plain text parser, and that it could be fixed to cope with current multi-query files without too much pain. Maybe I should try this weekend... Anyone want to voice their opinion? Peter From mmokrejs at ribosome.natur.cuni.cz Wed Oct 24 07:01:26 2007 From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=) Date: Wed, 24 Oct 2007 13:01:26 +0200 Subject: [BioPython] Syntax error while parsing Blast output In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu> References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu><471E6B82.5010700@maubp.freeserve.co.uk> <471E7A3C.5010301@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu> Message-ID: <471F2606.8080500@ribosome.natur.cuni.cz> Hi, Michiel De Hoon wrote: >>> Interestingly, the code I presented was taken directly from the >>> BioPython cookbook (including the "while 1" bit). >> So it is. Michiel - do you fancy tweaking that section of the tutorial? > > That part of the tutorial is in the section "Deprecated BLAST parsers", which > will be removed once the plain-text Blast parser is removed from Biopython. > The description of NCBIStandalone.blastall says > > "This command will generate BLAST output in XML format, ..." > > So this is being described correctly in the documentation. 
> > Nevertheless, it may be a good idea to remove the plain text Blast parser > completely from Biopython in the upcoming release (which will probably be > done this week), to avoid further confusion. although I understand your points, are you sure to REMOVE it? What if people need to parse elsewhere generated, maybe even in the past generated BLAST text outputs? If you wanted to say that you will REMOVE the text-based parser because it won't be maintained anymore and probably be usable for one or two NCBI BLAST version only, then it is probably more understandable. Otherwise I guess more people move to bioperl. ;) BTW, what if some people have older BLAST version generating broken XML file formats? Or have to parse such old files again? Martin From winter at biotec.tu-dresden.de Wed Oct 24 08:22:09 2007 From: winter at biotec.tu-dresden.de (Christof Winter) Date: Wed, 24 Oct 2007 14:22:09 +0200 Subject: [BioPython] Syntax error while parsing Blast output In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu> References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu><471E6B82.5010700@maubp.freeserve.co.uk> <471E7A3C.5010301@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu> Message-ID: <471F38F1.1030600@biotec.tu-dresden.de> Michiel De Hoon wrote: > Nevertheless, it may be a good idea to remove the plain text Blast parser > completely from Biopython in the upcoming release (which will probably be > done this week), to avoid further confusion. I agree with Peter and Martin that removing the plain text parser is maybe too much. Although I further agree that there is benefit in having the plain text parser, I am not sure if Biopython should ensure supporting every small format change that NCBI might come up with in the future. I use XML and tabular output only, BTW. 
Cheers, Christof From cjfields at uiuc.edu Wed Oct 24 09:49:09 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 24 Oct 2007 08:49:09 -0500 Subject: [BioPython] Syntax error while parsing Blast output In-Reply-To: <320fb6e00710240122q53d099ax6b295f0f7d6f9174@mail.gmail.com> References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu> <471E6B82.5010700@maubp.freeserve.co.uk> <471E7A3C.5010301@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu> <320fb6e00710240122q53d099ax6b295f0f7d6f9174@mail.gmail.com> Message-ID: <3462123A-662F-4BBC-ADE4-3F5967760F6E@uiuc.edu> On Oct 24, 2007, at 3:22 AM, Peter wrote: > [Sorry you got this twice Michiel, I forgot to set the from/to fields] > >> That part of the tutorial is in the section "Deprecated BLAST >> parsers", which >> will be removed once the plain-text Blast parser is removed from >> Biopython. >> ... >> Nevertheless, it may be a good idea to remove the plain text Blast >> parser >> completely from Biopython in the upcoming release (which will >> probably be >> done this week), to avoid further confusion. > > Removing it sounds too drastic - especially as we have had people on > the mailing list using it deliberately fairly recently. If you > really do want > to remove this code, then adding a deprecation warning to the plain > text > parser for the next release would be a more gentle route. > > I think there is still some benefit in having the plain text > parser, and that > it could be fixed to cope with current multi-query files without > too much > pain. Maybe I should try this weekend... > > Anyone want to voice their opinion? > > Peter We have a similar issue with the bioperl parsers. We basically promote the BLAST XML parser over the text parser, but we have retained both due to demand. In fact, we have two text parsers, a pull and a push parser (we're gluttons for punishment). 
As for maintenance, we never guarantee how long it will take to fix text parsing if it breaks as the text format is fairly unstable by NCBI's own admission. Our deprecation cycle is usually: (1) announce it on list to get feedback, (2) if deprecation is planned, add warnings to the module in the next release, (3) remove completely in a later release. It gives everyone time to change over. chris From bsantos at biocant.pt Wed Oct 24 12:23:56 2007 From: bsantos at biocant.pt (Bruno Santos) Date: Wed, 24 Oct 2007 17:23:56 +0100 Subject: [BioPython] Problems with NCBIXML.py In-Reply-To: <471E3A13.5080505@maubp.freeserve.co.uk> References: <001101c8158d$7d146600$2300a8c0@bsantos> <471E1CBC.30601@maubp.freeserve.co.uk> <001b01c81598$95f7b3b0$2300a8c0@bsantos> <471E3A13.5080505@maubp.freeserve.co.uk> Message-ID: <001601c8165a$48248600$2300a8c0@bsantos> Peter Wrote: >Without seeing the XML file I'm having to guess - but this could be >something to do with trying to read files from disk before the OS has >finished flushing the data out. Mismatched tags could certainly be >explained if the parser was only getting part of the data. > >You could try inserting a sleep of a few seconds after writing and >closing the XML file. Also try handle.flush() before the handle.close() >when you save the XML file to disk. You were right I was getting the data before it has been written to the file. Now it's working perfect. But know I have another problem it's possible to instead of making a single request to NCBI_Blast with one sequence, make the request for all the sequences in a multiFASTA file? I'm trying to use threads to do this but until now without luck. 
Thanks in advance, Bruno Santos From biopython at maubp.freeserve.co.uk Wed Oct 24 13:32:52 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Oct 2007 18:32:52 +0100 Subject: [BioPython] Problems with NCBIXML.py In-Reply-To: <001601c8165a$48248600$2300a8c0@bsantos> References: <001101c8158d$7d146600$2300a8c0@bsantos> <471E1CBC.30601@maubp.freeserve.co.uk> <001b01c81598$95f7b3b0$2300a8c0@bsantos> <471E3A13.5080505@maubp.freeserve.co.uk> <001601c8165a$48248600$2300a8c0@bsantos> Message-ID: <320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com> On 10/24/07, Bruno Santos wrote: > You were right I was getting the data before it has been written to the > file. Now it's working perfect. Great. > But know I have another problem it's possible to instead of making a single > request to NCBI_Blast with one sequence, make the request for all the > sequences in a multiFASTA file? > > I'm trying to use threads to do this but until now without luck. I would suggest you install standalone blast, then give it the multi-record FASTA file as input. You should then get multiple blast records back (in the same order). This works fine with the XML output (but currently does not work for plain text output on recent versions of NCBI Blast). If you really want to make multiple blast submissions in parallel online, first check the NCBI's website for any usage restrictions - they don't want their servers to be abused. Peter From biosql at hotmail.com Wed Oct 24 16:53:19 2007 From: biosql at hotmail.com (Jonathan Boulais) Date: Wed, 24 Oct 2007 16:53:19 -0400 Subject: [BioPython] Loading SwissProt to BioSQL Message-ID: Hello, I'm a biologist and quite newb with Biopython. I'm trying to build locally the Swissprot database with BioSQL and I'm having some problems. I have installed the latest version from the CVS and I'm using python 2.5 on a Mac Os 10.4. First, i get this weird problem. 
Since I need to connect with MySQL I started to write a simple script (Biosql.py) with only this (from BioSQL import BioSeqDatabase). When I run this script in the terminal: python Biosql.py, I get this message **ImportError: cannot import name BioSeqDatabase**. But the weird thing is if I start a python session in the terminal by simply invoking python and then manually import BioSeqDatabase, it's working! Is there any reason for that?

Second, I've then decided to continue with the python session since I'm able to import BioSeqDatabase. The connection to MySQL is working fine, but when I'm trying to import the flat file I'm getting this:

Traceback (most recent call last):
  File "", line 1, in
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 414, in load
    db_loader.load_seqrecord(cur_record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 30, in load_seqrecord
    bioentry_id = self._load_bioentry_table(record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 250, in _load_bioentry_table
    version))
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 277, in execute
    self.cursor.execute(sql, args or ())
  File "/sw/lib/python2.5/site-packages/MySQLdb/cursors.py", line 151, in execute
    query = query % db.literal(args)
TypeError: not all arguments converted during string formatting

Here are the lines I'm using:

from BioSQL import BioSeqDatabase
from Bio.SwissProt import SProt
server = BioSeqDatabase.open_database(driver = "MySQLdb", user = "", passwd = "", host = "localhost", db = "bioseqdb")
s_parser = SProt.SequenceParser()
s_iterator = SProt.Iterator(open("path to/uniprot_sprot.dat", "r"), s_parser)
db = server.new_database("Swiss")
db.load(s_iterator)

Does anybody understand this? Many thanks if someone can help!

Jonathan

From biopython at maubp.freeserve.co.uk Wed Oct 24 17:15:10 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Oct 2007 22:15:10 +0100
Subject: [BioPython] Loading SwissProt to BioSQL
In-Reply-To:
References:
Message-ID: <471FB5DE.6080506@maubp.freeserve.co.uk>

Jonathan Boulais wrote:
> Hello,
>
> I'm a biologist and quite newb with Biopython. I'm trying to build
> locally the Swissprot database with BioSQL and I'm having some
> problems. I have installed the latest version from the CVS and I'm
> using python 2.5 on a Mac Os 10.4.
>
> First, i get this weird problem. Since I need to connect with MySQL I
> started to wrote a simple script (Biosql.py) with only this ( from
> BioSQL import BioSeqDatabase). When I run this script in the
> terminal: python Biosql.py, I get this message **ImportError: cannot
> import name BioSeqDatabase**. But the weird thing is if I start a
> python session in the terminal by simply invoking python and then
> manually import BioSeqDatabase, it's working ! Is there any reason
> for that ?

In both cases are you running python from the command prompt? If so then the same environment variables (e.g. paths) should apply. Odd.

My guess is you shouldn't call your script "Biosql.py", call it "Biosql_test.py" or something. Python thinks the line "from BioSQL import BioSeqDatabase" means importing from the script itself because that is also called BioSQL.
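[Editorial sketch, not part of the original message.] The name clash guessed at above can be checked before renaming anything. The helper below is hypothetical (`shadowing_files` is not a Biopython or standard-library function) and is written for a current Python 3 interpreter rather than the Python 2.5 used in the thread. It only lists scripts in a directory whose file name matches a package name when case is ignored, which is the situation suspected here on a Mac.

```python
import os
import tempfile


def shadowing_files(package_name, directory):
    """List .py files in `directory` whose name matches `package_name`, ignoring case."""
    wanted = package_name.lower()
    hits = []
    for entry in sorted(os.listdir(directory)):
        stem, ext = os.path.splitext(entry)
        if ext == ".py" and stem.lower() == wanted:
            hits.append(entry)
    return hits


# Demo in a scratch directory: one clashing script name, one harmless one.
demo_dir = tempfile.mkdtemp()
for name in ("Biosql.py", "load_swissprot.py"):
    open(os.path.join(demo_dir, name), "w").close()

hits = shadowing_files("BioSQL", demo_dir)
print(hits)
```

Another quick check, once an import does succeed, is to print the imported module's `__file__` attribute to see which file Python actually picked up.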
Peter From biopython at maubp.freeserve.co.uk Wed Oct 24 17:22:05 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Oct 2007 22:22:05 +0100 Subject: [BioPython] Loading SwissProt to BioSQL In-Reply-To: References: Message-ID: <471FB77D.5060103@maubp.freeserve.co.uk> Jonathan Boulais wrote: > from Bio.SwissProt import SProt > s_parser = SProt.SequenceParser() > s_iterator = SProt.Iterator(open("path to/uniprot_sprot.dat", "r"), s_parser) This won't help with the database issue, but you should also be able to load the SwissProt text file with Bio.SeqIO: from Bio import SeqIO s_iterator = SeqIO.parse(open("path/to/uniprot_sprot.dat"), "swiss") This in fact will call the Bio.SwissProt.SProt module internally, and get it to return SeqRecord objects. The Bio.SeqIO interface is meant to make it easy to switch the input file format (e.g. GenBank or EMBL). Peter From mdehoon at c2b2.columbia.edu Wed Oct 24 20:40:18 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Wed, 24 Oct 2007 20:40:18 -0400 Subject: [BioPython] Syntax error while parsing Blast output References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu> <471E6B82.5010700@maubp.freeserve.co.uk> <471E7A3C.5010301@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu> <320fb6e00710240122q53d099ax6b295f0f7d6f9174@mail.gmail.com> <3462123A-662F-4BBC-ADE4-3F5967760F6E@uiuc.edu> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B642@mail2.exch.c2b2.columbia.edu> >> Nevertheless, it may be a good idea to remove the plain text Blast >> parser >> completely from Biopython in the upcoming release (which will >> probably be >> done this week), to avoid further confusion. > > Removing it sounds too drastic - especially as we have had people on > the mailing list using it deliberately fairly recently. If you > really do want > to remove this code, then adding a deprecation warning to the plain > text > parser for the next release would be a more gentle route. 
> Sorry, I was confused; I was under the impression that the plain text Blast parser was already deprecated (I was getting confused with the blast and blasturl functions in Bio.Blast.NCBIWWW, which are already deprecated in favor of qblast). OK, then let's keep the plain-text Blast parser as is, and maybe think again about this issue after the upcoming release. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From mmayhew at mcb.mcgill.ca Thu Oct 25 00:12:06 2007 From: mmayhew at mcb.mcgill.ca (Michael Mayhew) Date: Thu, 25 Oct 2007 00:12:06 -0400 Subject: [BioPython] Any planned BioPython presence at PyCon 2008? Message-ID: <47201796.2050902@mcb.mcgill.ca> Was planning on going to PyCon 2008 anyway, but would have even more incentive if there is going to be a big BioPython community turnout. Would love to pitch in on a development session or something like that. Michael Mayhew From biosql at hotmail.com Thu Oct 25 10:52:02 2007 From: biosql at hotmail.com (Jonathan Boulais) Date: Thu, 25 Oct 2007 10:52:02 -0400 Subject: [BioPython] Loading SwissProt to BioSQL In-Reply-To: <471FB5DE.6080506@maubp.freeserve.co.uk> References: <471FB5DE.6080506@maubp.freeserve.co.uk> Message-ID: > Date: Wed, 24 Oct 2007 22:15:10 +0100 > From: biopython at maubp.freeserve.co.uk > To: biosql at hotmail.com; biopython at lists.open-bio.org > Subject: Re: [BioPython] Loading SwissProt to BioSQL > > Jonathan Boulais wrote: > > Hello, > > > > I'm a biologist and quite newb with Biopython. I'm trying to build > > locally the Swissprot database with BioSQL and I'm having some > > problems. I have installed the latest version from the CVS and I'm > > using python 2.5 on a Mac Os 10.4. > > > > First, i get this weird problem. Since I need to connect with MySQL I > > started to wrote a simple script (Biosql.py) with only this ( from > > BioSQL import BioSeqDatabase). 
When I run this script in the
> > terminal: python Biosql.py, I get this message **ImportError: cannot
> > import name BioSeqDatabase**. But the weird thing is if I start a
> > python session in the terminal by simply invoking python and then
> > manually import BioSeqDatabase, it's working ! Is there any reason
> > for that ?
>
> In both cases are you running python from the command prompt? If so
> then the same environment variables (e.g. paths) should apply. Odd.
>
> My guess is you shouldn't call your script "Biosql.py", call it
> "Biosql_test.py" or something. Python thinks the line "from BioSQL
> import BioSeqDatabase" means importing from the script itself because
> that is also called BioSQL.
>
> Peter
>

Peter, you were right about the name of the file. Nice call and thank you! But I still get the same error as before when I'm running it.

Traceback (most recent call last):
  File "DB.py", line 14, in
    db.load(s_iterator)
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 414, in load
    db_loader.load_seqrecord(cur_record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 30, in load_seqrecord
    bioentry_id = self._load_bioentry_table(record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 250, in _load_bioentry_table
    version))
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 277, in execute
    self.cursor.execute(sql, args or ())
  File "/sw/lib/python2.5/site-packages/MySQLdb/cursors.py", line 151, in execute
    query = query % db.literal(args)
TypeError: not all arguments converted during string formatting

Is it the MySQLdb driver, or a bad argument being passed to MySQLdb?

Again, thank you for your time.

Jonathan

From biopython at maubp.freeserve.co.uk Thu Oct 25 13:22:46 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 25 Oct 2007 18:22:46 +0100
Subject: [BioPython] Loading SwissProt to BioSQL
In-Reply-To:
References: <471FB5DE.6080506@maubp.freeserve.co.uk>
Message-ID: <4720D0E6.8000609@maubp.freeserve.co.uk>

Jonathan Boulais wrote:
>> My guess is you shouldn't call your script "Biosql.py", call it
>> "Biosql_test.py" or something. Python thinks the line "from BioSQL
>> import BioSeqDatabase" means importing from the script itself because
>> that is also called BioSQL.
>
> Peter you were right about the name of the file. Nice call and thank you !

Great - I wasn't sure if the case would matter or not.

> But I still get the same error as before when I'm running it.
> ...

I've not used BioSQL myself (yet), but looking at the code you posted earlier, you set up the connection like this:

from BioSQL import BioSeqDatabase
server = BioSeqDatabase.open_database(driver="MySQLdb", user="", passwd="", host="localhost", db="bioseqdb")

I think the driver="MySQLdb" is fine, but don't you need a database username (and perhaps a password)?

Peter

From biopython at maubp.freeserve.co.uk Thu Oct 25 05:44:43 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 25 Oct 2007 10:44:43 +0100
Subject: [BioPython] Any planned BioPython presence at PyCon 2008?
In-Reply-To: <47201796.2050902@mcb.mcgill.ca>
References: <47201796.2050902@mcb.mcgill.ca>
Message-ID: <4720658B.4020103@maubp.freeserve.co.uk>

Michael Mayhew wrote:
> Was planning on going to PyCon 2008 anyway, but would have even more
> incentive if there is going to be a big BioPython community turnout.
>
> Would love to pitch in on a development session or something like that.
> > Michael Mayhew http://us.pycon.org/2008/about/ http://pycon.blogspot.com/2007/10/call-for-talk-tutorial-proposals.html > Proposals for PyCon 2008 talks & tutorials are now being accepted. > The deadline for proposals is November 16. PyCon 2008 will be held > in Chicago, Illinois, USA, from March 13-20. It is remotely possible that I'll be working the USA next year, but I have to say at this point that it looks unlikely that I'll be able to attend. Peter From biopython at maubp.freeserve.co.uk Thu Oct 25 05:57:10 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 25 Oct 2007 10:57:10 +0100 Subject: [BioPython] Syntax error while parsing Blast output In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B642@mail2.exch.c2b2.columbia.edu> References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu> <471E6B82.5010700@maubp.freeserve.co.uk> <471E7A3C.5010301@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu> <320fb6e00710240122q53d099ax6b295f0f7d6f9174@mail.gmail.com> <3462123A-662F-4BBC-ADE4-3F5967760F6E@uiuc.edu> <6243BAA9F5E0D24DA41B27997D1FD14402B642@mail2.exch.c2b2.columbia.edu> Message-ID: <47206876.9040905@maubp.freeserve.co.uk> Michiel De Hoon wrote: > > Sorry, I was confused; I was under the impression that the plain text Blast > parser was already deprecated (I was getting confused with the blast and > blasturl functions in Bio.Blast.NCBIWWW, which are already deprecated in > favor of qblast). OK, then let's keep the plain-text Blast parser as is, and > maybe think again about this issue after the upcoming release. > Panic averted - but it was good to hear some passionate defence of the plain text BLAST parser, it looks like it still gets quite a bit of use. 
Peter From bsantos at biocant.pt Fri Oct 26 05:13:58 2007 From: bsantos at biocant.pt (Bruno Santos) Date: Fri, 26 Oct 2007 10:13:58 +0100 Subject: [BioPython] Problems with NCBIXML.py In-Reply-To: <320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com> References: <001101c8158d$7d146600$2300a8c0@bsantos> <471E1CBC.30601@maubp.freeserve.co.uk> <001b01c81598$95f7b3b0$2300a8c0@bsantos> <471E3A13.5080505@maubp.freeserve.co.uk> <001601c8165a$48248600$2300a8c0@bsantos> <320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com> Message-ID: <000301c817b0$8c868c10$2300a8c0@bsantos> Peter Said >I would suggest you install standalone blast, then give it the >multi-record FASTA file as input. You should then get multiple blast >records back (in the same order). This works fine with the XML output >(but currently does not work for plain text output on recent versions >of NCBI Blast). > >If you really want to make multiple blast submissions in parallel >online, first check the NCBI's website for any usage restrictions - >they don't want their servers to be abused. > >Peter I have followed your advice and I decide to install standalone blast. As I want to make blast against the nt databases I have downloaded it pre compiled from the ncbi ftp server. And I have created I scrip to do this but for some reason I'm not getting any results, because the programs does not write anything to the XML file. 
Here is my script:

from Bio import SeqIO
from Bio.Blast import NCBIStandalone
from Bio.Blast import NCBIXML
import time
import math

my_blast_db = (r'e:/nt.00')
my_blast_file = r'C:/FASTASeq/Results/well9/assembled_file_well9_V6_DIST.fna'
my_blast_exe = r'C:/BLAST/bin/'
save_file = open(r'C:/FASTASeq/Results/well9/V6_BLAST.xml', 'w')
result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn", my_blast_db, my_blast_file)
blast_results = result_handle.read()  # Catch the results
save_file.write(blast_results)  # Write all the information to an XML file
save_file.close()
print time.ctime()

As I have downloaded the files from NCBI, I have a lot of files in the database directory. Is there any way of performing a search against all of them?

Thanks in advance,

Bruno Santos
Unidade de Bioinformática
3060-197 Cantanhede
Tel: 231 410 892
http://bioinformatics.biocant.pt

From biopython at maubp.freeserve.co.uk Fri Oct 26 05:52:34 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Oct 2007 10:52:34 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <000301c817b0$8c868c10$2300a8c0@bsantos>
References: <001101c8158d$7d146600$2300a8c0@bsantos> <471E1CBC.30601@maubp.freeserve.co.uk> <001b01c81598$95f7b3b0$2300a8c0@bsantos> <471E3A13.5080505@maubp.freeserve.co.uk> <001601c8165a$48248600$2300a8c0@bsantos> <320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com> <000301c817b0$8c868c10$2300a8c0@bsantos>
Message-ID: <4721B8E2.2040902@maubp.freeserve.co.uk>

Bruno Santos wrote:
> Peter Said
>> I would suggest you install standalone blast, then give it the
>> multi-record FASTA file as input. You should then get multiple blast
>> records back (in the same order). This works fine with the XML output
>> (but currently does not work for plain text output on recent versions
>> of NCBI Blast).
>> >> If you really want to make multiple blast submissions in parallel >> online, first check the NCBI's website for any usage restrictions - >> they don't want their servers to be abused. >> >> Peter > > I have followed your advice and I decide to install standalone blast. As I > want to make blast against the nt databases I have downloaded it pre > compiled from the ncbi ftp server. And I have created I script to do this but > for some reason I'm not getting any results, because the programs does not > write anything to the XML file. > > Where is my script: > from Bio import SeqIO > from Bio.Blast import NCBIStandalone > from Bio.Blast import NCBIXML > import time > import math You are running on Windows, so the paths should have "\" rather than "/" in them. However, in many cases this isn't essential - and indeed for some Unix programs ported to Windows using "/" is sometimes best! > my_blast_db = (r'e:/nt.00') I'm not sure if that is correct, but its difficult to tell without seeing your setup. > my_blast_file = > r'C:/FASTASeq/Results/well9/assembled_file_well9_V6_DIST.fna' > my_blast_exe = r'C:/BLAST/bin/' That is wrong, try something like: my_blast_exe = r'C:\BLAST\bin\blastall.exe' I would urge you to try running blastall "by hand" at the command line first for a few small examples, to get the hang of it. Because any error messages get printed to the command line, it makes debugging simpler. This will also help with you how to prepare the arguments in Biopython. Within python you would have to have checked what was written to the error_info output handle. > As I have download the files from ncbi I have a lot of files in the database > directory theres is any way of perform a search against all of them? I'm not sure what exactly you are asking. BLAST can make databases from FASTA files, so you might want to build a database from all your FASTA files... check the documentation for the BLAST formatdb program. 
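[Editorial sketch, not part of the original message.] On the earlier point about checking what was written to the error handle: the same idea can be shown with the standard library alone, in current Python 3. The helper `run_and_report` is hypothetical, and the child command below is a stand-in that merely writes to stderr and exits with a failure code. It is not blastall, and nothing here reflects blastall's real messages.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run `cmd` and return (exit code, stdout text, stderr text)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


# Stand-in child process: prints a made-up complaint to stderr and fails.
child_code = "import sys; sys.stderr.write('database not found\\n'); sys.exit(1)"
code, out, err = run_and_report([sys.executable, "-c", child_code])

if code != 0 or err:
    # An empty stdout plus text on stderr is the pattern to look for
    # when an output file unexpectedly stays empty.
    print("command failed:", err.strip())
```

Reading the error stream first usually explains an empty result file faster than inspecting the (empty) output.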
Peter From bsantos at biocant.pt Fri Oct 26 09:40:40 2007 From: bsantos at biocant.pt (Bruno Santos) Date: Fri, 26 Oct 2007 14:40:40 +0100 Subject: [BioPython] Problems with NCBIXML.py In-Reply-To: <4721B8E2.2040902@maubp.freeserve.co.uk> References: <001101c8158d$7d146600$2300a8c0@bsantos> <471E1CBC.30601@maubp.freeserve.co.uk> <001b01c81598$95f7b3b0$2300a8c0@bsantos> <471E3A13.5080505@maubp.freeserve.co.uk> <001601c8165a$48248600$2300a8c0@bsantos> <320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com> <000301c817b0$8c868c10$2300a8c0@bsantos> <4721B8E2.2040902@maubp.freeserve.co.uk> Message-ID: <000701c817d5$d0e8f4e0$2300a8c0@bsantos> >You are running on Windows, so the paths should have "\" rather than "/" >in them. However, in many cases this isn't essential - and indeed for >some Unix programs ported to Windows using "/" is sometimes best! > > my_blast_db = (r'e:/nt.00') > >I'm not sure if that is correct, but its difficult to tell without >seeing your setup. It's ok to use the "/" because it seems that the python interpreter converts it to the symbol used by the OS. > my_blast_file = > r'C:/FASTASeq/Results/well9/assembled_file_well9_V6_DIST.fna' > my_blast_exe = r'C:/BLAST/bin/' > >That is wrong, try something like: >my_blast_exe = r'C:\BLAST\bin\blastall.exe' You were right about that. It's ok now > As I have download the files from ncbi I have a lot of files in the database > directory theres is any way of perform a search against all of them? >I'm not sure what exactly you are asking. BLAST can make databases from >FASTA files, so you might want to build a database from all your FASTA >files... check the documentation for the BLAST formatdb program. I have downloaded the pre compiled files which mean I have five different files like (nt.00.nhr, nt.01.nhr, nt.02.nhr...) and also the same files with all the others extensions. But I have found I can use them all at the same time by passing it to command line between "". 
So now I have

my_blast_db = (r'\"e:/nt.00 e:/nt.01 e:/nt.02 e:/nt.03 e:/nt.04 e:/nt.05 \"')

But now I'm mailing you with another question: is it possible to pass the result_handle to blast_results line by line or something like that? I'm getting a memory error in the step below:

result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn", my_blast_db, my_blast_file)
blast_results = result_handle.read()  # Catch the results

Maybe if I pass one line at a time and write it immediately to the XML file it will work.

Thanks once more,
Bruno Santos

From biopython at maubp.freeserve.co.uk Fri Oct 26 10:37:45 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Oct 2007 15:37:45 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <000701c817d5$d0e8f4e0$2300a8c0@bsantos>
References: <001101c8158d$7d146600$2300a8c0@bsantos> <471E1CBC.30601@maubp.freeserve.co.uk> <001b01c81598$95f7b3b0$2300a8c0@bsantos> <471E3A13.5080505@maubp.freeserve.co.uk> <001601c8165a$48248600$2300a8c0@bsantos> <320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com> <000301c817b0$8c868c10$2300a8c0@bsantos> <4721B8E2.2040902@maubp.freeserve.co.uk> <000701c817d5$d0e8f4e0$2300a8c0@bsantos>
Message-ID: <4721FBB9.1040408@maubp.freeserve.co.uk>

> But now I'm mailing you with another doubt it is possible to pass the
> result_handle to blast_results line by line or something like that because
> I'm having a memory error in the step described below
>
> result_handle, error_info = NCBIStandalone.blastall(my_blast_exe,
> "blastn",my_blast_db, my_blast_file)
> blast_results = result_handle.read() #Catch the results
>
> Maybe if I pass one line at a time and write it immediately to the xml file
> it will work.

XML files are big. Lots of query sequences will also make things bigger. And the default expectation threshold will also give lots of results - setting this to something stricter will help by giving fewer matches.
Unless you want to keep the XML file for other analysis, it might be simpler to parse the output from blast directly with Biopython - avoiding having the large XML file on disk. Keeping the XML intermediate file can be a good idea when working on smaller datasets, where you want to tweak your analysis (without re-running blast each time). Peter From bsantos at biocant.pt Fri Oct 26 11:50:48 2007 From: bsantos at biocant.pt (Bruno Santos) Date: Fri, 26 Oct 2007 16:50:48 +0100 Subject: [BioPython] Problems with NCBIXML.py In-Reply-To: <4721FBB9.1040408@maubp.freeserve.co.uk> References: <001101c8158d$7d146600$2300a8c0@bsantos> <471E1CBC.30601@maubp.freeserve.co.uk> <001b01c81598$95f7b3b0$2300a8c0@bsantos> <471E3A13.5080505@maubp.freeserve.co.uk> <001601c8165a$48248600$2300a8c0@bsantos> <320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com> <000301c817b0$8c868c10$2300a8c0@bsantos> <4721B8E2.2040902@maubp.freeserve.co.uk> <000701c817d5$d0e8f4e0$2300a8c0@bsantos> <4721FBB9.1040408@maubp.freeserve.co.uk> Message-ID: <000801c817e7$fd1bc940$2300a8c0@bsantos> Peter Said: >XML files are big. Lots of query sequences will also make things >bigger. And the default expectation threshold will also give lots of >results - setting this to something harsher will help by giving less >matches. > >Unless you want to keep the XML file for other analysis, it might be >simpler to parse the output from blast directly with Biopython - >avoiding having the large XML file on disk. > >Keeping the XML intermediate file can be a good idea when working on >smaller datasets, where you want to tweak your analysis (without >re-running blast each time). But if even I don't want to save the results to an XML I still have to do the step right? And my problem is in this step not in writing to the file. Or I can use the result_handle directly, because I was reading the biopython documentation but it's not very clear. 
From biopython at maubp.freeserve.co.uk Fri Oct 26 12:04:40 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Oct 2007 17:04:40 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <000801c817e7$fd1bc940$2300a8c0@bsantos>
References: <001101c8158d$7d146600$2300a8c0@bsantos> <471E1CBC.30601@maubp.freeserve.co.uk> <001b01c81598$95f7b3b0$2300a8c0@bsantos> <471E3A13.5080505@maubp.freeserve.co.uk> <001601c8165a$48248600$2300a8c0@bsantos> <320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com> <000301c817b0$8c868c10$2300a8c0@bsantos> <4721B8E2.2040902@maubp.freeserve.co.uk> <000701c817d5$d0e8f4e0$2300a8c0@bsantos> <4721FBB9.1040408@maubp.freeserve.co.uk> <000801c817e7$fd1bc940$2300a8c0@bsantos>
Message-ID: <47221018.9090104@maubp.freeserve.co.uk>

Bruno Santos wrote:
> Peter Said:
>> Unless you want to keep the XML file for other analysis, it might be
>> simpler to parse the output from blast directly with Biopython -
>> avoiding having the large XML file on disk.
>
> But if even I don't want to save the results to an XML I still have to do
> the step right?
> And my problem is in this step not in writing to the file.
> Or I can use the result_handle directly, because I was reading the biopython
> documentation but it's not very clear.

The intention is something like this:

result_handle, error_handle = NCBIStandalone.blastall(my_blast_exe, "blastn", my_blast_db, my_blast_file)
blast_records = NCBIXML.parse(result_handle)
for record in blast_records:
    pass  # do stuff with each record

The bit about saving the results to a file and loading that to give a new handle is optional, but very handy if you need to look at the raw file by hand. Perhaps that section of the tutorial could be a little clearer ...
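[Editorial sketch, not part of the original message.] If the XML file does need to be kept on disk, the memory error reported for `result_handle.read()` can probably be avoided by copying the handle to the file in chunks instead of reading it all at once. The sketch below assumes only that the handle is an ordinary file-like object. `save_handle` is a hypothetical helper, an in-memory `io.StringIO` stands in for the real BLAST output handle, and the code targets current Python 3 rather than the Python 2.5 of the thread.

```python
import io
import os
import shutil
import tempfile


def save_handle(result_handle, out_path, chunk_size=64 * 1024):
    """Copy a text handle to `out_path` one chunk at a time."""
    # newline="" keeps line endings exactly as they arrive from the handle.
    with open(out_path, "w", newline="") as out:
        shutil.copyfileobj(result_handle, out, chunk_size)


# Stand-in for the BLAST output handle (made-up content, not real BLAST XML).
fake_xml = "<Hit>example</Hit>\n" * 10000
fake_handle = io.StringIO(fake_xml)

out_path = os.path.join(tempfile.mkdtemp(), "V6_BLAST.xml")
save_handle(fake_handle, out_path)
print(os.path.getsize(out_path))
```

Only one chunk is held in memory at a time, so the peak memory use no longer grows with the size of the BLAST output.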
Peter From mdehoon at c2b2.columbia.edu Sun Oct 28 02:32:40 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Sun, 28 Oct 2007 02:32:40 -0400 Subject: [BioPython] Biopython release 1.44 ready Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu> Hi everybody, Biopython release 1.44 is now available for download from the Biopython website at http://biopython.org. This release includes lots of code improvements and fixes in the Blast interface and parsers, sequence input/output, the SwissProt parser, the clustering routines, as well as a brand new module for population genetics. For reasons of compatibility, some radical changes were necessary in some parts of the code; please let us know if you find some functionality missing. My thanks to all code contributers who made this new release possible. --Michiel on behalf of the Biopython developers Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From tiagoantao at gmail.com Sun Oct 28 17:31:58 2007 From: tiagoantao at gmail.com (Tiago Antao) Date: Sun, 28 Oct 2007 21:31:58 +0000 Subject: [BioPython] Biopython citation Message-ID: <4724FFCE.20103@gmail.com> Hello, I am submitting a paper regarding a Jython selection detection program that we have done, and I would like to cite biopython. What is really the best, most recent, citation? Tiago -- tiagoantao at gmail.com http://tiago.org/ps From biopython at maubp.freeserve.co.uk Sun Oct 28 16:52:05 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 28 Oct 2007 20:52:05 +0000 Subject: [BioPython] Biopython citation In-Reply-To: <4724FFCE.20103@gmail.com> References: <4724FFCE.20103@gmail.com> Message-ID: <4724F675.8030902@maubp.freeserve.co.uk> Tiago Antao wrote: > I am submitting a paper regarding a Jython selection detection program > that we have done, and I would like to cite biopython. What is really > the best, most recent, citation? 
> > Tiago For a general project reference, I think the most recent is Brad & Jeff's 2000 newsletter article: Chapman, B. and Chang, J. (2000) Biopython: python tools for computational biology. ACM SIG-BIO Newsletter, 20, 15-19. However, I confess I only cited the www.biopython.org website in my last paper. Peter P.S. There are specific papers for some modules, e.g. Bio.PDB and Bio.Cluster From skhadar at gmail.com Mon Oct 29 09:15:30 2007 From: skhadar at gmail.com (Shameer Khadar) Date: Mon, 29 Oct 2007 18:45:30 +0530 Subject: [BioPython] Biopython citation In-Reply-To: <4724F675.8030902@maubp.freeserve.co.uk> References: <4724FFCE.20103@gmail.com> <4724F675.8030902@maubp.freeserve.co.uk> Message-ID: Hi Peter, I am interested to look at it. We dont have access to ACM. If you have a copy of that paper. Thanks, Shameer On 10/29/07, Peter wrote: > > Tiago Antao wrote: > > I am submitting a paper regarding a Jython selection detection program > > that we have done, and I would like to cite biopython. What is really > > the best, most recent, citation? > > > > Tiago > > For a general project reference, I think the most recent is Brad & > Jeff's 2000 newsletter article: > > Chapman, B. and Chang, J. (2000) Biopython: python tools for > computational biology. ACM SIG-BIO Newsletter, 20, 15-19. > > However, I confess I only cited the www.biopython.org website in my last > paper. > > Peter > > P.S. There are specific papers for some modules, e.g. 
Bio.PDB and > Bio.Cluster > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From skhadar at gmail.com Mon Oct 29 10:11:41 2007 From: skhadar at gmail.com (Shameer Khadar) Date: Mon, 29 Oct 2007 19:41:41 +0530 Subject: [BioPython] Biopython citation In-Reply-To: <4725E655.8080608@maubp.freeserve.co.uk> References: <4724FFCE.20103@gmail.com> <4724F675.8030902@maubp.freeserve.co.uk> <4725E655.8080608@maubp.freeserve.co.uk> Message-ID: Hi , Thanks for that !!! -- Shameer On 10/29/07, Peter wrote: > > Shameer Khadar wrote: > > Hi Peter, > > > > I am interested to look at it. We dont have access to ACM. If you > > have a copy of that paper. > > > > Thanks, Shameer > > Its not actually very informative, especial as of the examples are now > rather dated. Anyway, I believe the new-letter article was the same as > the document available on our website: > > http://biopython.org/DIST/docs/acm/ACMbiopy.html > http://biopython.org/DIST/docs/acm/ACMbiopy.pdf > > Chapman, B. and Chang, J. (2000) Biopython: python tools for > computational biology. ACM SIG-BIO Newsletter, 20, 15-19. > > Peter > From biopython at maubp.freeserve.co.uk Mon Oct 29 09:55:33 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 29 Oct 2007 13:55:33 +0000 Subject: [BioPython] Biopython citation In-Reply-To: References: <4724FFCE.20103@gmail.com> <4724F675.8030902@maubp.freeserve.co.uk> Message-ID: <4725E655.8080608@maubp.freeserve.co.uk> Shameer Khadar wrote: > Hi Peter, > > I am interested to look at it. We dont have access to ACM. If you > have a copy of that paper. > > Thanks, Shameer Its not actually very informative, especial as of the examples are now rather dated. Anyway, I believe the new-letter article was the same as the document available on our website: http://biopython.org/DIST/docs/acm/ACMbiopy.html http://biopython.org/DIST/docs/acm/ACMbiopy.pdf Chapman, B. 
and Chang, J. (2000) Biopython: python tools for computational biology. ACM SIG-BIO Newsletter, 20, 15-19.

Peter

From biopython at maubp.freeserve.co.uk Mon Oct 29 15:22:20 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Oct 2007 19:22:20 +0000
Subject: [BioPython] Loading SwissProt to BioSQL
In-Reply-To: <4720D0E6.8000609@maubp.freeserve.co.uk>
References: <471FB5DE.6080506@maubp.freeserve.co.uk> <4720D0E6.8000609@maubp.freeserve.co.uk>
Message-ID: <320fb6e00710291222l1a5746e9m3bbc5c4c9fd03921@mail.gmail.com>

Jonathan Boulais wrote:
> But I still get the same error as before when I'm running it.
> ...

For anyone wanting to track this issue, Jonathan has filed Bug 2390 - Error importing Swiss Prot in BioSQL
http://bugzilla.open-bio.org/show_bug.cgi?id=2390

Peter

From anaryin at gmail.com Mon Oct 29 21:28:21 2007
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 30 Oct 2007 01:28:21 +0000
Subject: [BioPython] Fwd: Scripts cannot connect
In-Reply-To:
References: <471C9C34.7000006@maubp.freeserve.co.uk> <320fb6e00710220658x12866cb6w63f7ff96f5bcd2b0@mail.gmail.com>
Message-ID:

I've checked all my connection settings, tested an awful lot of possibilities, and came to this conclusion: when using a web service, I can't connect to the internet. In the same script I can fetch, for instance, the Google page, but the lines that call the web service itself won't connect.

I've tried setting the proxy environment variable (through export http_proxy='blabla:yyyy') in the script itself, and nothing. I've also set os.environ[blabla], and it doesn't work.

So, does anyone have an idea of why this is happening? Shouldn't the web service, if it uses the HTTP protocol (as it does), work just like any other command (let's say, urllib.urlopen)?

I know this falls outside the BioPython theme, but I consider it quite relevant for my BioPython work :)

Thank you all in advance!
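
One way to narrow down a problem like the one described above is to check what the standard library itself sees. The sketch below uses modern Python 3 (urllib.request, not the Python 2 urllib.urlopen mentioned in the message) and a made-up proxy address, proxy.example.org:8080, which is purely an assumption for illustration. It shows that urllib reads http_proxy from the environment, and how to hand a proxy to an opener explicitly. A SOAP toolkit that talks to the network through its own HTTP code may not consult this variable at all, which would be one possible explanation for the behaviour; that is a guess, not something confirmed for ZSI.

```python
import os
import urllib.request

# Hypothetical proxy address, used only for illustration.
PROXY_URL = "http://proxy.example.org:8080"


def proxies_seen_by_urllib():
    """Return the proxy mapping urllib derives from its environment."""
    return urllib.request.getproxies()


def build_proxy_opener(proxy_url):
    """Build an opener that sends HTTP requests through proxy_url,
    regardless of what the environment says."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


# Setting the variable inside the script, before any request is made.
os.environ["http_proxy"] = PROXY_URL
print(proxies_seen_by_urllib().get("http"))

opener = build_proxy_opener(PROXY_URL)
# opener.open("http://example.org/") would now be routed through the proxy.
```

If the printed value matches the proxy but a web-service call still fails, the library making that call is probably not going through urllib, and its own proxy settings are the place to look.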
From biopython at maubp.freeserve.co.uk Tue Oct 30 04:53:14 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 30 Oct 2007 08:53:14 +0000
Subject: [BioPython] Fwd: Scripts cannot connect
In-Reply-To:
References: <471C9C34.7000006@maubp.freeserve.co.uk> <320fb6e00710220658x12866cb6w63f7ff96f5bcd2b0@mail.gmail.com>
Message-ID: <4726F0FA.6000209@maubp.freeserve.co.uk>

João Rodrigues wrote:
> I've checked all my connection settings, tested an awful lot of
> possibilities and I came to this conclusion. When using a webservice, I
> can't connect to the internet. In the same script, I can get for instance,
> the google page, but the lines regarding the webservice itself, they won't
> connect.

Are you still finding things work on Windows, but fail on Linux? If so, are you running the same version of Python (and Biopython) on both?

> I've tried to set environment proxy (through export
> http_proxy='blabla:yyyy') in the script itself and nothing. I've set
> os.environ[blabla] and it's doesn't work.

When you say "it doesn't work", do you mean (a) the environment variable isn't set, or (b) the environment variable is set but has no effect?

> So, does anyone has an idea of why this is happening? Shouldn't the
> webservice, if using http protocol (as it does), work just like any other
> command (let's say, urllib.urlopen)?

Are you saying there is a difference depending on the URL type (plain page versus web service)? Or are you saying there is a difference depending on which Python library you use (e.g. urllib or something else)?

> I know this falls out of the BioPython theme but I consider it quite
> relevant for my BioPython work :)
>
> Thank you all in advance!

This must be very frustrating for you. Have you been able to find your University's official documentation for the proxy?
Peter

From biopython at maubp.freeserve.co.uk Tue Oct 30 08:32:10 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 30 Oct 2007 12:32:10 +0000
Subject: [BioPython] Question about Seq.count()
In-Reply-To: <320fb6e00710190750n6b1752bcga0846159e32cf02c@mail.gmail.com>
References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com> <471686C7.6050305@maubp.freeserve.co.uk> <5aa3b3570710190638h23665c4cpb8d53a8cb64c7322@mail.gmail.com> <320fb6e00710190750n6b1752bcga0846159e32cf02c@mail.gmail.com>
Message-ID: <4727244A.4010705@maubp.freeserve.co.uk>

Peter wrote:
>> I've found the bug!
>>
>> The code for Bio.Seq.count is:
>>
>>     def count(self, item):
>>         return len([x for x in self.data if x == item])
>
> Yeah - by design this (and the functionally similar version for the
> MutableSeq) both expect the count argument to be a single letter. The
> simple fix for the Seq object is to use the string method internally:
>
>     def count(self, item):
>         return self.data.count(item)
>
> For the MutableSeq things are not so straight forward, but supporting
> multiple character arguments can be done.

Bug 2386 and proposed patch here:
http://bugzilla.open-bio.org/show_bug.cgi?id=2386

This also lets the count methods take Seq or MutableSeq objects as arguments - in addition to plain strings.

Note there is room for improvement in my patch: For the case of the MutableSeq, we might want to investigate counting from the array of characters directly, rather than taking the lazy option of turning it into a string and counting that way.
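
The difference between the two count implementations quoted above can be seen with plain strings, without Biopython installed. The following is a minimal sketch in modern Python; the function names are made up for illustration and are not the code from the patch in Bug 2386. The third function is one possible way to count directly on a list of characters, as suggested for the MutableSeq, and like str.count it counts non-overlapping matches.

```python
def count_single_letters(data, item):
    # Original approach: compares each letter with item, so a
    # multi-letter item can never match anything.
    return len([x for x in data if x == item])


def count_substring(data, item):
    # Proposed fix for Seq: delegate to str.count, which counts
    # non-overlapping occurrences of a substring.
    return str(data).count(str(item))


def count_in_chars(chars, item):
    # Sketch for a MutableSeq-like list of characters: scan the list
    # directly instead of first joining it into one big string.
    item = list(item)
    n, m = len(chars), len(item)
    if m == 0:
        raise ValueError("empty search item")
    total = 0
    i = 0
    while i <= n - m:
        if chars[i:i + m] == item:
            total += 1
            i += m  # skip past the match: non-overlapping, like str.count
        else:
            i += 1
    return total


seq = "GATTACATTA"
print(count_single_letters(seq, "A"))   # 4
print(count_single_letters(seq, "TT"))  # 0
print(count_substring(seq, "TT"))       # 2
print(count_in_chars(list(seq), "TT"))  # 2
```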
Peter

From anaryin at gmail.com Tue Oct 30 12:29:00 2007
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 30 Oct 2007 16:29:00 +0000
Subject: [BioPython] Fwd: Scripts cannot connect
In-Reply-To: <4726F0FA.6000209@maubp.freeserve.co.uk>
References: <471C9C34.7000006@maubp.freeserve.co.uk> <320fb6e00710220658x12866cb6w63f7ff96f5bcd2b0@mail.gmail.com> <4726F0FA.6000209@maubp.freeserve.co.uk>
Message-ID:

> Are you still finding things work on Windows, but fail on Linux? If so,
> are you running the same version of python (and Biopython) on both?

The same version is installed on all operating systems. I'm using XP (one 32-bit, the other 64-bit) on the Windows machines (one at home, another at "work") and Ubuntu 7.10 on both my laptop and the workstation at the University (it's dual-booted). Regarding Biopython, it's the same version on all of them except my laptop, which has the latest upgrade from the 28th of October (but still, it never worked before). But since I'm not using any of its modules, it should not have anything to do with it.

> When you say "it doesn't work", do you mean (a) the environment variable
> isn't set, or (b) the environment variable is set but has no effect?

An example: I start a new session on my laptop and open the console. I type "export http_proxy='blabla'" to set the variable. I then type "env" and it returns a list of all environment variables *including* the http_proxy one. I run "aptitude update" and it works. If I do the same in a Python script, it doesn't (at least when connecting to a web service). I believe, then, that the variable is set but somehow has no effect.

> Are you saying there is a difference depending on the URL type (plain
> page versus web-service?)

I *think*, or suppose, that somehow the two "types" of connection, despite using HTTP and the same proxy environment variable, are working differently.

> Or, are you saying there is a difference depending on what python
> library you use (e.g. urllib or something else).

Which other libraries can I try out?
Other than urllib?

> This must be very frustrating for you. Have you been able to find your
> University's official documentation for the proxy?

It's a dilemma. On the one hand, I have a perfectly set up Windows system that can access the internet through the scripts I write. However, there is no ZSI for it (or at least, I can't install it). As such, no SOAP support, no API I can get to work.

On the other hand, GNU/Linux. It works perfectly, the *.deb packages exist and are quite easy to install, so I have ZSI and SOAP support to work with the API. However, I can't access the web with the ZSI module.

I'll try to talk to the University Informatics Service to see if they can figure it out. Really hope they can, otherwise, I guess I'll just have to work from home since it works there.. :)

Again, very thankful!

João Rodrigues

From lists.steve at arachnedesign.net Mon Oct 1 20:18:04 2007
From: lists.steve at arachnedesign.net (Steve Lianoglou)
Date: Mon, 1 Oct 2007 16:18:04 -0400
Subject: [BioPython] Error for installation of MySALdb on Mac OS X
In-Reply-To:
References: <46FCF325.4040002@maubp.freeserve.co.uk> <46FD2BAC.80401@maubp.freeserve.co.uk> <46FD5927.3000207@maubp.freeserve.co.uk>
Message-ID: <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>

> I downloaded mysql-5.0.45-osx10.4-i686.dmg from mysql web and
> installed it. Then I tried to install MySQL-python-1.2.2 but got
> the following error. How to create the mysql_config.path file?
> Thank you very much.
>
> leesComputer:/applications/Python_Bio/MySQL-python-1.2.2 lee$
> python setup.py build
> sh: line 1: mysql_config: command not found

It seems as if you need to have the `mysql_config` command in your PATH variable and it's not there.

Look for where mysql was installed (maybe /usr/local/mysql/...) and add its bin directory to your PATH environment variable. Or maybe it installed some binaries/symlinks into your /usr/local/bin directory?

I think that'll do it for you.
-steve From biopython at maubp.freeserve.co.uk Mon Oct 1 21:06:37 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 1 Oct 2007 22:06:37 +0100 Subject: [BioPython] Error for running of ReportLab test on Mac OS X In-Reply-To: References: <46FCF325.4040002@maubp.freeserve.co.uk> <46FD2BAC.80401@maubp.freeserve.co.uk> <46FD5927.3000207@maubp.freeserve.co.uk> Message-ID: <320fb6e00710011406o3c4d4049q7b5345d18381362e@mail.gmail.com> On 10/1/07, Y Tu wrote: > > Thanks Peter, > > However, I still haven't install mxText module in my Mac yet. I see you've signed up to the eGenix mailing list - I hope they can solve your mxTextTools installation problems. > Also could you tell me how to run the test file of ReportLab, when I > launch Python and then import the test file into the python. Thanks. In general I think most tests are designed to be run from the command line, not by running python, typing an import statement, and typing another command. You should check the ReportLab documentation to see what they recommend. To run a specific Biopython unit test, such as the general graphics unit test, you would do this: python run_tests.py test_GraphicsGeneral.py That would run the test, and check the output matched the expected results. Alternatively, you can do: python test_GraphicsGeneral.py I hope that helps. Peter From ULNJUJERYDIX at spammotel.com Tue Oct 2 06:52:53 2007 From: ULNJUJERYDIX at spammotel.com (Kevin Lam) Date: Tue, 2 Oct 2007 14:52:53 +0800 Subject: [BioPython] Fwd: **Fwd: [Bioperl-l] divide and blast blastunsplit blast subsequence In-Reply-To: <5b6410e0710012321h4320d804p6c6262860eff2463@mail.gmail.com> References: <5b6410e0710012321h4320d804p6c6262860eff2463@mail.gmail.com> Message-ID: <5b6410e0710012352s520b537bj7374dd874dc93104@mail.gmail.com> Hi! 
I am trying to annotate a 200kb sequence by doing blastx to find the protein seq location I need to split the sequence up so that I get the best hits for each region (the top blast hits will mask the smaller proteins if i do it as a whole sequence) if i were to do it manually i can set the subsequence in the web gui for ncbi's blast. this way, the blast hits coords are based on the whole 200kb. but I can't find this option in blast or a straightforward way to do it in bioperl. I found similar solutions like http://www.bio.davidson.edu/projects/DAB/DAB.html divide and blast (but I want to specify coords rather than fixed intervals) there also this from the bioperl archives http://bioinformatics.org/pipermail/bioclusters/2002-August/000375.html but isn't there an easier way like i can specify blast subsequence 200-900 of fasta file and it will return the blastx hits in coords in terms of the whole 200kb? From mdehoon at c2b2.columbia.edu Tue Oct 2 09:06:54 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Tue, 2 Oct 2007 05:06:54 -0400 Subject: [BioPython] Bio.MultiProc References: <46E6A845.3030601@c2b2.columbia.edu> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu> Hi everybody, Since no users of Bio.MultiProc came forward, I deprecated it for the upcoming release. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: biopython-bounces at lists.open-bio.org on behalf of Michiel De Hoon Sent: Tue 9/11/2007 10:37 AM To: BioPython Developers List; biopython at biopython.org Subject: [BioPython] Bio.MultiProc Hi everybody, In preparation for the upcoming release, I was running the Biopython test suite and found that test_copen.py hangs on Cygwin. It doesn't fail, it just sits there forever. This may be related to the use of fork() instead of select() in Bio/MultiProc/copen.py. 
Anyway, while it is probably possible to fix this, I'd have to dig fairly deep into the code, and I am not sure if it is worth it. It looks like the copen functions are used only in Bio/config, which is needed for Bio.db. A description of the functionality of thia module can be found in the tutorial section 4.7.2. Now, I don't remember users asking about this module on the mailing list. From the tutorial documentation, it seems to be a nice piece of code, but I doubt that it is being used often in practice. So I was wondering: 1) Is anybody on this list using this code? 2) If not, can I mark it as deprecated for the upcoming release? Hopefully, people who are using this code will notice, and let us know that they need it. --Michiel. _______________________________________________ BioPython mailing list - BioPython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From ytu888 at hotmail.com Tue Oct 2 11:36:58 2007 From: ytu888 at hotmail.com (Y Tu) Date: Tue, 2 Oct 2007 06:36:58 -0500 Subject: [BioPython] Error for running of ReportLab test on Mac OS X In-Reply-To: <320fb6e00710011406o3c4d4049q7b5345d18381362e@mail.gmail.com> References: <46FCF325.4040002@maubp.freeserve.co.uk> <46FD2BAC.80401@maubp.freeserve.co.uk> <46FD5927.3000207@maubp.freeserve.co.uk> <320fb6e00710011406o3c4d4049q7b5345d18381362e@mail.gmail.com> Message-ID: Thank you very much, Peter. > Date: Mon, 1 Oct 2007 22:06:37 +0100 > From: biopython at maubp.freeserve.co.uk > To: ytu888 at hotmail.com > Subject: Re: [BioPython] Error for running of ReportLab test on Mac OS X > CC: biopython at lists.open-bio.org > > On 10/1/07, Y Tu wrote: > > > > Thanks Peter, > > > > However, I still haven't install mxText module in my Mac yet. > > I see you've signed up to the eGenix mailing list - I hope they can > solve your mxTextTools installation problems. 
>
> > Also could you tell me how to run the test file of ReportLab, when I
> > launch Python and then import the test file into the python. Thanks.
>
> In general I think most tests are designed to be run from the command
> line, not by running python, typing an import statement, and typing
> another command. You should check the ReportLab documentation to see
> what they recommend.
>
> To run a specific Biopython unit test, such as the general graphics
> unit test, you would do this:
>
> python run_tests.py test_GraphicsGeneral.py
>
> That would run the test, and check the output matched the expected
> results. Alternatively, you can do:
>
> python test_GraphicsGeneral.py
>
> I hope that helps.
>
> Peter

From ytu888 at hotmail.com Tue Oct 2 12:29:46 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Tue, 2 Oct 2007 07:29:46 -0500
Subject: [BioPython] Error for installation of MySALdb on Mac OS X
In-Reply-To: <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
References: <46FCF325.4040002@maubp.freeserve.co.uk> <46FD2BAC.80401@maubp.freeserve.co.uk> <46FD5927.3000207@maubp.freeserve.co.uk> <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
Message-ID:

Hi Steve,

I checked the PATH and added /usr/local/mysql/bin to it, but I still got the same error message when running setup.py. Thanks.

> CC: biopython at lists.open-bio.org
> From: lists.steve at arachnedesign.net
> Subject: Re: [BioPython] Error for installation of MySALdb on Mac OS X
> Date: Mon, 1 Oct 2007 16:18:04 -0400
> To: ytu888 at hotmail.com
>
> > I downloaded mysql-5.0.45-osx10.4-i686.dmg from mysql web and
> > installed it. Then I tried to install MySQL-python-1.2.2 but got
> > the following error. How to create the mysql_config.path file?
> > Thank you very much.
> >
> > leesComputer:/applications/Python_Bio/MySQL-python-1.2.2 lee$
> > python setup.py build
> > sh: line 1: mysql_config: command not found
>
> It seems as if you need to have the `mysql_config` command in your
> PATH variable and it's not there.
>
> Look for where mysql was installed (maybe /usr/local/mysql/...) and
> add its bin directory to your PATH environment variable. Or maybe it
> installed some binaries/symlinks into your /usr/local/bin directory?
>
> I think that'll do it for you.
>
> -steve
>

From idoerg at gmail.com Tue Oct 2 16:00:41 2007
From: idoerg at gmail.com (Iddo Friedberg)
Date: Tue, 2 Oct 2007 09:00:41 -0700
Subject: [BioPython] [Biopython-dev] Bio.MultiProc
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>
References: <46E6A845.3030601@c2b2.columbia.edu> <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>
Message-ID:

Would it be possible to include the module, comment out the unworkable source code and print a deprecation warning when it is imported? That way we:

1) Don't have a clunky module BUT
2) we warn anyone who uses it (but didn't happen to read your post) that it is deprecated when they install a new biopython version AND
3) Leave an option of fixing and commenting the code back in (i.e. it is not lost forever).

Also, is it possible to track down the original author?

./I

On 10/2/07, Michiel De Hoon wrote:
>
> Hi everybody,
>
> Since no users of Bio.MultiProc came forward, I deprecated it for the
> upcoming release.
>
> --Michiel.
> > Michiel de Hoon > Center for Computational Biology and Bioinformatics > Columbia University > 1150 St Nicholas Avenue > New York, NY 10032 > > > > -----Original Message----- > From: biopython-bounces at lists.open-bio.org on behalf of Michiel De Hoon > Sent: Tue 9/11/2007 10:37 AM > To: BioPython Developers List; biopython at biopython.org > Subject: [BioPython] Bio.MultiProc > > Hi everybody, > > In preparation for the upcoming release, I was running the Biopython > test suite and found that test_copen.py hangs on Cygwin. It doesn't > fail, it just sits there forever. This may be related to the use of > fork() instead of select() in Bio/MultiProc/copen.py. Anyway, while it > is probably possible to fix this, I'd have to dig fairly deep into the > code, and I am not sure if it is worth it. It looks like the copen > functions are used only in Bio/config, which is needed for Bio.db. A > description of the functionality of thia module can be found in the > tutorial section 4.7.2. > > Now, I don't remember users asking about this module on the mailing > list. From the tutorial documentation, it seems to be a nice piece of > code, but I doubt that it is being used often in practice. > > So I was wondering: > 1) Is anybody on this list using this code? > 2) If not, can I mark it as deprecated for the upcoming release? > Hopefully, people who are using this code will notice, and let us know > that they need it. > > --Michiel. > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- I. Friedberg "The only problem with troubleshooting is that sometimes trouble shoots back." 
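
The "print a deprecation warning when it is imported" idea from the message above can be sketched with the standard warnings module. This is only an illustration in modern Python: the helper name warn_deprecated and the wording of the message are assumptions, and it is not the code that went into Bio.MultiProc. In a real module the call would sit at the top of the module file so that it runs once, on first import.

```python
import warnings


def warn_deprecated(module_name):
    """Emit the kind of warning a deprecated module could raise on import."""
    warnings.warn(
        f"{module_name} is deprecated and may be removed in a future release.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller (the importer)
    )


# Capture the warning so the behaviour can be shown without a real package.
with warnings.catch_warnings(record=True) as caught:
    # DeprecationWarning is hidden by default in many contexts, so force it on.
    warnings.simplefilter("always")
    warn_deprecated("Bio.MultiProc")

print(caught[0].category.__name__, "-", caught[0].message)
```

Because the warning is raised rather than the code being deleted, users who still depend on the module find out on their next upgrade, and the source stays available to be repaired.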
From mdehoon at c2b2.columbia.edu Wed Oct 3 00:18:59 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 2 Oct 2007 20:18:59 -0400
Subject: [BioPython] [Biopython-dev] Bio.MultiProc
References: <46E6A845.3030601@c2b2.columbia.edu> <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B62D@mail2.exch.c2b2.columbia.edu>

> Would it be possible to include the module, comment out the unworkable
> source code and print a deprecation warning when it is imported?

That is what I did.

> 3) Leave an option of fixing and commenting the code back in (i.e. it is not
> lost forever).

Even after removing the code in some future release, the code will not be lost forever. It can always be retrieved from CVS and from older Biopython releases.

> Also, is it possible to track down the original author?

That would be Jeff Chang.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
From ytu888 at hotmail.com Wed Oct 3 12:44:32 2007 From: ytu888 at hotmail.com (Y Tu) Date: Wed, 3 Oct 2007 07:44:32 -0500 Subject: [BioPython] Error for installation of MySALdb on Mac OS X In-Reply-To: <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net> References: <46FCF325.4040002@maubp.freeserve.co.uk> <46FD2BAC.80401@maubp.freeserve.co.uk> <46FD5927.3000207@maubp.freeserve.co.uk> <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net> Message-ID: Here is the copy of the output in the Terminal. Please help me to find out what's wrong. Thanks. Last login: Wed Oct 3 08:28:38 on ttyp4 Welcome to Darwin! LeesComputer:~ Lee$ echo $PATH /Library/Frameworks/Python.framework/Versions/Current/bin:/usr/local/bin:.:/usr/local/mysql:/bin:/sbin:/usr/bin:/usr/sbin LeesComputer:~ Lee$ cd /applications/python_bio/MySQL-python-1.2.2 LeesComputer:/applications/python_bio/MySQL-python-1.2.2 Lee$ python setup.py build sh: line 1: mysql_config: command not found Traceback (most recent call last): File "setup.py", line 16, in metadata, options = get_config() File "/Applications/Python_Bio/MySQL-python-1.2.2/setup_posix.py", line 43, in get_config libs = mysql_config("libs_r") File "/Applications/Python_Bio/MySQL-python-1.2.2/setup_posix.py", line 24, in mysql_config raise EnvironmentError, "%s not found" % mysql_config.path EnvironmentError: mysql_config not found LeesComputer:/applications/python_bio/MySQL-python-1.2.2 Lee$ cd /usr/local LeesComputer:/usr/local Lee$ ls -al total 8 drwxr-xr-x 8 root wheel 272 Oct 1 13:02 . drwxr-xr-x 10 root wheel 340 Sep 26 11:30 .. 
drwxr-xr-x   8 root  admin  272 Aug  6 04:00 ActivePerl-5.8
drwxr-xr-x  15 root  wheel  510 Oct  2 03:52 bin
drwxr-xr-x   6 root  wheel  204 Sep 27 05:22 include
drwxr-xr-x  12 root  wheel  408 Sep 27 05:21 lib
lrwxr-xr-x   1 root  wheel   25 Oct  1 13:02 mysql -> mysql-5.0.45-osx10.4-i686
drwxr-xr-x  19 root  wheel  646 Jul  4 13:54 mysql-5.0.45-osx10.4-i686

> CC: biopython at lists.open-bio.org
> From: lists.steve at arachnedesign.net
> Subject: Re: [BioPython] Error for installation of MySALdb on Mac OS X
> Date: Mon, 1 Oct 2007 16:18:04 -0400
> To: ytu888 at hotmail.com
>
> > I downloaded mysql-5.0.45-osx10.4-i686.dmg from mysql web and
> > installed it. Then I tried to install MySQL-python-1.2.2 but got
> > the following error. How to create the mysql_config.path file?
> > Thank you very much.
> >
> > leesComputer:/applications/Python_Bio/MySQL-python-1.2.2 lee$
> > python setup.py build
> > sh: line 1: mysql_config: command not found
>
> It seems as if you need to have the `mysql_config` command in your
> PATH variable and it's not there.
>
> Look for where mysql was installed (maybe /usr/local/mysql/...) and
> add its bin directory to your PATH environment variable. Or maybe it
> installed some binaries/symlinks into your /usr/local/bin directory?
>
> I think that'll do it for you.
>
> -steve
>

From lists.steve at arachnedesign.net Wed Oct 3 13:01:09 2007
From: lists.steve at arachnedesign.net (Steve Lianoglou)
Date: Wed, 3 Oct 2007 09:01:09 -0400
Subject: [BioPython] Error for installation of MySALdb on Mac OS X
In-Reply-To:
References: <46FCF325.4040002@maubp.freeserve.co.uk> <46FD2BAC.80401@maubp.freeserve.co.uk> <46FD5927.3000207@maubp.freeserve.co.uk> <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
Message-ID: <38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net>

Hi,

On Oct 3, 2007, at 8:44 AM, Y Tu wrote:

> Here is the copy of the output in the Terminal. Please help me to
> find out what's wrong. Thanks.
>
> Last login: Wed Oct 3 08:28:38 on ttyp4
> Welcome to Darwin!
> LeesComputer:~ Lee$ echo $PATH
> /Library/Frameworks/Python.framework/Versions/Current/bin:/usr/
> local/bin:.:/usr/local/mysql:/bin:/sbin:/usr/bin:/usr/sbin

It still looks like your PATH is screwed up: /usr/local/mysql/bin isn't
in there. What you have is:

/usr/local/mysql:/bin

which is two separate entries, /usr/local/mysql and /bin.

Here's a test. Open up a terminal and type:

$ which mysql_config

If you don't get an answer back that indicates that the system can
find the binary, then your script won't either.

For instance, this is how it looks for me:

$ which mysql_config
/Library/MySQL/bin/mysql_config

(I have an older version of mysql which was installed into
/Library/MySQL)

Yours should say:

$ which mysql_config
/usr/local/mysql/bin/mysql_config

Or something like that.

Try that and see ...

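The `which` check above can also be done from inside Python. What follows is only a minimal sketch using the current Python 3 standard library (not the Python 2.5 used in this thread); `THREAD_PATH`, `path_entries`, `is_listed` and `locate` are names made up for this illustration, and the PATH string is the one pasted earlier in the thread.

```python
import shutil

# PATH value copied from the terminal output quoted in this thread.
THREAD_PATH = ("/Library/Frameworks/Python.framework/Versions/Current/bin:"
               "/usr/local/bin:.:/usr/local/mysql:/bin:/sbin:/usr/bin:/usr/sbin")


def path_entries(path_value, sep=":"):
    """Split a PATH-style string into its individual directories."""
    return [entry for entry in path_value.split(sep) if entry]


def is_listed(directory, path_value):
    """True if `directory` appears as its own entry in the PATH string."""
    return directory in path_entries(path_value)


def locate(tool):
    """Rough equivalent of `which tool`: full path, or None if not found."""
    return shutil.which(tool)


if __name__ == "__main__":
    # The colon splits the intended directory into two unrelated entries.
    print(is_listed("/usr/local/mysql", THREAD_PATH))      # True
    print(is_listed("/usr/local/mysql/bin", THREAD_PATH))  # False
    # None here means setup.py will not find mysql_config either.
    print(locate("mysql_config"))
```

If `locate("mysql_config")` prints `None`, the build step will most likely fail in the same way as shown above, because it searches the same PATH.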
-steve From lists.steve at arachnedesign.net Wed Oct 3 14:47:41 2007 From: lists.steve at arachnedesign.net (Steve Lianoglou) Date: Wed, 3 Oct 2007 10:47:41 -0400 Subject: [BioPython] Error for installation of MySALdb on Mac OS X In-Reply-To: References: <46FCF325.4040002@maubp.freeserve.co.uk> <46FD2BAC.80401@maubp.freeserve.co.uk> <46FD5927.3000207@maubp.freeserve.co.uk> <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net> <38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net> Message-ID: <14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net> > Steve, thank you very much. It fixed the problem and I got through > the build and install step. But when I tested inside the python for > the installation I got following error. Please help me about it. > Thanks. > > >>> import MySQLdb > /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/ > site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/ > _mysql.py:3: UserWarning: Module _mysql was already imported from / > Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site- > packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.pyc, > but /Applications/Python_Bio/MySQL-python-1.2.2 is being added to > sys.path > import sys, pkg_resources, imp > Traceback (most recent call last): > File "", line 1, in > File "MySQLdb/__init__.py", line 19, in > import _mysql > File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 7, in > > File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 6, in > __bootstrap__ > ImportError: dlopen(/Users/lizhexu/.python-eggs/MySQL_python-1.2.2- > py2.5-macosx-10.3-fat.egg-tmp/_mysql.so, 2): Library not loaded: / > usr/local/mysql/lib/mysql/libmysqlclient_r.15.dylib > Referenced from: /Users/lizhexu/.python-eggs/MySQL_python-1.2.2- > py2.5-macosx-10.3-fat.egg-tmp/_mysql.so > Reason: image not found Sorry, don't know exactly what's happening here. Is this from a "fresh" python prompt? How did you install MySQLdb, did you use easy_install? 
If so, try to install from the sourceforge download. Try to remove it, remove the "build" directory from your mysqldb download and redo the whole python setup.py build / python setup.py install process To remove it, nuke this: /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site- packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg And try to reinstall? Perhaps someone who knows what the problem is here can give you a better idea on what to do. -steve From sbassi at gmail.com Thu Oct 4 06:47:44 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Thu, 4 Oct 2007 03:47:44 -0300 Subject: [BioPython] Problem with blast xml Message-ID: I am having a problem that it is not originated in Biopython, but it is affecting the Biopython (1.43) xml blast parser. I have two xml files, one can be parsed and the other can't. Here are the commands I run to get the xml files: sbassi at xubuntu:~/blast-2.2.16/bin$ ./blastall -p blastn -d /media/vic300/BLASTdb/ecoli.nt -i /media/vic300/INTA/mitofragsB2-TAB.fasta -e 0.0001 -m 7 -o TABB2.xml sbassi at xubuntu:~/blast-2.2.16/bin$ ./blastall -p blastn -d /media/vic300/BLASTdb/ecoli.nt -i /media/vic300/INTA/mitofragsB2-TABv2.fasta -e 0.0001 -m 7 -o TABB2v2.xml The relevant difference is the input file, the sequences are different, but the output file should have the same format (shouldn't it?). When I am parsing the files, I find that this is not true. This is the file that can be parsed without problem: >>> bout=open('bioinfo/INTA/TABB2.xml') >>> b_records=NCBIXML.parse(bout) >>> x=b_records.next() >>> y=b_records.next() >>> x.query u'fragment 31' >>> y.query u'fragment 67' >>> x.alignments [] >>> y.alignments [, , , , , , ] Let's see what seems to be a malformed? 
xml file:

>>> bout=open('bioinfo/INTA/TABB2v2.xml')
>>> b_records=NCBIXML.parse(bout)
>>> x=b_records.next()
>>> y=b_records.next()
>>> x.query
u'fragment 1'
>>> y.query
u'fragment 57'
>>> x.alignments
[]
>>> y.alignments
[]

There is a record with an empty list. Here is a fragment of the
"normal" one (TABB2.xml):

2 F 31 lcl|31_0 fragment 31 1174 1 gi|1788520|gb|AE000309.1|AE000309 Escherichia coli K-12 MG1655 section 199 of 400 of the complete genome AE000309 13453 1

Here is a fragment of the "malformed" one (TABB2v2.xml):

2 F 1 400 4662239 0 0 0.710603 1.37406 1.30725 57

Why is this happening? Is this an expected behavior?
I uploaded the xml files here:
http://www.bioinformatica.info/TABB2.xml
http://www.bioinformatica.info/TABB2v2.xml

--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From ytu888 at hotmail.com Thu Oct 4 12:24:18 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Thu, 4 Oct 2007 07:24:18 -0500
Subject: [BioPython] Error generated by Clustalw example in Tutorial
Message-ID:

Hi,

I'm reading the Biopython tutorial and running the example of clustalw.
But it generates the following error. What's wrong? Thanks.

>>> import os
>>> from Bio import Clustalw
>>> cline = Clustalw.MultipleAlignCL(os.path.join(os.curdir, "opuntia.fasta"))
>>> cline.set_output("result.aln")
>>> print cline
clustalw .\opuntia.fasta -OUTFILE=result.aln
>>> alignment = Clustalw.do_alignment(cline)
Traceback (most recent call last):
  File "", line 1, in
  File "C:\Python25\Lib\site-packages\Bio\Clustalw\__init__.py", line 117, in do_alignment
    % (out_file, command_line))
IOError: Output .aln file result.aln not produced, commandline: clustalw .\opuntia.fasta -OUTFILE=result.aln

From sbassi at gmail.com Thu Oct 4 16:19:22 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Thu, 4 Oct 2007 13:19:22 -0300
Subject: [BioPython] Error generated by Clustalw example in Tutorial
In-Reply-To:
References:
Message-ID:

On 10/4/07, Y Tu wrote:
> >>> print cline
> clustalw .\opuntia.fasta -OUTFILE=result.aln

I am not sure if this command is properly formatted. The slash should
not be there, but I don't have a Windows box to try this.

--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From mdehoon at c2b2.columbia.edu Fri Oct 5 01:01:59 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Thu, 4 Oct 2007 21:01:59 -0400
Subject: [BioPython] Problem with blast xml
References:
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B62F@mail2.exch.c2b2.columbia.edu>

Can you create two minimal XML files that demonstrate the problem?
For example, by removing records from the two files you have and
checking if parsing still works for one and fails for the other.
By doing so, you may be able to identify exactly what the essential
difference between the two files is.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

From sbassi at gmail.com Fri Oct 5 05:39:44 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Fri, 5 Oct 2007 02:39:44 -0300
Subject: [BioPython] Problem with blast xml
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B62F@mail2.exch.c2b2.columbia.edu>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B62F@mail2.exch.c2b2.columbia.edu>
Message-ID:

On 10/4/07, Michiel De Hoon wrote:
> Can you create two minimal XML files that demonstrate the problem?
> For example, by removing records from the two files you have and checking if
> parsing still works for one and fails for the other.
> By doing so, you may be able to identify exactly what the essential
> difference between the two files is.

After some tests, I found two minimal XML files with this issue:

http://www.bioinformatica.info/mitoA.xml
http://www.bioinformatica.info/mitoB.xml

(only 3.5 kb each).

--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From mdehoon at c2b2.columbia.edu Fri Oct 5 06:34:56 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Fri, 5 Oct 2007 02:34:56 -0400
Subject: [BioPython] Problem with blast xml
References: <6243BAA9F5E0D24DA41B27997D1FD14402B62F@mail2.exch.c2b2.columbia.edu>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B631@mail2.exch.c2b2.columbia.edu>

From looking at the XML files, it seems that the Biopython Blast XML
parser is doing the right thing. Isn't it?

--Michiel.

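One way to check whether an empty alignments list simply reflects what is in the file is to count the hits per query directly in the raw XML, without going through Biopython. The sketch below is an illustration only: it uses Python 3's standard xml.etree.ElementTree, the element names `Iteration`, `Iteration_query-def`, `Iteration_hits` and `Hit` are the ones used in NCBI BLAST XML output, and `SAMPLE` is a tiny hand-written stand-in, not an excerpt from the files discussed in this thread. `hits_per_query` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for BLAST XML; real output carries many more elements.
SAMPLE = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>query A</Iteration_query-def>
      <Iteration_hits></Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_query-def>query B</Iteration_query-def>
      <Iteration_hits>
        <Hit><Hit_def>example subject</Hit_def></Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def hits_per_query(xml_text):
    """Return a list of (query description, number of hits) per iteration."""
    root = ET.fromstring(xml_text)
    counts = []
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def", default="(no query-def)")
        hits = iteration.findall("Iteration_hits/Hit")
        counts.append((query, len(hits)))
    return counts


if __name__ == "__main__":
    print(hits_per_query(SAMPLE))  # [('query A', 0), ('query B', 1)]
```

A query that shows up with a count of 0 here has no Hit elements in the file, so an empty list from the parser would be consistent with the input rather than a parsing error.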
Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: Sebastian Bassi [mailto:sbassi at gmail.com] Sent: Fri 10/5/2007 1:39 AM To: Michiel De Hoon Cc: biopython at biopython.org Subject: Re: [BioPython] Problem with blast xml On 10/4/07, Michiel De Hoon wrote: > Can you create two minimal XML files that demonstrate the problem? > For example, by removing records from the two files you have and checking if > parsing still works for one and fails for the other. > By doing so, you may be able to identify exactly what the essential > difference between the two files is. After some tests, I found two minimal XML files with this issue: http://www.bioinformatica.info/mitoA.xml http://www.bioinformatica.info/mitoB.xml (only 3.5 kb each). -- Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6 Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From biopython at maubp.freeserve.co.uk Fri Oct 5 09:26:06 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 05 Oct 2007 10:26:06 +0100 Subject: [BioPython] Error generated by Clustalw example in Tutorial In-Reply-To: References: Message-ID: <4706032E.1020703@maubp.freeserve.co.uk> Y Tu wrote: > Hi, > > I'm reading the Biopython tutorial and running the example of clustalw. But it generate the following error. What's wrong? Thanks. > >>>> from Bio import Clustalw >>>> cline = Clustalw.MultipleAlignCL(os.path.join(os.curdir, "opuntia.fasta")) >>>> cline.set_output("result.aln") >>>> print cline > clustalw .\opuntia.fasta -OUTFILE=result.aln The Windows version of ClustalW is very fussy. 
To experiment try running this by hand at the windows command prompt - note that I'm not at my Windows machine so I haven't double checked this: clustalw .\opuntia.fasta -OUTFILE=result.aln or, clustalw opuntia.fasta -OUTFILE=result.aln Any error messages would be helpful. I suggest you try this in Biopython: from Bio import Clustalw cline = Clustalw.MultipleAlignCL("opuntia.fasta") cline.set_output("result.aln") print cline Also, we have made a few tweaks to this code since Biopython 1.43 was released (see emails with Emanuel Hey in July 2007). If you like, you can try updating this module to the CVS version. Simply backup the existing C:\Python25\Lib\site-packages\Bio\Clustalw\__init__.py and replace it with the latest code from here: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/*checkout*/biopython/Bio/Clustalw/__init__.py?rev=HEAD&cvsroot=biopython&content-type=text/x-python Peter From ytu888 at hotmail.com Fri Oct 5 16:32:05 2007 From: ytu888 at hotmail.com (Y Tu) Date: Fri, 5 Oct 2007 11:32:05 -0500 Subject: [BioPython] Error generated by Clustalw example in Tutorial In-Reply-To: <4706032E.1020703@maubp.freeserve.co.uk> References: <4706032E.1020703@maubp.freeserve.co.uk> Message-ID: I tested both commands under window prompt, initially both generated error because window don't know clustalw. Once I give the correct path of the clustalw, both generated alignment results without any error. BTW, I used the one inside BioEdit, I did not find clustalw coming with Biopython. It looks like python use online program at ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalW/. Am I right? 
Then I replace the old _ini_with the new one, but there is a new error message similar to the old one: >>> alignment = Clustalw.do_alignment(cline) Traceback (most recent call last): File "", line 1, in File "C:\Python25\Lib\site-packages\Bio\Clustalw\__init__.py", line 117, in do_alignment # check if the outfile exists before parsing IOError: Output .aln file result1.aln not produced, commandline: clustalw opuntia.fasta -OUTFILE=result1.aln Also I tested the example on OS X, the same error was generated: >>> alignment = Clustalw.do_alignment(cline) sh: line 1: clustalw: command not found Traceback (most recent call last): File "", line 1, in File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/Clustalw/__init__.py", line 117, in do_alignment % (out_file, command_line)) IOError: Output .aln file result1.aln not produced, commandline: clustalw ./opuntia.fasta -OUTFILE=result1.aln It seems like the problem is not linked to OS. What other things could be wrong? Thanks. > Date: Fri, 5 Oct 2007 10:26:06 +0100 > From: biopython at maubp.freeserve.co.uk > To: ytu888 at hotmail.com > CC: biopython at lists.open-bio.org > Subject: Re: [BioPython] Error generated by Clustalw example in Tutorial > > Y Tu wrote: > > Hi, > > > > I'm reading the Biopython tutorial and running the example of clustalw. But it generate the following error. What's wrong? Thanks. > > > >>>> from Bio import Clustalw > >>>> cline = Clustalw.MultipleAlignCL(os.path.join(os.curdir, "opuntia.fasta")) > >>>> cline.set_output("result.aln") > >>>> print cline > > clustalw .\opuntia.fasta -OUTFILE=result.aln > > The Windows version of ClustalW is very fussy. To experiment try > running this by hand at the windows command prompt - note that I'm not > at my Windows machine so I haven't double checked this: > > clustalw .\opuntia.fasta -OUTFILE=result.aln > > or, > > clustalw opuntia.fasta -OUTFILE=result.aln > > Any error messages would be helpful. 
>
> I suggest you try this in Biopython:
>
> from Bio import Clustalw
> cline = Clustalw.MultipleAlignCL("opuntia.fasta")
> cline.set_output("result.aln")
> print cline
>
> Also, we have made a few tweaks to this code since Biopython 1.43 was
> released (see emails with Emanuel Hey in July 2007). If you like, you
> can try updating this module to the CVS version. Simply backup the
> existing C:\Python25\Lib\site-packages\Bio\Clustalw\__init__.py and
> replace it with the latest code from here:
>
> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/*checkout*/biopython/Bio/Clustalw/__init__.py?rev=HEAD&cvsroot=biopython&content-type=text/x-python
>
> Peter
>

From biopython at maubp.freeserve.co.uk Fri Oct 5 18:35:05 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 05 Oct 2007 19:35:05 +0100
Subject: [BioPython] Error generated by Clustalw example in Tutorial
In-Reply-To:
References: <4706032E.1020703@maubp.freeserve.co.uk>
Message-ID: <470683D9.90808@maubp.freeserve.co.uk>

Y Tu wrote:
> I tested both commands under window prompt, initially both generated
> error because window don't know clustalw.

This is expected. You must either supply the full path of the clustalw
executable, or have it on the system path. Otherwise Windows doesn't
know how to find the clustalw program.

> Once I give the correct path of the clustalw, both generated
> alignment results without any error. BTW, I used the one inside
> BioEdit, I did not find clustalw coming with Biopython. It looks like
> python use online program at
> ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalW/. Am I right?

Clustalw is a standalone program (completely separate from Biopython)
which you must install separately if you want to use it. It is
available from several servers - the one you chose looks fine.

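The two requirements above (the executable must be findable, and the output file must actually appear) can be checked explicitly before blaming the wrapper. This is only a sketch in Python 3 with the standard library; `run_tool` is a hypothetical helper, not part of Biopython, and treating a non-zero exit status as a failure is an assumption about the external program. The demonstration uses the Python interpreter itself as a stand-in for clustalw so that it runs anywhere.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def run_tool(command, expected_output):
    """Run an external program and confirm it produced `expected_output`.

    `command` is a list such as ["clustalw", "opuntia.fasta", "-OUTFILE=result.aln"].
    """
    executable = shutil.which(command[0])
    if executable is None:
        raise FileNotFoundError(
            "%r is not on the system path; give its full path instead" % command[0])
    result = subprocess.run([executable] + list(command[1:]))
    if result.returncode != 0:
        # Assumption: the tool follows the usual convention of 0 meaning success.
        raise RuntimeError("command exited with status %d" % result.returncode)
    if not os.path.exists(expected_output):
        raise IOError("%s was not produced (working directory: %s)"
                      % (expected_output, os.getcwd()))
    return expected_output


if __name__ == "__main__":
    # Stand-in for an aligner: a one-line Python command that writes the file.
    with tempfile.TemporaryDirectory() as folder:
        target = os.path.join(folder, "result.aln")
        writer = "open(%r, 'w').close()" % target
        print(run_tool([sys.executable, "-c", writer], target) == target)
```

With a real aligner, the first exception separates "program not found" from "program ran but wrote nothing", which are the two situations mixed together in the error message quoted in this thread.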
> Then I replace the old _ini_with the new one, but there is a new
> error message similar to the old one:
>
>>>> alignment = Clustalw.do_alignment(cline)
> Traceback (most recent call last): File "", line
> 1, in File
> "C:\Python25\Lib\site-packages\Bio\Clustalw\__init__.py", line 117,
> in do_alignment # check if the outfile exists before parsing IOError:
> Output .aln file result1.aln not produced, commandline: clustalw
> opuntia.fasta -OUTFILE=result1.aln
>
> Also I tested the example on OS X, the same error was generated:
>
>>>> alignment = Clustalw.do_alignment(cline)
> sh: line 1: clustalw: command not found Traceback (most recent call
> last): File "", line 1, in File
> "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/Clustalw/__init__.py",
> line 117, in do_alignment % (out_file, command_line)) IOError:
> Output .aln file result1.aln not produced, commandline: clustalw
> ./opuntia.fasta -OUTFILE=result1.aln
>
> It seems like the problem is not linked to OS. What other things
> could be wrong? Thanks.

In both cases, you are not explicitly providing the path to clustalw -
so for this to work the clustalw executable must be on the system path.

The other obvious thing to check is the location of the files versus
the working directory. Is your python script in the same folder as the
opuntia.fasta file?

What happens if you try those exact command lines (which Biopython says
it is trying to run) at the command prompt in the directory where your
python script is located? i.e.

Windows:
clustalw opuntia.fasta -OUTFILE=result1.aln

Mac:
clustalw ./opuntia.fasta -OUTFILE=result1.aln

Peter

From meesters at uni-mainz.de Mon Oct 8 15:07:54 2007
From: meesters at uni-mainz.de (Christian Meesters)
Date: Mon, 8 Oct 2007 17:07:54 +0200
Subject: [BioPython] Reassigning parent ids in Bio.PDB-structures?
Message-ID: <1191856074.5425.24.camel@cmeesters>

Hi,

I'm trying to 'split' a structure in several pieces, e.g. a former chain
'A' should be split into 'A' and 'B', 'B' into 'C' and 'D' and so on.
Now, whatever I do I only get chains 'C', 'F', 'H', 'I', 'K', 'L' ...

Perhaps some code explains better what I'm trying to achieve:

breakpoints = [1254, 5444,
               6690, 10888,
               10889, 16332,
               16333, 21776,
               21776, 27220,
               27221, 32665]

def split_chain(structure, breakpoints, outname = 'split.pdb'):
    chains = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
              'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O',
              'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
              'X', 'Y', 'Z']

    chain = chains.pop(0)
    for atom in structure.get_atoms():
        number = atom.get_serial_number()
        if breaks and number == breaks[0]:
            breaks.pop(0)
            chain = chains.pop(0)
        atom.parent.parent.id = chain # assign new chain

    iostream = PDBIO()
    try:
        outfile = open(outname, 'w')
        iostream.set_structure(structure.structure)
        iostream.save(outfile)
    except IOError, msg:
        raise IOError(msg)

So, chain 'A' should stay 'A' from atom 1 to 1254 and 'B' from 1254 to
5444. Instead the written pdb-file contains all atoms, but with the
wrong chain ids (see above). (Please don't tell me how unpythonic the
code reads, point is that I've tried so many different things that I
first need to understand my logic mistake.)

Any ideas where my mistake is?

Thanks,
Christian

From meesters at uni-mainz.de Mon Oct 8 15:54:32 2007
From: meesters at uni-mainz.de (Christian Meesters)
Date: Mon, 8 Oct 2007 17:54:32 +0200
Subject: [BioPython] Reassigning parent ids in Bio.PDB-structures?
In-Reply-To: <470A508C.4060803@maubp.freeserve.co.uk>
References: <1191856074.5425.24.camel@cmeesters> <470A508C.4060803@maubp.freeserve.co.uk>
Message-ID: <1191858872.5425.32.camel@cmeesters>

> > breakpoints = [1254, 5444,
> > 6690, 10888,
> > 10889, 16332,
> > 16333, 21776,
> > 21776, 27220,
> > 27221, 32665]
>
> I'm assuming this is "breaks" later on.

Absolutely - that's the pain with copy & paste for demos ... sorry.

> As the reason, I think this is what is happening: Given an atom, then > atom.parent will be a residue object, and atom.parent.parent will be a > chain object. Note all the atoms in a single amino acid residue will > share share the same .parent, and all the atoms in a single chain will > share the same .parent.parent > > i.e. You have renamed Chain "A" to "A", and then later renamed this > chain to "B", and then again to "C". You didn't ever split up the chain > into sub chains. Mh, makes sense. > > To be honest, I would be tempted to write a quick and dirty script which > parsed the raw PDB file, and rewrote the chain field based on the atom > sequence number - without the overhead of the PDB parser. Yes, would have been too easy ;-). Only wanted to add this functionality to a larger application and make it easy to use. There is no strict need to do so, but it would have been nice. However, thanks for the input. Christian From biopython at maubp.freeserve.co.uk Mon Oct 8 15:45:16 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 08 Oct 2007 16:45:16 +0100 Subject: [BioPython] Reassigning parent ids in Bio.PDB-structures? In-Reply-To: <1191856074.5425.24.camel@cmeesters> References: <1191856074.5425.24.camel@cmeesters> Message-ID: <470A508C.4060803@maubp.freeserve.co.uk> Christian Meesters wrote: > Hi, > > I'm trying to 'split' a structure in several pieces, e.g. a former chain > 'A' should be splitted in 'A' and 'B', 'B' in 'C' and 'D' and so on. > Now, whatever I do I only get chains 'C', 'F', 'H', 'I', 'K', 'L' ... > > Perhaps some code explains better what I'm trying to achieve: > > breakpoints = [1254, 5444, > 6690, 10888, > 10889, 16332, > 16333, 21776, > 21776, 27220, > 27221, 32665] I'm assuming this is "breaks" later on. 
> def split_chain(structure, breakpoints, outname = 'split.pdb'): > chains = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', > 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', > 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', > 'X', 'Y', 'Z'] > > chain = chains.pop(0) > for atom in structure.get_atoms(): > number = atom.get_serial_number() > if breaks and number == breaks[0]: > breaks.pop(0) > chain = chains.pop(0) > atom.parent.parent.id = chain # assign new chain > > iostream = PDBIO() > try: > outfile = open(outname, 'w') > iostream.set_structure(structure.structure) > iostream.save(outfile) > except IOError, msg: > raise IOError(msg) > > So, chain 'A' should stay 'A' from atom 1 to 1254 and 'B' from 1254 to > 5444. Instead the written pdb-file contains all atoms, but with the > wrong chain ids (see above). (Please don't tell my how unpythonic the > code reads, point is that I've tried so many different things that I > first need to understand my logic mistake.) > > Any ideas, where my mistake is? As the reason, I think this is what is happening: Given an atom, then atom.parent will be a residue object, and atom.parent.parent will be a chain object. Note all the atoms in a single amino acid residue will share share the same .parent, and all the atoms in a single chain will share the same .parent.parent i.e. You have renamed Chain "A" to "A", and then later renamed this chain to "B", and then again to "C". You didn't ever split up the chain into sub chains. I think you need to create a new chain objects instead... but I'm not sure off hand how best to do this with Bio.PDB To be honest, I would be tempted to write a quick and dirty script which parsed the raw PDB file, and rewrote the chain field based on the atom sequence number - without the overhead of the PDB parser. Peter From bbrazelton at gmail.com Tue Oct 9 00:33:03 2007 From: bbrazelton at gmail.com (B. 
Brazelton) Date: Mon, 8 Oct 2007 17:33:03 -0700 Subject: [BioPython] BLAST XML parser trouble Message-ID: I tried to follow the BLAST XML parser example in the tutorial, but I always get the following error when attempting to iterate through the records: Traceback (most recent call last): File "BlastXML_Parser.py", line 10, in ? for blast_record in blast_records: File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/Bio/Blast/NCBIXML.py", line 572, in parse expat_parser.Parse(text, False) File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/Bio/Blast/NCBIXML.py", line 98, in endElement eval("self.%s()" % method) File "", line 0, in ? File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/Bio/Blast/NCBIXML.py", line 215, in _end_BlastOutput_version self._header.version = self._value.split()[1] IndexError: list index out of range All I did was: result_handle = open('NifH_Blast.xml') from Bio.Blast import NCBIXML blast_records = NCBIXML.parse(result_handle) for blast_record in blast_records: ... etc I put my script and xml file here: http://www.staff.washington.edu/braz/files I'm using biopython 1.43, and I get the same error on both Python 2.3.5 and Python 5. It seems like my commands are exactly what is in the tutorial, so I'm confused. My best guess is that there is a difference in the XML format, but it's NCBI XML. Thanks for any help, Bill Brazelton From sbassi at gmail.com Tue Oct 9 00:48:50 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Mon, 8 Oct 2007 21:48:50 -0300 Subject: [BioPython] BLAST XML parser trouble In-Reply-To: References: Message-ID: On 10/8/07, B. Brazelton wrote: > I tried to follow the BLAST XML parser example in the tutorial, but I > always get the following error when attempting to iterate through the > records: Got the same result as you. Could you please tell me the URL of the tutorial you saw this? 
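The last frame of the traceback above is `self._value.split()[1]`, which takes the second whitespace-separated token of the version text and therefore raises IndexError whenever that text contains fewer than two tokens. The sketch below only illustrates that failure mode and a more tolerant alternative in Python 3; `parse_version_field` is a hypothetical helper written for this note, not Biopython's code, and the sample strings are made-up inputs.

```python
def parse_version_field(text):
    """Split a 'PROGRAM VERSION [date]' style string into (program, version).

    Missing parts come back as empty strings instead of raising IndexError.
    """
    parts = text.split()
    program = parts[0] if parts else ""
    version = parts[1] if len(parts) > 1 else ""
    return program, version


if __name__ == "__main__":
    # A well-formed value has at least two tokens.
    print(parse_version_field("tblastx 2.2.15 [Oct-15-2006]"))  # ('tblastx', '2.2.15')
    # A single-word value is where text.split()[1] would fail.
    print(parse_version_field("oneword"))                        # ('oneword', '')
```

Checking the version text in the XML file for a single-word value is therefore a quick first diagnostic when this traceback appears.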

--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From mdehoon at c2b2.columbia.edu Tue Oct 9 02:55:21 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Mon, 8 Oct 2007 22:55:21 -0400
Subject: [BioPython] BLAST XML parser trouble
References:
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B633@mail2.exch.c2b2.columbia.edu>

How did you produce the XML file? In particular, which Blast version
did you use?

The Blast XML parser trips over the following line in your XML file:

<BlastOutput_version>unspecified</BlastOutput_version>

This is supposed to be:

<BlastOutput_version>BLASTP 2.2.12 [Aug-07-2005]</BlastOutput_version>

of course depending on which Blast version you are using.

--Michiel

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

From kbaa at novonordisk.com Tue Oct 9 12:26:14 2007
From: kbaa at novonordisk.com (KBAA (Kent Bondensgaard))
Date: Tue, 9 Oct 2007 14:26:14 +0200
Subject: [BioPython] FW: Parsing sequence information in patents
Message-ID: <48A8D64F1030744983C6747C790164BD05E322FC@EXDKBA023.corp.novocorp.net>

Does anyone know how to parse protein sequence information in patents
with Biopython?

BR,
Kent Bondensgaards
__________________________________
Kent Bondensgaard
Research Scientist
Protein Structure and Biophysics
Novo Nordisk A/S
Novo Nordisk Park
DK-2760 Måløv
Denmark
+45 4443 4510 (direct)
+45 3075 4510 (mobile)
+45 4466 3450 (fax)
kbaa at novonordisk.com

From sbassi at gmail.com Tue Oct 9 13:04:51 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Tue, 9 Oct 2007 10:04:51 -0300
Subject: [BioPython] FW: Parsing sequence information in patents
In-Reply-To: <48A8D64F1030744983C6747C790164BD05E322FC@EXDKBA023.corp.novocorp.net>
References: <48A8D64F1030744983C6747C790164BD05E322FC@EXDKBA023.corp.novocorp.net>
Message-ID:

On 10/9/07, KBAA (Kent Bondensgaard) wrote:
>
> Does anyone know how to parse protein sequence information in patents with Biopython?

What about using patAA and patNT from NCBI? They are both available
as BLAST-ready databases, so you could retrieve the FASTA file using
fastacmd.

--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From bbrazelton at gmail.com Tue Oct 9 20:24:58 2007
From: bbrazelton at gmail.com (B. Brazelton)
Date: Tue, 9 Oct 2007 13:24:58 -0700
Subject: [BioPython] BLAST XML parser trouble
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B633@mail2.exch.c2b2.columbia.edu>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B633@mail2.exch.c2b2.columbia.edu>
Message-ID:

I put in 'tblastx 2.2.15 [Oct-15-2006]' and it worked fine. Thanks
for your help, sorry for the newbie question.

(FYI, I was using results generated from the CAMERA database
(http://camera.calit2.net/), and I was using the main biopython
tutorial and cookbook from biopython.org.)

thanks again,
BB

On 10/8/07, Michiel De Hoon wrote:
> How did you produce the XML file? In particular, which Blast version did you
> use?
> The Blast XML parser trips over the following line in your XML file: > > unspecified > > This is supposed to be: > > BLASTP 2.2.12 [Aug-07-2005] > > , of course depending on which Blast version you are using. > > --Michiel > > > > Michiel de Hoon > Center for Computational Biology and Bioinformatics > Columbia University > 1150 St Nicholas Avenue > New York, NY 10032 > > > > -----Original Message----- > From: biopython-bounces at lists.open-bio.org on behalf of B. Brazelton > Sent: Mon 10/8/2007 8:33 PM > To: biopython at biopython.org > Subject: [BioPython] BLAST XML parser trouble > > I tried to follow the BLAST XML parser example in the tutorial, but I > always get the following error when attempting to iterate through the > records: > > Traceback (most recent call last): > File "BlastXML_Parser.py", line 10, in ? > for blast_record in blast_records: > File > "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site- > packages/Bio/Blast/NCBIXML.py", > line 572, in parse > expat_parser.Parse(text, False) > File > "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site- > packages/Bio/Blast/NCBIXML.py", > line 98, in endElement > eval("self.%s()" % method) > File "", line 0, in ? > File > "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site- > packages/Bio/Blast/NCBIXML.py", > line 215, in _end_BlastOutput_version > self._header.version = self._value.split()[1] > IndexError: list index out of range > > All I did was: > > result_handle = open('NifH_Blast.xml') > from Bio.Blast import NCBIXML > blast_records = NCBIXML.parse(result_handle) > for blast_record in blast_records: > ... etc > > I put my script and xml file here: > http://www.staff.washington.edu/braz/files > > I'm using biopython 1.43, and I get the same error on both Python > 2.3.5 and Python 5. > > It seems like my commands are exactly what is in the tutorial, so I'm > confused. 
My best guess is that there is a difference in the XML > format, but it's NCBI XML. Thanks for any help, > > Bill Brazelton > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > > From sbassi at gmail.com Tue Oct 9 21:09:09 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Tue, 9 Oct 2007 18:09:09 -0300 Subject: [BioPython] Getting Qv using Python? Message-ID: Is there an automated way to get Quality Values (QV) from a ab1 file? I wrap Abiview [1] to get the sequence, but now I need the Qv. [1] http://bioweb.pasteur.fr/docs/EMBOSS/abiview.html -- Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6 Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From prashanth at ibioinformatics.org Wed Oct 10 12:17:26 2007 From: prashanth at ibioinformatics.org (Prashantha Hebbar Kiradi) Date: Wed, 10 Oct 2007 17:47:26 +0530 Subject: [BioPython] where is SeqIO.parse()? Message-ID: <470CC2D6.1090504@ibioinformatics.org> Hi everybody, While trying the example of 'Parsing sequence file formats' from section 2.4 of Biopython tutorial: ------------------------------------------------- from Bio import SeqIO handle = open("ls_orchid.fasta") for seq_record in SeqIO.parse(handle, "fasta") : print seq_record.id print seq_record.seq print len(seq_record.seq) handle.close() ------------------------------------------------- I get this error: ------------------------------------------------- Traceback (most recent call last): File "fastEx.py", line 5, in for seq_record in SeqIO.parse(handle, "fasta") : AttributeError: 'module' object has no attribute 'parse' ------------------------------------------------- Importing SeqIO doesn't raise any error and the ls_orchid.fasta file I'm using is opening correctly. API documentation reports that the 'parse' function is there. What am I doing wrong? 
I'm using biopython 1.42 installed from Ubuntu repository and python 2.5.1.

Thanks in advance,

Prashantha Hebbar
Institute of Bioinformatics
ITPL, Bangalore, INDIA

From fennan at gmail.com Wed Oct 10 12:20:56 2007 From: fennan at gmail.com (Fernando) Date: Wed, 10 Oct 2007 14:20:56 +0200 Subject: [BioPython] Code publications Message-ID: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com>

Hi everybody,

This might be off-topic, or maybe not:

I've been working with biopython for a while and I am curious about what the authors get from all the exceptional work they are doing... I know it won't have anything to do with money, but in terms of publications / copyrights etc., what are the advantages of having your code in biopython? Is there a journal / conference where the authors publish their work, so that they can be referenced or something like that?

Thanks,
Fernando

From mdehoon at c2b2.columbia.edu Wed Oct 10 12:24:33 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Wed, 10 Oct 2007 08:24:33 -0400 Subject: [BioPython] where is SeqIO.parse()? References: <470CC2D6.1090504@ibioinformatics.org> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B635@mail2.exch.c2b2.columbia.edu>

> I'm using biopython 1.42 installed from Ubuntu repository and python 2.5.1.

Use Biopython 1.43.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

-----Original Message-----
From: biopython-bounces at lists.open-bio.org on behalf of Prashantha Hebbar Kiradi
Sent: Wed 10/10/2007 8:17 AM
To: biopython at biopython.org
Subject: [BioPython] where is SeqIO.parse()?
Hi everybody, While trying the example of 'Parsing sequence file formats' from section 2.4 of Biopython tutorial: ------------------------------------------------- from Bio import SeqIO handle = open("ls_orchid.fasta") for seq_record in SeqIO.parse(handle, "fasta") : print seq_record.id print seq_record.seq print len(seq_record.seq) handle.close() ------------------------------------------------- I get this error: ------------------------------------------------- Traceback (most recent call last): File "fastEx.py", line 5, in for seq_record in SeqIO.parse(handle, "fasta") : AttributeError: 'module' object has no attribute 'parse' ------------------------------------------------- Importing SeqIO doesn't raise any error and the ls_orchid.fasta file I'm using is opening correctly. API documentation reports that the 'parse' function is there. What am I doing wrong? I'm using biopython 1.42 installed from Ubuntu repository and python 2.5.1. Thanks in advance, Prashantha Hebbar Institute of Bioinformatics ITPL, Bangalore, INDIA _______________________________________________ BioPython mailing list - BioPython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From cjfields at uiuc.edu Wed Oct 10 14:14:48 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 10 Oct 2007 09:14:48 -0500 Subject: [BioPython] Code publications In-Reply-To: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com> References: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com> Message-ID: <865EDEE7-08D4-4058-9DD9-C4E790AFD327@uiuc.edu> This is a question that could be posed for any open-source project. It differs per person in my opinion. For instance, I donate time and code to BioPerl based on several factors. Not reinventing the wheel, giving back to the community, access to the code base, and the joy of programming (believe it or not) are among them, but they aren't the only ones. Publications don't hurt but they aren't my primary motivation. 
It generally isn't the focus of my research, only a means to an end (to parse or generate data). I don't see anything wrong with it being someone else's primary drive to donate as long as they continue support their code post-publication, an issue that unfortunately pops up quite frequently. chris On Oct 10, 2007, at 7:20 AM, Fernando wrote: > Hi everybody, > > This might be off-topic, or maybe not: > > I've been working with biopython for a while and I am curious about > what the > authors get from all the exceptional work they are doing... I know > it won't > have to do anything with money, but in terms of publication / > copyrihts etc, > what are the adventages of having your code in biopython? Is there > a journey > / conference where the author publish their works and likewise they > can be > referenced or something like that? > > Thanks, > Fernando > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From biopython at maubp.freeserve.co.uk Wed Oct 10 12:42:01 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 10 Oct 2007 13:42:01 +0100 Subject: [BioPython] Code publications In-Reply-To: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com> References: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com> Message-ID: <470CC899.6080802@maubp.freeserve.co.uk> Fernando wrote: > Hi everybody, > > This might be off-topic, or maybe not: > > I've been working with biopython for a while and I am curious about what the > authors get from all the exceptional work they are doing... I know it won't > have to do anything with money, but in terms of publication / copyrihts etc, > what are the adventages of having your code in biopython? 
Is there a journey > / conference where the author publish their works and likewise they can be > referenced or something like that? Pride? Looks good on a CV? Although I must say working on BioPerl would have been a better choice from the point of view of job hunting ;) Some of the specific modules have associated publications which get cited (e.g. Bio.PDB and Bio.Cluster - although the later is also available independently of Biopython). The closest to a general Biopython paper is currently Chapman and Chang 2000. In terms of talks, most recently I gave a talk at BOSC 2007 in July, the "Biopython Project Update". Which reminds me, I have a few photos and the slides (sadly in PowerPoint - my initial attempt to convert them into PDF wasn't great, font issues leading to content getting cropped). Peter From tiagoantao at gmail.com Wed Oct 10 16:59:56 2007 From: tiagoantao at gmail.com (Tiago Antao) Date: Wed, 10 Oct 2007 17:59:56 +0100 Subject: [BioPython] Code publications In-Reply-To: <865EDEE7-08D4-4058-9DD9-C4E790AFD327@uiuc.edu> References: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com> <865EDEE7-08D4-4058-9DD9-C4E790AFD327@uiuc.edu> Message-ID: <470D050C.7060500@gmail.com> I am currently submitting my populations genetics' code into biopython and I can talk about my motivations. Most of the code that I am submitting was used in something that I have done in the past (sometimes published). I figured, that if I have the code sitting here, I could as well donate it. This has one interesting advantage for me: all the code that I know I will try to submit to biopython is designed with care, all the code that is a one off is really a big mess. For me making code public is a motivator to maintain clean code. It is also a way to get to know people that are interested in this type of problems, and I think that, as with all things in life, knowing more people is a good thing. 
Maybe, in 12/18 months time I might think in suggesting to other people writing an article on the popgen work in biopython. Lets face it, that is also a good motivator. But, if it is the only one, I would agree that is not good (as Chris says, maintenance after publication...) Last, but not least: ethical and moral issues. Having spent some time outside of science I do think most scientific work is done in a very closed fashion (it was a shock to me, really). From my personal point of view open science and free software are arguments to which I connect moral value. Tiago Chris Fields wrote: > This is a question that could be posed for any open-source project. > > It differs per person in my opinion. For instance, I donate time and > code to BioPerl based on several factors. Not reinventing the wheel, > giving back to the community, access to the code base, and the joy of > programming (believe it or not) are among them, but they aren't the > only ones. > > Publications don't hurt but they aren't my primary motivation. It > generally isn't the focus of my research, only a means to an end (to > parse or generate data). I don't see anything wrong with it being > someone else's primary drive to donate as long as they continue > support their code post-publication, an issue that unfortunately pops > up quite frequently. > > chris > > On Oct 10, 2007, at 7:20 AM, Fernando wrote: > > >> Hi everybody, >> >> This might be off-topic, or maybe not: >> >> I've been working with biopython for a while and I am curious about >> what the >> authors get from all the exceptional work they are doing... I know >> it won't >> have to do anything with money, but in terms of publication / >> copyrihts etc, >> what are the adventages of having your code in biopython? Is there >> a journey >> / conference where the author publish their works and likewise they >> can be >> referenced or something like that? 
>>
>> Thanks,
>> Fernando
>> _______________________________________________
>> BioPython mailing list - BioPython at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>
>

From rebekah.rogers at gmail.com Thu Oct 11 18:57:21 2007 From: rebekah.rogers at gmail.com (Rebekah Rogers) Date: Thu, 11 Oct 2007 14:57:21 -0400 Subject: [BioPython] running PAML in python Message-ID: <79def59f0710111157h7483d5b5m6e6cdb3b86266750@mail.gmail.com>

Hello:

Does anyone know of an existing library that can run aligned sequences in PAML and then pull out the dN/dS values?

Thanks!

-Rebekah

From The_Polymorph at rocketmail.com Sun Oct 14 17:04:48 2007 From: The_Polymorph at rocketmail.com (Caitlin) Date: Sun, 14 Oct 2007 10:04:48 -0700 (PDT) Subject: [BioPython] Performing sequence alignments, etc. Message-ID: <311410.84366.qm@web50801.mail.re2.yahoo.com>

Hi all.

I'm relatively new to the field of bioinformatics and I'm trying to perform a multiple sequence alignment on 5-6 sequences (fasta format - dna sequences). I'd like the output to be formatted in the following manner (clustalw standalone output):

accession_number1: atctcgatatcgggcgctcta...
accession_number2: atctctattctctggatctct...
...

When one or more nucleotide columns are identical, clustalw displays an asterisk. If not, a blank space is displayed. Is this a standard feature of BioPython?

Also, I'm evaluating several sequences but I'd like to obtain the most recent complete genomes possible from various countries. Is there a convenient source to use (GenBank?) if I don't know the accession numbers?
Thanks, ~Caitlin From biopython at maubp.freeserve.co.uk Sun Oct 14 17:38:32 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 14 Oct 2007 18:38:32 +0100 Subject: [BioPython] Performing sequence alignments, etc. In-Reply-To: <311410.84366.qm@web50801.mail.re2.yahoo.com> References: <311410.84366.qm@web50801.mail.re2.yahoo.com> Message-ID: <47125418.5020009@maubp.freeserve.co.uk> Caitlin wrote: > Hi all. > > I'm relatively new to the field of bioinformatics and I'm trying to > perform a multiple sequence alignment on 5-6 sequences (fasta format - > dna sequences). I'd like the output to be formatted in the following > manner (clustalw standalone output): For reading and writing Clustalw alignment files, you could either use Bio.SeqIO (format name "clustal") or the Bio.Clustalw module. http://biopython.org/wiki/SeqIO > When one more more nucleotides columns are identical, clustalw displays > an asterisk. If not, a blank space is displayed. Is this a standard > feature of BioPython? There is an example of Clustalw output online here - note there can also be a column of numbers on the right hand side (not shown here): http://www.bioperl.org/wiki/ClustalW_multiple_alignment_format It sounds like you are describing the simple consensus string which clustalw outputs under the alignment (using *:. and space). Biopython has a SummaryInfo object which can calculate simple consensus sequences (see the tutorial). Perhaps this would be close to what you want to do. > Also, I'm evaluating several sequences but I'd like to obtain the most > recent complete genomes possible from various countries. Is there a > convenient source to use (GenBank?) if I don't know the accession > numbers? What sort of Genomes? Bacteria? Vertebrates?
You could start by having a look at any of the EMBL, NCBI/GenBank or the Japanese DDBJ (these three are kept in sync with each other). Biopython has quite a nice interface for searching and downloading sequences from GenBank (again, see the tutorial) so that would be my first suggestion. Peter From The_Polymorph at rocketmail.com Mon Oct 15 02:13:24 2007 From: The_Polymorph at rocketmail.com (Caitlin) Date: Sun, 14 Oct 2007 19:13:24 -0700 (PDT) Subject: [BioPython] Performing sequence alignments, etc. In-Reply-To: <47125418.5020009@maubp.freeserve.co.uk> Message-ID: <129586.66498.qm@web50807.mail.re2.yahoo.com> Thanks Peter. The genomes are viral. I'll definitely read that tutorial. Your help is very appreciated. ~Caitlin --- Peter wrote: > Caitlin wrote: > > Hi all. > > > > I'm relatively new to the field of bioinformatics and I'm trying to > > perform a multiple sequence alignment on 5-6 sequences (fasta > format - > > dna sequences). I'd like the output to be formatted in the > following > > manner (clustalw standalone output): > > For reading and writing Clustalw alignment files, you could either > use > Bio.SeqIO (format name "clustal") or the Bio.Clustalw module. > http://biopython.org/wiki/SeqIO > > > When one more more nucleotides columns are identical, clustalw > displays > > an asterisk. If not, a blank space is displayed. Is this a standard > > feature of BioPython? > > There is an example of Clustalw output online here - note there can > also > be a column of numbers on the right hand side (not shown here): > http://www.bioperl.org/wiki/ClustalW_multiple_alignment_format > > It sounds like you are describing the simple consensus string which > clustalw outputs under the alignment (using *:. and space). > > Biopython has a SummaryInfo object which can calculate simple > consensus > sequences (see the tutorial). Perhaps this would be close to what you > > want to do. 
> > > Also, I'm evaluating several sequences but I'd like to obtain the > most > > recent complete genomes possible from various countries. Is there a > > convenient source to use (GenBank?) if I don't know the accession > > numbers? > > What sort of Genomes? Bacteria? Vertebrates? You could start by > having > a look at any of the EMBL, NCBI/GenBank or the Japanese DDBJ (these > three are kept in sync with each other). > > Biopython has quite a nice interface for searching and downloading > sequences from GenBank (again, see the tutorial) so that would be my > first suggestion. > > Peter > > > > "Be who you are and say what you feel because those who mind don't matter and those who matter don't mind." - Dr. Seuss, "Oh the Places You'll Go" ____________________________________________________________________________________ Don't let your dream ride pass you by. Make it a reality with Yahoo! Autos. http://autos.yahoo.com/index.html From fredgca at hotmail.com Mon Oct 15 13:02:27 2007 From: fredgca at hotmail.com (Frederico Arnoldi) Date: Mon, 15 Oct 2007 13:02:27 +0000 Subject: [BioPython] where is SeqIO.parse()? In-Reply-To: References: Message-ID: Dear Kiradi, Concerning your subject question: where is SeqIO.parse()? >>> from Bio import SeqIO >>> SeqIO So, in my system, it is at /usr/lib/python2.4/site-packages/Bio/SeqIO/__init__.py. Try the same command in your python console and see where it is in yours. Concerning your problem: Try >>> from Bio import SeqIO >>> dir() ['SeqIO', '__builtins__', '__doc__', '__name__'] >>> dir(SeqIO) ['Alignment', 'ClustalIO', 'FastaIO', 'InsdcIO', 'Interfaces', 'NexusIO', 'PhylipIO', 'Seq', 'SeqRecord', 'StockholmIO', 'StringIO', 'SwissIO', '_FormatToIterator', '_FormatToWriter', '__builtins__', '__doc__', '__file__', '__name__', '__path__', 'generic_alphabet', 'generic_protein', 'os', 'parse', 'to_alignment', 'to_dict', 'write'] Do you get the same result? See that "parse" is in my SeqIO. Is it in yours? 
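
The same check can be scripted. A minimal sketch, assuming nothing beyond the standard library (module_has_attribute is a hypothetical helper name, not a Biopython function):

```python
import importlib


def module_has_attribute(module_name, attribute):
    """Report whether an importable module exposes the given attribute.

    Returns True or False when the module imports, and None when the
    module itself cannot be imported.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attribute)


# True if this Biopython install provides SeqIO.parse, False if the
# SeqIO package imports but lacks it, None if Bio.SeqIO is not importable.
print(module_has_attribute("Bio.SeqIO", "parse"))
```

If this prints False, the dir(SeqIO) listing should show which names the installed package actually provides.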
I noticed that when Biopython is installed via apt on Ubuntu, the __init__.py in Bio/SeqIO was empty. Maybe that is the source of your problem. If I am right, typing dir(SeqIO) on your system gives ['__builtins__', '__doc__', '__file__', '__name__', '__path__'], confirming that your __init__.py is empty. Check it. If this is your problem, try installing Biopython from the tar.gz file available on the Biopython home page.

Good luck,
Fred

----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 10 Oct 2007 17:47:26 +0530
> From: Prashantha Hebbar Kiradi
> Subject: [BioPython] where is SeqIO.parse()?
> To: biopython at biopython.org
> Message-ID:
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hi everybody,
>
> While trying the example of 'Parsing sequence file formats' from section
> 2.4 of Biopython tutorial:
> -------------------------------------------------
> from Bio import SeqIO
> handle = open("ls_orchid.fasta")
> for seq_record in SeqIO.parse(handle, "fasta") :
>     print seq_record.id
>     print seq_record.seq
>     print len(seq_record.seq)
> handle.close()
> -------------------------------------------------
>
>
> I get this error:
> -------------------------------------------------
> Traceback (most recent call last):
>   File "fastEx.py", line 5, in
>     for seq_record in SeqIO.parse(handle, "fasta") :
> AttributeError: 'module' object has no attribute 'parse'
> -------------------------------------------------
>
> Importing SeqIO doesn't raise any error and the ls_orchid.fasta file I'm
> using is opening correctly.
>
> API documentation reports that the 'parse' function is there. What am I
> doing wrong?
>
> I'm using biopython 1.42 installed from Ubuntu repository and python 2.5.1.
>
> Thanks in advance,
>
> Prashantha Hebbar
> Institute of Bioinformatics
> ITPL,

_________________________________________________________________
Get the latest news from Brazil and around the world right in your Messenger with MSN Alerts! It's FREE!
http://alertas.br.msn.com/ From ytu888 at hotmail.com Mon Oct 15 16:19:47 2007 From: ytu888 at hotmail.com (Y Tu) Date: Mon, 15 Oct 2007 11:19:47 -0500 Subject: [BioPython] Error for installation of MySALdb on Mac OS X In-Reply-To: <14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net> References: <46FCF325.4040002@maubp.freeserve.co.uk> <46FD2BAC.80401@maubp.freeserve.co.uk> <46FD5927.3000207@maubp.freeserve.co.uk> <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net> <38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net> <14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net> Message-ID: Hi Steve, Thank you for your email. I was away for a week. What do you mean "fresh" python prompt? I installed MySQL by using MYSQL-5.0.45-osx10.4-i686.dmg downloaded online. I guess you want me to reinstall MySQL_python_1.2.2, not MySQLdb, am I right? Once again, thank you very much for your help.. > CC: biopython at lists.open-bio.org > From: lists.steve at arachnedesign.net > Subject: Re: [BioPython] Error for installation of MySALdb on Mac OS X > Date: Wed, 3 Oct 2007 10:47:41 -0400 > To: ytu888 at hotmail.com > > > Steve, thank you very much. It fixed the problem and I got through > > the build and install step. But when I tested inside the python for > > the installation I got following error. Please help me about it. > > Thanks. 
> > > > >>> import MySQLdb > > /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/ > > site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/ > > _mysql.py:3: UserWarning: Module _mysql was already imported from / > > Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site- > > packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.pyc, > > but /Applications/Python_Bio/MySQL-python-1.2.2 is being added to > > sys.path > > import sys, pkg_resources, imp > > Traceback (most recent call last): > > File "", line 1, in > > File "MySQLdb/__init__.py", line 19, in > > import _mysql > > File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 7, in > > > > File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 6, in > > __bootstrap__ > > ImportError: dlopen(/Users/lizhexu/.python-eggs/MySQL_python-1.2.2- > > py2.5-macosx-10.3-fat.egg-tmp/_mysql.so, 2): Library not loaded: / > > usr/local/mysql/lib/mysql/libmysqlclient_r.15.dylib > > Referenced from: /Users/lizhexu/.python-eggs/MySQL_python-1.2.2- > > py2.5-macosx-10.3-fat.egg-tmp/_mysql.so > > Reason: image not found > > > Sorry, don't know exactly what's happening here. Is this from a > "fresh" python prompt? > > How did you install MySQLdb, did you use easy_install? If so, try to > install from the sourceforge download. > > Try to remove it, remove the "build" directory from your mysqldb > download and redo the whole > python setup.py build / python setup.py install process > > To remove it, nuke this: > /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site- > packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg > > And try to reinstall? > > Perhaps someone who knows what the problem is here can give you a > better idea on what to do. > > -steve _________________________________________________________________ Windows Live Hotmail and Microsoft Office Outlook ? together at last. ?Get it now. 
http://office.microsoft.com/en-us/outlook/HA102225181033.aspx?pid=CL100626971033 From lists.steve at arachnedesign.net Mon Oct 15 16:30:21 2007 From: lists.steve at arachnedesign.net (Steve Lianoglou) Date: Mon, 15 Oct 2007 12:30:21 -0400 Subject: [BioPython] Error for installation of MySALdb on Mac OS X In-Reply-To: References: <46FCF325.4040002@maubp.freeserve.co.uk> <46FD2BAC.80401@maubp.freeserve.co.uk> <46FD5927.3000207@maubp.freeserve.co.uk> <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net> <38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net> <14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net> Message-ID: <908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net> Hi, > Thank you for your email. I was away for a week. > What do you mean "fresh" python prompt? > I installed MySQL by using MYSQL-5.0.45-osx10.4-i686.dmg downloaded > online. > I guess you want me to reinstall MySQL_python_1.2.2, not MySQLdb, > am I right? I'm not sure, exactly. Last time I checked, the only thing you needed to use mysql from python was: (a) A working mysql install (the client/server) (b) The mysqldb package from: http://sourceforge.net/projects/mysql- python I'm assuming (a) is installed correctly since you are using the .mpkg from mysql.org, so I'd just try to fix (b). You try do so by doing the following: (1) Remove your original attempt at installing the python mysqldb library. From the looks of your error messages, it seems to be installed here: Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site- packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/ (2) remove the build directory in your mysqldb directory (the one you are installing from) by cd-ing into your mysqldb download, and removing the build directory you find there. 
(3) reinstall mysqldb by doing the usual `python setup.py build` and `sudo python setup.py install` dance

For the record, I'm not sure what you are talking about when you are distinguishing between "MySQL_python_1.2.2, not MySQLdb".

Are you trying to install two python libraries to access mysql?

-steve

From ytu888 at hotmail.com Mon Oct 15 17:18:42 2007 From: ytu888 at hotmail.com (Y Tu) Date: Mon, 15 Oct 2007 12:18:42 -0500 Subject: [BioPython] Error for installation of MySALdb on Mac OS X In-Reply-To: <908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net> References: <46FCF325.4040002@maubp.freeserve.co.uk> <46FD2BAC.80401@maubp.freeserve.co.uk> <46FD5927.3000207@maubp.freeserve.co.uk> <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net> <38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net> <14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net> <908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net> Message-ID:

What I meant by "MySQL_python_1.2.2, not MySQLdb" was uninstalling MySQL_python, not the mysql client/server installed with the mpkg. I just deleted the MYSQL....fat.egg file and downloaded the MySQL-python-1.2.2.tar. I repeated the installation process. However, when I run import MySQLdb, I get the same error message. Is there anything else I should look at? Thank you very much.

> CC: biopython at lists.open-bio.org
> From: lists.steve at arachnedesign.net
> Subject: Re: [BioPython] Error for installation of MySALdb on Mac OS X
> Date: Mon, 15 Oct 2007 12:30:21 -0400
> To: ytu888 at hotmail.com
>
> Hi,
>
> > Thank you for your email. I was away for a week.
> > What do you mean "fresh" python prompt?
> > I installed MySQL by using MYSQL-5.0.45-osx10.4-i686.dmg downloaded
> > online.
> > I guess you want me to reinstall MySQL_python_1.2.2, not MySQLdb,
> > am I right?
>
> I'm not sure, exactly.
From ytu888 at hotmail.com Tue Oct 16 17:06:36 2007 From: ytu888 at hotmail.com (Y Tu) Date: Tue, 16 Oct 2007 12:06:36 -0500 Subject: [BioPython] Error for installation of MySALdb on Mac OS X In-Reply-To: <908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net> References: <46FCF325.4040002@maubp.freeserve.co.uk> <46FD2BAC.80401@maubp.freeserve.co.uk> <46FD5927.3000207@maubp.freeserve.co.uk> <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net> <38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net> <14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net> <908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net> Message-ID: Hi, I reinstalled everything and checked every step. I found that there were some warnings in the "build" step (underlined). I wonder if they are the reason why I get the error messages when running "import MySQLdb" at the python prompt, and how to fix the problem. Thank you very much. LeesComputer:/Applications/Python_Bio/MySQL-python-1.2.2 Lee$ python setup.py build running build running build_py ... ... /usr/bin/ld: for architecture ppc /usr/bin/ld: warning build/temp.macosx-10.3-fat-2.5/_mysql.o cputype (7, architecture i386) does not match cputype (18) for specified -arch flag: ppc (file not loaded) /usr/bin/ld: warning /usr/local/mysql/lib/libmysqlclient_r.dylib cputype (7, architecture i386) does not match cputype (18) for specified -arch flag: ppc (file not loaded) LeesComputer:/Applications/Python_Bio/MySQL-python-1.2.2 Lee$ sudo python setup.py install Password: running install ... ...
Adding MySQL-python 1.2.2 to easy-install.pth file Installed /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg Processing dependencies for MySQL-python==1.2.2 LeesComputer:/Applications/Python_Bio/MySQL-python-1.2.2 Lee$ python Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04) [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import MySQLdb /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.py:3: UserWarning: Module _mysql was already imported from /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.pyc, but /Applications/Python_Bio/MySQL-python-1.2.2 is being added to sys.path import sys, pkg_resources, imp Traceback (most recent call last): File "", line 1, in File "MySQLdb/__init__.py", line 19, in import _mysql File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 7, in File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 6, in __bootstrap__ ImportError: dlopen(/Users/Lee/.python-eggs/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg-tmp/_mysql.so, 2): Library not loaded: /usr/local/mysql/lib/mysql/libmysqlclient_r.15.dylib Referenced from: /Users/Lee/.python-eggs/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg-tmp/_mysql.so Reason: image not found > CC: biopython at lists.open-bio.org > From: lists.steve at arachnedesign.net > Subject: Re: [BioPython] Error for installation of MySALdb on Mac OS X > Date: Mon, 15 Oct 2007 12:30:21 -0400 > To: ytu888 at hotmail.com > > Hi, > > > Thank you for your email. I was away for a week. > > What do you mean "fresh" python prompt? > > I installed MySQL by using MYSQL-5.0.45-osx10.4-i686.dmg downloaded > > online. > > I guess you want me to reinstall MySQL_python_1.2.2, not MySQLdb, > > am I right? > > I'm not sure, exactly. 
From fennan at gmail.com Tue Oct 16 17:51:30 2007 From: fennan at gmail.com (Fernando) Date: Tue, 16 Oct 2007 19:51:30 +0200 Subject: [BioPython] Precompute database information Message-ID: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com> Hi everybody, I am thinking of including some algorithms that I work with in Biopython. My first concern is that I'm using a local image of the Gene Ontology database to perform several operations. In order to avoid such database accesses I could precompute the information I need and load it once the module is called. How should I do it?
Is there a style guideline for loading external variables or something like that? Any other ideas/suggestions? Thanks From fennan at gmail.com Tue Oct 16 18:55:54 2007 From: fennan at gmail.com (Fernando) Date: Tue, 16 Oct 2007 20:55:54 +0200 Subject: [BioPython] Precompute database information In-Reply-To: <4714FD13.2020708@maubp.freeserve.co.uk> References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com> <4714FD13.2020708@maubp.freeserve.co.uk> Message-ID: <7b13e61d0710161155o2e933f13jf448fe2097f6a184@mail.gmail.com> Hi Peter, >How big would your pre-computed data be? If its some sort of table or >other simple data you could perhaps use a simple text file; Another idea > for complicated objects is to use python's pickle module. It would be big... I am dealing with pairwise term comparisons and I want to consider different species as well. >How often would the pre-computed data need to be updated? Every time >there is a new Gene Ontology release? It might be better have the >module download and cache the latest version on request (rather than >shipping an out of date dataset with Biopython). Yes, I could do that... Would it be OK in Biopython to use MySQL? If so, the module could download the latest GO version on request, install it and work with that version until the user decides to update it. On 10/16/07, Peter wrote: > > Fernando wrote: > > Hi everybody, > > > > I am thinking in including some algorithms that I work with into > biopython. > > My first concern is that I'm using a local image of the Gene Ontology > > database to perform several operations. In order to avoid such database > > accesses I could precompute the information I need and load it once the > > module is called. How should I do it? Is there a guideline style to load > > external variables or something like that? Any other ideas/suggestions? > > I think you need to go into more detail. > > How big would your pre-computed data be?
From sdavis2 at mail.nih.gov Tue Oct 16 19:26:18 2007 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue, 16 Oct 2007 15:26:18 -0400 Subject: [BioPython] Precompute database information In-Reply-To: <7b13e61d0710161155o2e933f13jf448fe2097f6a184@mail.gmail.com> References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com> <4714FD13.2020708@maubp.freeserve.co.uk> <7b13e61d0710161155o2e933f13jf448fe2097f6a184@mail.gmail.com> Message-ID: <4715105A.30705@mail.nih.gov> Fernando wrote: > Hi Peter, > >> How big would your pre-computed data be? If its some sort of table or >> other simple data you could perhaps use a simple text file; Another idea >> for complicated objects is to use python's pickle module. > > It would be big... I an dealing with pairwise terms comparisons and I want > to consider different species as well. > >> How often would the pre-computed data need to be updated? Every time >> there is a new Gene Ontology release? It might be better have the >> module download and cache the latest version on request (rather than >> shipping an out of date dataset with Biopython). > > Yes, I could do that... It would be OK in Biopython to use mysql? If so the > module could download the last GO version on request, install it and work > with that version until the users decides to update it. Asking users to use MySQL to do updates might be a bit much.
Could this be done from the .obo files? Sean From biopython at maubp.freeserve.co.uk Tue Oct 16 18:04:03 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 16 Oct 2007 19:04:03 +0100 Subject: [BioPython] Precompute database information In-Reply-To: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com> References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com> Message-ID: <4714FD13.2020708@maubp.freeserve.co.uk> Fernando wrote: > Hi everybody, > > I am thinking in including some algorithms that I work with into biopython. > My first concern is that I'm using a local image of the Gene Ontology > database to perform several operations. In order to avoid such database > accesses I could precompute the information I need and load it once the > module is called. How should I do it? Is there a guideline style to load > external variables or something like that? Any other ideas/suggestions? I think you need to go into more detail. How big would your pre-computed data be? If its some sort of table or other simple data you could perhaps use a simple text file; Another idea for complicated objects is to use python's pickle module. How often would the pre-computed data need to be updated? Every time there is a new Gene Ontology release? It might be better have the module download and cache the latest version on request (rather than shipping an out of date dataset with Biopython). I don't think we have anything in Biopython that requires regular updates. Things like genomes and sequence databases are left up to the user. 
Peter From fennan at gmail.com Wed Oct 17 11:12:36 2007 From: fennan at gmail.com (Fernando) Date: Wed, 17 Oct 2007 07:12:36 -0400 Subject: [BioPython] Precompute database information In-Reply-To: <4715105A.30705@mail.nih.gov> References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com> <4714FD13.2020708@maubp.freeserve.co.uk> <7b13e61d0710161155o2e933f13jf448fe2097f6a184@mail.gmail.com> <4715105A.30705@mail.nih.gov> Message-ID: <7b13e61d0710170412t76f92271h99834607dc9c0063@mail.gmail.com> >Asking users to use MySQL to do updates might be a bit much. Could this >be done from the .obo files? I think that's probably the best solution... Is there any python module for working with OBO / OWL formats? I've been searching but people seem to use BioPerl for this matter On 10/16/07, Sean Davis wrote: > > Fernando wrote: > > Hi Peter, > > > >> How big would your pre-computed data be? If its some sort of table or > >> other simple data you could perhaps use a simple text file; Another > idea > >> for complicated objects is to use python's pickle module. > > > > It would be big... I an dealing with pairwise terms comparisons and I > want > > to consider different species as well. > > > >> How often would the pre-computed data need to be updated? Every time > >> there is a new Gene Ontology release? It might be better have the > >> module download and cache the latest version on request (rather than > >> shipping an out of date dataset with Biopython). > > > > Yes, I could do that... It would be OK in Biopython to use mysql? If so > the > > module could download the last GO version on request, install it and > work > > with that version until the users decides to update it. > > Asking users to use MySQL to do updates might be a bit much. Could this > be done from the .obo files? 
> > Sean > From sdavis2 at mail.nih.gov Wed Oct 17 15:34:17 2007 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Wed, 17 Oct 2007 11:34:17 -0400 Subject: [BioPython] Precompute database information In-Reply-To: <7b13e61d0710170412t76f92271h99834607dc9c0063@mail.gmail.com> References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com> <4714FD13.2020708@maubp.freeserve.co.uk> <7b13e61d0710161155o2e933f13jf448fe2097f6a184@mail.gmail.com> <4715105A.30705@mail.nih.gov> <7b13e61d0710170412t76f92271h99834607dc9c0063@mail.gmail.com> Message-ID: <47162B79.8080204@mail.nih.gov> Fernando wrote: >>Asking users to use MySQL to do updates might be a bit much. Could this >>be done from the .obo files? > > I think that's probably the best solution... Is there any python module > for working with OBO / OWL formats? I've been searching but people seem > to use BioPerl for this matter In a way, it seems silly to reimplement the Bio::OntologyIO stuff in python, but I (and others, after a quick google search) would probably benefit from such a thing. I'm not able to devote much time right this minute to the project, but I think that, given the huge number of particularly obo format files available, there would be use for such parsers and tools in biopython. How much interest/need is there for a Bio.OntologyIO like thing? Has anyone made any attempts at creating one? 
For a list of available biologic ontologies (to see what we are missing), see here: http://obofoundry.org/ Sean From luca.beltrame at unimi.it Wed Oct 17 15:59:47 2007 From: luca.beltrame at unimi.it (Luca Beltrame) Date: Wed, 17 Oct 2007 17:59:47 +0200 Subject: [BioPython] Precompute database information In-Reply-To: <47162B79.8080204@mail.nih.gov> References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com> <7b13e61d0710170412t76f92271h99834607dc9c0063@mail.gmail.com> <47162B79.8080204@mail.nih.gov> Message-ID: <200710171759.48595.luca.beltrame@unimi.it> Il Wednesday 17 October 2007 17:34:17 Sean Davis ha scritto: > In a way, it seems silly to reimplement the Bio::OntologyIO stuff in It depends on the perspective, as for some learning yet another programming language would be a drawback. > parsers and tools in biopython. How much interest/need is there for a > Bio.OntologyIO like thing? Has anyone made any attempts at creating one? Personally speaking, I would love it. No time (and skill) to even think about doing something like that, though. -- Luca Beltrame, MSc. 
- Molecular Medicine PhD Student Dipartimento di Scienze e Tecnologie Biomediche - UniMI CNR - Institute of Biomedical Technologies Research Fellow E-mail: luca dot beltrame [at] unimi dot it - Phone: +39-02-50320924 From jimmy.musselwhite at gmail.com Wed Oct 17 21:20:41 2007 From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite) Date: Wed, 17 Oct 2007 17:20:41 -0400 Subject: [BioPython] Question about Seq.count() Message-ID: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com> Hello all, I have a script that runs through a list of about 250,000 sequence records and counts the number of times substrings of 3-5 nucleotides in length occur. Here is some example code: search = 'ATTCG' #use SeqIO to get a big list of records sequences = list(SeqIO.parse(file, "fasta")) for record in sequences : Now the code I want to use is record.seq.count(search) but what I am forced to use is record.seq.tostring().count(search) The problem here is that when I am forced to use .tostring() on every single seq object it devastates my memory usage in a BIG way. It eats up about 1.2gigs and then crashes. If I remove the .tostring() and just tell it to search for 'A', it will run fine and use memory at about 1/100th the rate. So my question boils down to: is there any way to make .count() search for strings and not just characters? Otherwise my work is going to grind to a halt here. Thanks!
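[Editorial sketch, not part of the original thread.] The question above is about counting short motifs across many FASTA records without running out of memory. The following is a minimal sketch of one way to do that, assuming plain FASTA input and using only the standard library. The helper names iter_fasta, count_motif and total_motif_count are hypothetical and are not Biopython functions. It processes one record at a time instead of building a list of all records, and it also shows that Python's str.count counts non-overlapping matches only, which may or may not be what is wanted for motif counting.

```python
def iter_fasta(handle):
    """Yield (title, sequence) pairs one record at a time (hypothetical helper)."""
    title, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if title is not None:
        yield title, "".join(chunks)


def count_motif(sequence, motif, overlapping=False):
    """Count occurrences of motif in a plain string."""
    if not overlapping:
        # str.count only counts non-overlapping matches.
        return sequence.count(motif)
    count, start = 0, sequence.find(motif)
    while start != -1:
        count += 1
        start = sequence.find(motif, start + 1)
    return count


def total_motif_count(handle, motif, overlapping=False):
    """Sum motif counts over a FASTA stream, holding one record at a time."""
    return sum(count_motif(seq, motif, overlapping)
               for _, seq in iter_fasta(handle))
```

With Biopython itself, the analogous pattern would presumably be to loop over SeqIO.parse(...) directly, since it returns an iterator, rather than wrapping it in list(...). Whether that addresses the memory problem described above is an assumption; the poster later reports that the cause was unrelated to strings.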
From biopython at maubp.freeserve.co.uk Wed Oct 17 22:03:51 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 17 Oct 2007 23:03:51 +0100 Subject: [BioPython] Question about Seq.count() In-Reply-To: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com> References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com> Message-ID: <471686C7.6050305@maubp.freeserve.co.uk> Jimmy Musselwhite wrote: > Now the code I want to do is > record.seq.count(search) > > but what I am forced to do is > record.seq.tostring().count(search) > > The problem here is that when I am forced to use .tostring() on every single > seq object it devastates my memory usage in a BIG way. It eats up about > 1.2gigs and then crashes. If I remove the .tostring() and just tell if to > search for 'A', it will run fine and use memory at about 1/100th the rate In the short term, try record.seq.data.count(search) which is what the tostring() method is doing anyway (the Seq object stores the sequence internally as a string). Does that help? We might be tweaking the Seq object after the next release to act a bit more like a string - at which point the .data property might go away. > So my question sums down to, is there any way to make .count() be able to > search for strings and not just characters? You I'd never noticed that - I would call it a bug... 
>>> from Bio.Seq import Seq >>> my_seq = Seq("AAACACACGGTTTT") >>> my_seq.data.count("GG") 1 >>> my_seq.data.count("G") 2 >>> my_seq.tostring().count("G") 2 >>> my_seq.tostring().count("GG") 1 >>> my_seq.count("G") 2 >>> my_seq.count("GG") 0 Peter From jimmy.musselwhite at gmail.com Wed Oct 17 22:48:09 2007 From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite) Date: Wed, 17 Oct 2007 18:48:09 -0400 Subject: [BioPython] Question about Seq.count() In-Reply-To: <471686C7.6050305@maubp.freeserve.co.uk> References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com> <471686C7.6050305@maubp.freeserve.co.uk> Message-ID: <86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com> Thanks guys! That worked great. On 10/17/07, Peter wrote: > > Jimmy Musselwhite wrote: > > Now the code I want to do is > > record.seq.count(search) > > > > but what I am forced to do is > > record.seq.tostring().count(search) > > > > The problem here is that when I am forced to use .tostring() on every > single > > seq object it devastates my memory usage in a BIG way. It eats up about > > 1.2gigs and then crashes. If I remove the .tostring() and just tell if > to > > search for 'A', it will run fine and use memory at about 1/100th the > rate > > In the short term, try record.seq.data.count(search) which is what the > tostring() method is doing anyway (the Seq object stores the sequence > internally as a string). Does that help? > > We might be tweaking the Seq object after the next release to act a bit > more like a string - at which point the .data property might go away. > > > So my question sums down to, is there any way to make .count() be able > to > > search for strings and not just characters? > > You I'd never noticed that - I would call it a bug... 
> > >>> from Bio.Seq import Seq > >>> my_seq = Seq("AAACACACGGTTTT") > >>> my_seq.data.count("GG") > 1 > >>> my_seq.data.count("G") > 2 > >>> my_seq.tostring().count("G") > 2 > >>> my_seq.tostring().count("GG") > 1 > >>> my_seq.count("G") > 2 > >>> my_seq.count("GG") > 0 > > Peter > > From jimmy.musselwhite at gmail.com Wed Oct 17 22:52:07 2007 From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite) Date: Wed, 17 Oct 2007 18:52:07 -0400 Subject: [BioPython] Question about Seq.count() In-Reply-To: <86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com> References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com> <471686C7.6050305@maubp.freeserve.co.uk> <86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com> Message-ID: <86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com> Just kidding, it didn't work great. It only "fixed" it because I was printing out the output of count() and so it was just executing 100 times slower and thus eating RAM 100 times slower :( It doesn't seem like there is a good way for me to fix this. On 10/17/07, Jimmy Musselwhite wrote: > > Thanks guys! That worked great. > > On 10/17/07, Peter wrote: > > > > Jimmy Musselwhite wrote: > > > Now the code I want to do is > > > record.seq.count(search) > > > > > > but what I am forced to do is > > > record.seq.tostring().count(search) > > > > > > The problem here is that when I am forced to use .tostring() on every > > single > > > seq object it devastates my memory usage in a BIG way. It eats up > > about > > > 1.2gigs and then crashes. If I remove the .tostring() and just tell if > > to > > > search for 'A', it will run fine and use memory at about 1/100th the > > rate > > > > In the short term, try record.seq.data.count (search) which is what the > > tostring() method is doing anyway (the Seq object stores the sequence > > internally as a string). Does that help? 
> > > > We might be tweaking the Seq object after the next release to act a bit > > more like a string - at which point the .data property might go away. > > > > > So my question sums down to, is there any way to make .count() be able > > to > > > search for strings and not just characters? > > > > You I'd never noticed that - I would call it a bug... > > > > >>> from Bio.Seq import Seq > > >>> my_seq = Seq("AAACACACGGTTTT") > > >>> my_seq.data.count("GG") > > 1 > > >>> my_seq.data.count("G") > > 2 > > >>> my_seq.tostring().count("G") > > 2 > > >>> my_seq.tostring().count("GG") > > 1 > > >>> my_seq.count("G") > > 2 > > >>> my_seq.count("GG") > > 0 > > > > Peter > > > > > From jimmy.musselwhite at gmail.com Wed Oct 17 23:04:26 2007 From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite) Date: Wed, 17 Oct 2007 19:04:26 -0400 Subject: [BioPython] Question about Seq.count() In-Reply-To: <86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com> References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com> <471686C7.6050305@maubp.freeserve.co.uk> <86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com> <86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com> Message-ID: <86e5e8970710171604p612f5583v6ef32f90eca86861@mail.gmail.com> In response to the first reply you gave me, where you said this You I'd never noticed that - I would call it a bug... >>> from Bio.Seq import Seq >>> my_seq = Seq("AAACACACGGTTTT") >>> my_seq.data.count("GG") 1 >>> my_seq.data.count("G") 2 >>> my_seq.tostring().count("G") 2 >>> my_seq.tostring().count("GG") 1 >>> my_seq.count("G") 2 >>> my_seq.count("GG") 0 I've tried that many many times and I always get 0 when I do my_seq.count("GG") I just rebuilt biopython from the latest CVS tarball and it still does not work. I have no idea why yours works and mine doesn't. On 10/17/07, Jimmy Musselwhite wrote: > > Just kidding, it didn't work great. 
From jimmy.musselwhite at gmail.com Wed Oct 17 23:06:03 2007 From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite) Date: Wed, 17 Oct 2007 19:06:03 -0400 Subject: [BioPython] Question about Seq.count() In-Reply-To: <86e5e8970710171604p612f5583v6ef32f90eca86861@mail.gmail.com> References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com> <471686C7.6050305@maubp.freeserve.co.uk> <86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com> <86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com> <86e5e8970710171604p612f5583v6ef32f90eca86861@mail.gmail.com> Message-ID: <86e5e8970710171606x4ac9b3feg23f2409a4385d237@mail.gmail.com> Man, I'm sorry, I didn't read that well enough. It doesn't work for you either. I'm gonna stop responding to this e-mail now :) I'm clearly tired or something. On 10/17/07, Jimmy Musselwhite wrote: > > In response to the first reply you gave me, where you said this > > You I'd never noticed that - I would call it a bug... > > >>> from Bio.Seq import Seq > >>> my_seq = Seq("AAACACACGGTTTT") > >>> my_seq.data.count("GG") > 1 > >>> my_seq.data.count("G") > 2 > >>> my_seq.tostring().count("G") > 2 > >>> my_seq.tostring().count("GG") > 1 > >>> my_seq.count("G") > 2 > >>> my_seq.count("GG") > 0 > > > I've tried that many many times and I always get 0 when I do > my_seq.count("GG") > I just rebuilt biopython from the latest CVS tarball and it still does not > work. I have no idea why yours works and mine doesn't. > > On 10/17/07, Jimmy Musselwhite wrote: > > > > Just kidding, it didn't work great.
From jimmy.musselwhite at gmail.com Thu Oct 18 12:48:41 2007 From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite) Date: Thu, 18 Oct 2007 08:48:41 -0400 Subject: [BioPython] Question about Seq.count() In-Reply-To: <471733DE.6050803@maubp.freeserve.co.uk> References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com> <471686C7.6050305@maubp.freeserve.co.uk> <86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com> <86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com> <471733DE.6050803@maubp.freeserve.co.uk> Message-ID: <86e5e8970710180548u48e5780crc8d5178401d116d5@mail.gmail.com> Peter, Well, after a day of not thinking very hard I found my problem, and it didn't have anything to do with strings at all. That was just my best guess at the time of writing this e-mail. Sorry about that =( On 10/18/07, Peter wrote: > > Jimmy Musselwhite wrote: > > Just kidding, it didn't work great. It only "fixed" it because I was > > printing out the output of count() and so it was just executing 100 > times > > slower and thus eating RAM 100 times slower :( > > > > It doesn't seem like there is a good way for me to fix this.
> > Both of these are using the python string method to count "GG", the only > difference is the tostring() method has the additional small overhead of > an extra function call: > > my_seq.data.count("GG") > my_seq.tostring().count("GG") > > However, comparing these: > > my_seq.data.count("G") # using python's string count method > my_seq.tostring().count("G") # using python's string count method > my_seq.count("G") # using an iterator internally > > It could be that the Seq record's current single letter search is simply > very memory efficient compared to the python string's more flexible > multi-letter search. > > How are you measuring the RAM? I'd like to see memory usage figures for > the five simple examples above on a large sequence - plus doing this > directly on the equivalent string. > > Are you using Linux or Windows or Mac OS, and what version of python? I > know there have been some string optimisations in Python 2.5 (although I > don't know if any are relevant to the count method). > > Peter > > From ytu888 at hotmail.com Thu Oct 18 17:35:15 2007 From: ytu888 at hotmail.com (Y Tu) Date: Thu, 18 Oct 2007 12:35:15 -0500 Subject: [BioPython] Error for running the test code in BioSQL with Biopython manual In-Reply-To: <908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net> References: <46FCF325.4040002@maubp.freeserve.co.uk> <46FD2BAC.80401@maubp.freeserve.co.uk> <46FD5927.3000207@maubp.freeserve.co.uk> <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net> <38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net> <14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net> <908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net> Message-ID: I am still waiting for help to fix the problem on the Mac (attached at the bottom). However, to keep the project going I found an old PC and installed Python, MySQL, BioSQL and Biopython on it.
However, when I tested the codes coming with Basic BioSQL with Biopython, I got the following error: =======================================my PC problem=============================== >>> from BioSQL import BioSeqDatabase >>> server=BioSeqDatabase.open_database(driver="MySQLdb", user="root", ... passwd="MySQLdb", host="localhost", db="bioseqdb") >>> db=server.new_database("Viral") >>> from Bio import GenBank >>> parser=GenBank.FeatureParser() >>> iterator = GenBank.Iterator(open("gbvrl.gb"), parser) >>> db.load(iterator) Traceback (most recent call last): File "", line 1, in File "C:\Python25\lib\site-packages\BioSQL\BioSeqDatabase.py", line 414, in lo ad db_loader.load_seqrecord(cur_record) File "C:\Python25\lib\site-packages\BioSQL\Loader.py", line 37, in load_seqrec ord bioentry_id = self._load_bioentry_table(record) File "C:\Python25\lib\site-packages\BioSQL\Loader.py", line 260, in _load_bioe ntry_table bioentry_id = self.adaptor.last_id('bioentry') File "C:\Python25\lib\site-packages\BioSQL\BioSeqDatabase.py", line 148, in la st_id return self.dbutils.last_id(self.cursor, table) File "C:\Python25\Lib\site-packages\BioSQL\DBUtils.py", line 34, in last_id return cursor.insert_id() AttributeError: 'Cursor' object has no attribute 'insert_id' +++++++++++++++++++++++++++++++++++++++++++++++++ Please help me to fix the problem, thanks. ========================================my old Mac problem======================== Date: Tue, 16 Oct 2007 12:06:36 -0500 From: Y Tu Subject: Re: [BioPython] Error for installation of MySALdb on Mac OS X To: Steve Lianoglou Cc: biopython at lists.open-bio.org Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hi, I reinstalled everything and checked every step. I found that there are had some warnings in 'build" step (underlined) . I wonder if they are the reason why I got the error messages when running "import MySQLdb" under the python prompt and how to fix the problem. Thank you very much. 
LeesComputer:/Applications/Python_Bio/MySQL-python-1.2.2 Lee$ python setup.py build running build running build_py ... ... /usr/bin/ld: for architecture ppc /usr/bin/ld: warning build/temp.macosx-10.3-fat-2.5/_mysql.o cputype (7, architecture i386) does not match cputype (18) for specified -arch flag: ppc (file not loaded) /usr/bin/ld: warning /usr/local/mysql/lib/libmysqlclient_r.dylib cputype (7, architecture i386) does not match cputype (18) for specified -arch flag: ppc (file not loaded) LeesComputer:/Applications/Python_Bio/MySQL-python-1.2.2 Lee$ sudo python setup.py install Password: running install ... ... Adding MySQL-python 1.2.2 to easy-install.pth file Installed /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg Processing dependencies for MySQL-python==1.2.2 LeesComputer:/Applications/Python_Bio/MySQL-python-1.2.2 Lee$ python Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04) [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin Type "help", "copyright", "credits" or "license" for more information. 
>>> import MySQLdb /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.py:3: UserWarning: Module _mysql was already imported from /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.pyc, but /Applications/Python_Bio/MySQL-python-1.2.2 is being added to sys.path import sys, pkg_resources, imp Traceback (most recent call last): File "", line 1, in File "MySQLdb/__init__.py", line 19, in import _mysql File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 7, in File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 6, in __bootstrap__ ImportError: dlopen(/Users/Lee/.python-eggs/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg-tmp/_mysql.so, 2): Library not loaded: /usr/local/mysql/lib/mysql/libmysqlclient_r.15.dylib Referenced from: /Users/Lee/.python-eggs/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg-tmp/_mysql.so Reason: image not found
From biopython at maubp.freeserve.co.uk Thu Oct 18 10:22:22 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 18 Oct 2007 11:22:22 +0100 Subject: [BioPython] Question about Seq.count() In-Reply-To: <86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com> References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com> <471686C7.6050305@maubp.freeserve.co.uk> <86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com> <86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com> Message-ID: <471733DE.6050803@maubp.freeserve.co.uk> Jimmy Musselwhite wrote: > Just kidding, it didn't work great.
It only "fixed" it because I was > printing out the output of count() and so it was just executing 100 times > slower and thus eating RAM 100 times slower :( > > It doesn't seem like there is a good way for me to fix this.

Both of these are using the python string method to count "GG"; the only difference is that the tostring() method has the additional small overhead of an extra function call:

my_seq.data.count("GG")
my_seq.tostring().count("GG")

However, comparing these:

my_seq.data.count("G") # using python's string count method
my_seq.tostring().count("G") # using python's string count method
my_seq.count("G") # using an iterator internally

It could be that the Seq record's current single letter search is simply very memory efficient compared to the python string's more flexible multi-letter search.

How are you measuring the RAM? I'd like to see memory usage figures for the five simple examples above on a large sequence - plus doing this directly on the equivalent string.

Are you using Linux or Windows or Mac OS, and what version of python? I know there have been some string optimisations in Python 2.5 (although I don't know if any are relevant to the count method).

Peter

From dalloliogm at gmail.com Fri Oct 19 13:38:50 2007 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Fri, 19 Oct 2007 15:38:50 +0200 Subject: [BioPython] Question about Seq.count() In-Reply-To: <471686C7.6050305@maubp.freeserve.co.uk> References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com> <471686C7.6050305@maubp.freeserve.co.uk> Message-ID: <5aa3b3570710190638h23665c4cpb8d53a8cb64c7322@mail.gmail.com>

2007/10/18, Peter : > >>> from Bio.Seq import Seq > >>> my_seq = Seq("AAACACACGGTTTT") > >>> my_seq.count("G") > 2 > >>> my_seq.count("GG") > 0

I've found the bug!
The code for Bio.Seq.count is: def count(self, item): return len([x for x in self.data if x == item]) it does not work for patterns of two nucleotides, because '[x for x in self.data]' reiterates on a list of strings of one letter each: >>> s = Seq( 'ACTTgGCATYCGgtGACGACTGGGcATCGGTCAGTCGGTTT') >>> [x for x in s.data] ['A', 'C', 'T', 'T', 'g', 'G', 'C', 'A', 'T', 'Y', 'C', 'G', 'g', 't', 'G', 'A', 'C', 'G', 'A', 'C', 'T', 'G', 'G', 'G', 'c', 'A', 'T', 'C', 'G', 'G', 'T', 'C', 'A', 'G', 'T', 'C', 'G', 'G', 'T', 'T', 'T'] >>> for x in s.data: >>> print x, 'GG', x == 'GG' (always false) Something like [len('GG' in s.data)] also won't work, because "'GG' in s.data" returns a Boolean value: >>> 'GG' in s.data True What about using regular expressions instead? >>> import re >>> r = re.compile('GG') >>> count = len(r.findall(my_seq.data)) They don't seem to be too different as for the execution time: # for i in $( seq 10); do time python -m re -c '"cdasd".count("cc")'; done 2>&1| grep real real 0m0.091s real 0m0.106s real 0m0.081s real 0m0.110s real 0m0.076s real 0m0.109s real 0m0.109s real 0m0.062s real 0m0.110s real 0m0.062s # for i in $(seq 10); do time python -m re -c 'len(re.findall("cc", "cdasd"))'; done 2>&1|grep real real 0m0.065s real 0m0.108s real 0m0.079s real 0m0.082s real 0m0.111s real 0m0.113s real 0m0.110s real 0m0.112s real 0m0.112s real 0m0.111s Compiling a short pattern with the re module shouldn't take too much time and maybe in future implementations, it will allows us to do more interesting things: for example, we will be able to add an 'ignorecase' parameter to Seq.count: >>> Bio.Seq('ACAGtcAGgCATGCGG').count('GG', 'ignorecase') 2 >>> Bio.Seq('ACAGtcAGgCATGCGG').count('GG') 1 What do you think? 
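To make the comparison above concrete, here is a minimal sketch on plain Python strings (no Biopython needed, and written for current Python 3 rather than the Python 2.5 used in this thread). The three helper names are invented for illustration: the first only mimics the list-comprehension count quoted above, and the optional ignorecase argument is the hypothetical extension suggested here, not an existing Seq.count parameter.

```python
import re


def count_by_letter(data, item):
    # Mimics the quoted Seq.count: each x is a single letter,
    # so a two-letter item such as "GG" can never compare equal.
    return len([x for x in data if x == item])


def count_with_str(data, item):
    # Delegates to the built-in string method (non-overlapping matches).
    return data.count(item)


def count_with_re(data, item, ignorecase=False):
    # Regular-expression variant; re.escape keeps the item literal.
    flags = re.IGNORECASE if ignorecase else 0
    return len(re.findall(re.escape(item), data, flags))


seq = "AAACACACGGTTTT"
print(count_by_letter(seq, "G"))   # 2
print(count_by_letter(seq, "GG"))  # 0
print(count_with_str(seq, "GG"))   # 1
print(count_with_re("ACAGtcAGgCATGCGG", "GG", ignorecase=True))  # 2
```

Both str.count and re.findall count non-overlapping matches, so "GGG" contains one "GG" either way.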
Cheers, Giovanni -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com From biopython at maubp.freeserve.co.uk Fri Oct 19 14:50:56 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 19 Oct 2007 15:50:56 +0100 Subject: [BioPython] Question about Seq.count() In-Reply-To: <5aa3b3570710190638h23665c4cpb8d53a8cb64c7322@mail.gmail.com> References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com> <471686C7.6050305@maubp.freeserve.co.uk> <5aa3b3570710190638h23665c4cpb8d53a8cb64c7322@mail.gmail.com> Message-ID: <320fb6e00710190750n6b1752bcga0846159e32cf02c@mail.gmail.com> > I've found the bug! > > The code for Bio.Seq.count is: > > def count(self, item): > return len([x for x in self.data if x == item]) Yeah - by design this (and the functionally similar version for the MutableSeq) both expect the count argument to be a single letter. The simple fix for the Seq object is to use the string method internally: def count(self, item): return self.data.count(item) For the MutableSeq things are not so straight forward, but supporting multiple character arguments can be done. > What about using regular expressions instead? > ... > What do you think? I think the Seq object's count method should act just like a normal python string's count method. If anyone wants to get fancy with regular expressions, they can do so. Peter From anaryin at gmail.com Mon Oct 22 12:21:49 2007 From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=) Date: Mon, 22 Oct 2007 13:21:49 +0100 Subject: [BioPython] Scripts cannot connect Message-ID: Hello all! I solved my problem a few weeks ago on Windows but now that I've changed to Linux, it is back again. 
I have this script:

#!/usr/bin/env python
from SOAPpy import WSDL
wsdl = 'http://soap.genome.jp/KEGG.wsdl'
serv = WSDL.Proxy(wsdl)
genes = ["eco:b1002", "eco:b2388"]
results = serv.mark_pathway_by_objects("path:eco00010", genes)
print results

Everytime I try to run it, it gets me a timeout. I solved the problem in Windows by setting up env_variables. Here, the bash can access the web (it has its env_var http_proxy set) but my scripts can't.. any help? Thanks in advance! João Rodrigues

From biopython at maubp.freeserve.co.uk Mon Oct 22 12:48:52 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 22 Oct 2007 13:48:52 +0100 Subject: [BioPython] Scripts cannot connect In-Reply-To: References: Message-ID: <471C9C34.7000006@maubp.freeserve.co.uk>

João Rodrigues wrote:
> Everytime I try to run it, it gets me a timeout. I solved the problem in
> Windows by setting up env_variables. Here, the bash can access the web (it
> has its env_var http_proxy set) but my scripts can't.. any help?

What does this do if you add it to your script?

import os
print os.environ.keys()
try :
    print os.environ["http_proxy"]
except KeyError :
    print "http_proxy environment variable not setup"

How have you setup the environment variables in Linux? Via your .bashrc file?

Peter

From anaryin at gmail.com Mon Oct 22 13:11:46 2007 From: anaryin at gmail.com (João Rodrigues) Date: Mon, 22 Oct 2007 14:11:46 +0100 Subject: [BioPython] Scripts cannot connect In-Reply-To: <471C9C34.7000006@maubp.freeserve.co.uk> References: <471C9C34.7000006@maubp.freeserve.co.uk> Message-ID: Hello again! It says that the proxy isn't set.. I've added the line to my .bashrc ( I had to create it). Yet, it doesn't work. What am I doing wrong?
(or not doing) From tiagoantao at gmail.com Mon Oct 22 14:01:53 2007 From: tiagoantao at gmail.com (Tiago Antao) Date: Mon, 22 Oct 2007 15:01:53 +0100 Subject: [BioPython] Scripts cannot connect In-Reply-To: References: <471C9C34.7000006@maubp.freeserve.co.uk> Message-ID: <471CAD51.101@gmail.com> Jo?o Rodrigues wrote: > It says that the proxy isn't set.. I've added the line to my .bashrc ( I had > to create it). Yet, it doesn't work. > > What am I doing wrong? (or not doing) Are you doing an export of the variable? Try doing env at the prompt and check if http_proxy is defined (you will get a big list of environment variables, just search or grep for the proxy one). Like: $ env | grep http_proxy On another front, your .bash_profile should exist and be sourcing .bashrc (either that, or you put http_proxy on .bash_profile) Regards, Tiago -- tiagoantao at gmail.com http://tiago.org/ps From anaryin at gmail.com Mon Oct 22 15:38:19 2007 From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=) Date: Mon, 22 Oct 2007 16:38:19 +0100 Subject: [BioPython] Scripts cannot connect In-Reply-To: <320fb6e00710220658x12866cb6w63f7ff96f5bcd2b0@mail.gmail.com> References: <471C9C34.7000006@maubp.freeserve.co.uk> <320fb6e00710220658x12866cb6w63f7ff96f5bcd2b0@mail.gmail.com> Message-ID: Well, the problem is another then.. I've set the environment variables by hand and it worked. It detects the proxy and works through it. However, it still doesn't connect to the web. I'm using the example they gave on the KEGG API reference manual so it *should* work.. I've used a test script to check if other scripts could connect and they do. I've tried with the urllib to retrieve the kegg page and it does. I guess the problem is with the webservice... I'll try to figure it out. Thanks for your help! 
(Again :) ) From bsantos at biocant.pt Tue Oct 23 15:57:58 2007 From: bsantos at biocant.pt (Bruno Santos) Date: Tue, 23 Oct 2007 16:57:58 +0100 Subject: [BioPython] Problems with NCBIXML.py Message-ID: <001101c8158d$7d146600$2300a8c0@bsantos> I am trying to build a simple script that given a multi FASTA sequence file perform a web BLAST and replace the name of the sequence by the hit with the lowest E-Value. But now I?m getting an exception that I don?t now why it?s happening: Traceback (most recent call last): File "C:\Python25\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript exec codeObject in __main__.__dict__ File "C:\Documents and Settings\POSTO_21\Os meus documentos\Meta Gen?mica\BLAST.py", line 16, in for blast_record in blast_records: File "C:\Python25\lib\site-packages\Bio\Blast\NCBIXML.py", line 592, in parse expat_parser.Parse(text, False) ExpatError: mismatched tag: line 2823, column 362 And where is my script: from Bio import SeqIO from Bio.Blast import NCBIWWW import cStringIO from Bio.Blast import NCBIXML #for file in dir file_handle = open(r'C:/FASTASeq/Results/Well9/assembled_file_well9_Dt_DIST.fna') #Open file to an handler records = SeqIO.parse(file_handle, format="fasta") #Store the file in a Seq Object save_file = open(r'C:/FASTASeq/Results/Well9/D1_Blast.xml', "w") for record in records: sequence = record.seq.data #Converts record to Plain Text result_handle = NCBIWWW.qblast("blastn", "nr", sequence) #Performs a Blastn against the database nr blast_results = result_handle.read() #Catch the results save_file.write(blast_results) #Write all the information to an XML file result_handle = open(r'C:/FASTASeq/Results/Well9/D1_Blast.xml') blast_records = NCBIXML.parse(result_handle) for blast_record in blast_records: alignment = blast_record.alignments nIdent = (alignment[0].hsps[0].positives/float(alignment[0].hsps[0].align_length))*10 0.0 if nIdent >= 97: record.name = alignment[0].hit_def for record in records: 
print('>description_%s length_%d\n' % (record.name, len(record.seq))) print('%s\n' % record.seq) save_file.close() file_handle.close() Thank you, Bruno Santos
From bsantos at biocant.pt Tue Oct 23 17:17:24 2007 From: bsantos at biocant.pt (Bruno Santos) Date: Tue, 23 Oct 2007 18:17:24 +0100 Subject: [BioPython] Problems with NCBIXML.py In-Reply-To: <471E1CBC.30601@maubp.freeserve.co.uk> References: <001101c8158d$7d146600$2300a8c0@bsantos> <471E1CBC.30601@maubp.freeserve.co.uk> Message-ID: <001b01c81598$95f7b3b0$2300a8c0@bsantos> I have manually checked the file and I didn't found any problem. Sorry about the three times it was my mistake because I send the message before register and then I thought I had to send it again. This is getting stranger every time I ran the script it gave me a different error.
Now I get this one at the first run: Traceback (most recent call last): File "C:\Python25\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript exec codeObject in __main__.__dict__ File "C:\Documents and Settings\POSTO_21\Os meus documentos\Meta Gen?mica\BLAST.py", line 17, in for blast_record in blast_records: File "C:\Python25\lib\site-packages\Bio\Blast\NCBIXML.py", line 583, in parse expat_parser.Parse("", True) # End of XML record ExpatError: unclosed token: line 2826, column 8 Now if I run the script without first close it I get the following error: Traceback (most recent call last): File "C:\Python25\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript exec codeObject in __main__.__dict__ File "C:\Documents and Settings\POSTO_21\Os meus documentos\Meta Gen?mica\BLAST.py", line 17, in for blast_record in blast_records: File "C:\Python25\lib\site-packages\Bio\Blast\NCBIXML.py", line 583, in parse expat_parser.Parse("", True) # End of XML record ExpatError: no element found: line 2823, column 81 Now if I execute the close operation on both files in the interactive window and run the script again I get: Traceback (most recent call last): File "C:\Python25\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript exec codeObject in __main__.__dict__ File "C:\Documents and Settings\POSTO_21\Os meus documentos\Meta Gen?mica\BLAST.py", line 17, in for blast_record in blast_records: File "C:\Python25\lib\site-packages\Bio\Blast\NCBIXML.py", line 583, in parse expat_parser.Parse("", True) # End of XML record ExpatError: no element found: line 2827, column 0 I have upload my script, the FASTA file I'm using and the XML can anyone give a look? 
XML File: http://www.drivehq.com/folder/p2731454.aspx
Script: http://www.drivehq.com/folder/p2731447.aspx
FASTA File: http://www.drivehq.com/folder/p2731426.aspx

Unidade de Bioinformática 3060-197 Cantanhede Tel: 231 410 892 http://bioinformatics.biocant.pt

-----Original Message-----
From: Peter [mailto:biopython at maubp.freeserve.co.uk]
Sent: Tuesday, 23 October 2007 17:10
To: Bruno Santos
Cc: biopython at biopython.org
Subject: Re: [BioPython] Problems with NCBIXML.py

Bruno Santos wrote:
> I am trying to build a simple script that given a multi FASTA sequence file
> perform a web BLAST and replace the name of the sequence by the hit with the
> lowest E-Value.
>
> But now I'm getting an exception that I don't know why it's happening:
>
> Traceback (most recent call last):
> ...
>
> for blast_record in blast_records:
>
> File "C:\Python25\lib\site-packages\Bio\Blast\NCBIXML.py", line 592, in
> parse
>
> expat_parser.Parse(text, False)
>
> ExpatError: mismatched tag: line 2823, column 362

That sounds like an error in the XML file - have a look at this particular XML file by hand in a text editor; maybe it's only a partial download, or an HTML error page or something.

Peter

From biopython at maubp.freeserve.co.uk Tue Oct 23 18:14:43 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 23 Oct 2007 19:14:43 +0100 Subject: [BioPython] Problems with NCBIXML.py In-Reply-To: <001b01c81598$95f7b3b0$2300a8c0@bsantos> References: <001101c8158d$7d146600$2300a8c0@bsantos> <471E1CBC.30601@maubp.freeserve.co.uk> <001b01c81598$95f7b3b0$2300a8c0@bsantos> Message-ID: <471E3A13.5080505@maubp.freeserve.co.uk>

Bruno Santos wrote:
> I have manually checked the file and I didn't found any problem.
> Sorry about the three times it was my mistake because I send the message
> before register and then I thought I had to send it again.
> This is getting stranger every time I ran the script it gave me a different
> error. Now I get this one at the first run:
>
> ...
> > Now if I run the script without first close it I get the following error: > Traceback (most recent call last): > Without seeing the XML file I'm having to guess - but this could be something to do with trying to read files from disk before the OS has finished flushing the data out. Mismatched tags could certainly be explained if the parser was only getting part of the data. You could try inserting a sleep of a few seconds after writing and closing the XML file. Also try handle.flush() before the handle.close() when you save the XML file to disk. > I have upload my script, the FASTA file I'm using and the XML can anyone > give a look? > > XML File: http://www.drivehq.com/folder/p2731454.aspx > Script: http://www.drivehq.com/folder/p2731447.aspx > FASTA File: http://www.drivehq.com/folder/p2731426.aspx That didn't work - the easy solution is to file a bug, and then attach the three files: http://bugzilla.open-bio.org/enter_bug.cgi?product=Biopython Peter From dag23 at duke.edu Tue Oct 23 21:06:53 2007 From: dag23 at duke.edu (David Garfield) Date: Tue, 23 Oct 2007 17:06:53 -0400 Subject: [BioPython] Syntax error while parsing Blast output Message-ID: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu> Hey list, I'm having an issue with the BlastParser and Iterator from NCBIStandalone. I assume its because NCBI has gone and changed the output file (again)...or I'm an idiot....but maybe there's a real problem here. 
I'm trying to parse a blast result using the following code: def filter_blast_results(blast_results, blast_cut_off): b_parser = NCBIStandalone.BlastParser() b_iterator = NCBIStandalone.Iterator(blast_results, b_parser) hit_results = {} while 1: b_record = b_iterator.next() if b_record is None: break header = b_record.Header.query temp = [] for alignment in b_record.alignments: for hsp in alignment.hsps: if hsp.expect < blast_cut_off: temp.append(alignment.title) #we now remove duplicates from the temp list and add that the the hit_results hit_results[header] = remove_duplicates(temp) return hit_results And I get the error I've included at the bottom of this message, something about "SyntaxError: Line does not start with 'Reference':" I know that blast is working because I can print out what appears to my untrained eye to be a perfectly good XML of the results I see when I run blast manually. Any help would be very much appreciated, David Traceback (most recent call last): File "test_scripts.py", line 7, in single_blast_sequence.run_2way_blast('single_test_in.fasta','/ Users/dagarfield/urchins/blastdbs/urchin_2.0','/Users/dagarfield/ urchins/blastdbs/urchin_2.0','NA',.001,'/Users/dagarfield/urchins/ urchin_bin/blastall') File "/private/var/automount/Network/Share2/genomeScans/urchins/ alignment_methods/blast/single_blast_sequence.py", line 57, in run_2way_blast input_to_other_blast_matches = filter_blast_results (blast_results, blast_cut_off) File "/private/var/automount/Network/Share2/genomeScans/urchins/ alignment_methods/blast/single_blast_sequence.py", line 39, in filter_blast_results b_record = b_iterator.next() File "/Library/Frameworks/Python.framework/Versions/2.5/lib/ python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line 1403, in next return self._parser.parse(File.StringHandle(data)) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/ python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line 616, in parse self._scanner.feed(handle, 
self._consumer) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/ python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line 96, in feed self._scan_header(uhandle, consumer) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/ python2.5/site-packages/Bio/Blast/NCBIStandalone.py", line 125, in _scan_header read_and_call(uhandle, consumer.reference, start='Reference') File "/Library/Frameworks/Python.framework/Versions/2.5/lib/ python2.5/site-packages/Bio/ParserSupport.py", line 300, in read_and_call raise SyntaxError, errmsg SyntaxError: Line does not start with 'Reference': /Users/dagarfield/urchins/blastdbs/urchin_2.0 From biopython at maubp.freeserve.co.uk Tue Oct 23 21:45:38 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 23 Oct 2007 22:45:38 +0100 Subject: [BioPython] Syntax error while parsing Blast output In-Reply-To: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu> References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu> Message-ID: <471E6B82.5010700@maubp.freeserve.co.uk> David Garfield wrote: > Hey list, > > I'm having an issue with the BlastParser and Iterator from > NCBIStandalone. I assume its because NCBI has gone and changed the > output file (again)...or I'm an idiot....but maybe there's a real > problem here. The code you gave uses the NCBIStandalone parser/iterator, which expects plain text output - yet you say later the raw file looks like a perfectly good XML file. If you have an XML file (which we recommend over the plain text) then you should use the NCBIXML module instead. 
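One quick way to tell which kind of file you have is to peek at its first non-blank characters. This is only a sketch for current Python 3: looks_like_blast_xml is a made-up helper, not part of Biopython, and it assumes that XML BLAST output begins with an XML declaration or the BlastOutput root tag. The two sample inputs are toy strings, not real BLAST results.

```python
import io


def looks_like_blast_xml(handle, peek=200):
    """Return True if the text at the start of handle looks like XML.

    Note: this reads from the handle, so reopen it or seek(0)
    before passing it on to a parser.
    """
    start = handle.read(peek).lstrip()
    return start.startswith("<?xml") or start.startswith("<BlastOutput")


# Toy inputs standing in for the two output styles.
xml_style = io.StringIO('<?xml version="1.0"?>\n<BlastOutput>\n</BlastOutput>\n')
text_style = io.StringIO("BLASTN 2.2.17\n\n(plain text report follows)\n")

print(looks_like_blast_xml(xml_style))   # True
print(looks_like_blast_xml(text_style))  # False
```

If the check returns True, the NCBIXML parser is the one to try; otherwise the file is probably one of the plain text views.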
Also, a style point - I personally much prefer this: b_iterator = NCBIStandalone.Iterator(blast_results, b_parser) for b_record in b_iterator : #etc over this: b_iterator = NCBIStandalone.Iterator(blast_results, b_parser) while 1: b_record = b_iterator.next() if b_record is None: break #etc Peter From dag23 at duke.edu Tue Oct 23 21:59:33 2007 From: dag23 at duke.edu (David Garfield) Date: Tue, 23 Oct 2007 17:59:33 -0400 Subject: [BioPython] Syntax error while parsing Blast output In-Reply-To: <471E6B82.5010700@maubp.freeserve.co.uk> References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu> <471E6B82.5010700@maubp.freeserve.co.uk> Message-ID: Thanks, Peter. You've found the problem exactly. Interestingly, the code I presented was taken directly from the BioPython cookbook (including the "while 1" bit). Somewhere in the subsequent versions since that document was released, the output of NCBIStandalone has changed from text to XML and the NCBIStandalone Iterators and Parser either no longer seem to work with the output of NCBIStandalone.blastall or there is an option not mentioned in the Cookbook to ensure that the output is in text rather than XML. In any event, the problem is now fixed. Thanks! --DG On Oct 23, 2007, at 5:45 PM, Peter wrote: > David Garfield wrote: >> Hey list, >> I'm having an issue with the BlastParser and Iterator from >> NCBIStandalone. I assume its because NCBI has gone and changed >> the output file (again)...or I'm an idiot....but maybe there's a >> real problem here. > > The code you gave uses the NCBIStandalone parser/iterator, which > expects plain text output - yet you say later the raw file looks > like a perfectly good XML file. If you have an XML file (which we > recommend over the plain text) then you should use the NCBIXML > module instead. 
> > Also, a style point - I personally much prefer this: > > b_iterator = NCBIStandalone.Iterator(blast_results, b_parser) > for b_record in b_iterator : > #etc > > over this: > > b_iterator = NCBIStandalone.Iterator(blast_results, b_parser) > while 1: > b_record = b_iterator.next() > if b_record is None: break > #etc > > Peter > From biopython at maubp.freeserve.co.uk Tue Oct 23 22:48:28 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 23 Oct 2007 23:48:28 +0100 Subject: [BioPython] Syntax error while parsing Blast output In-Reply-To: References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu> <471E6B82.5010700@maubp.freeserve.co.uk> Message-ID: <471E7A3C.5010301@maubp.freeserve.co.uk> David Garfield wrote: > Thanks, Peter. You've found the problem exactly. > > Interestingly, the code I presented was taken directly from the > BioPython cookbook (including the "while 1" bit). So it is. Michiel - do you fancy tweaking that section of the tutorial? > Somewhere in the subsequent versions since that document was released, > the output of NCBIStandalone has changed from text to XML and the > NCBIStandalone Iterators and Parser either no longer seem to work with > the output of NCBIStandalone.blastall or there is an option not > mentioned in the Cookbook to ensure that the output is in text rather > than XML. Biopython 1.43 switched the default from text to XML, because we really wanted to encourage people to use the XML output by default as maintaining the text format parser is such an ongoing maintainance effort. The release notes did mention this, but it was bound to catch someone out. There is an option to override this... from Bio.Blast import NCBIStandalone help(NCBIStandalone.blastall) You need the align_view option (what the NCBI refers to as the alignment view), corresponding to the -m command line option of the NCBI blastall tool. Biopython currently defaults to seven to get XML output. 
alignment view options:
0 = pairwise,
1 = query-anchored showing identities,
2 = query-anchored no identities,
3 = flat query-anchored, show identities,
4 = flat query-anchored, no identities,
5 = query-anchored no identities and blunt ends,
6 = flat query-anchored, no identities and blunt ends,
7 = XML Blast output,
8 = tabular,
9 = tabular with comment lines,
10 = ASN, text,
11 = ASN, binary [Integer]

Peter

From biopython at maubp.freeserve.co.uk Tue Oct 23 16:09:32 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Oct 2007 17:09:32 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <001101c8158d$7d146600$2300a8c0@bsantos>
References: <001101c8158d$7d146600$2300a8c0@bsantos>
Message-ID: <471E1CBC.30601@maubp.freeserve.co.uk>

Bruno Santos wrote:
> I am trying to build a simple script that, given a multi-FASTA sequence file,
> performs a web BLAST and replaces the name of each sequence with the hit with the
> lowest E-value.
>
> But now I'm getting an exception and I don't know why it's happening:
>
> Traceback (most recent call last):
> ...
> for blast_record in blast_records:
> File "C:\Python25\lib\site-packages\Bio\Blast\NCBIXML.py", line 592, in parse
> expat_parser.Parse(text, False)
> ExpatError: mismatched tag: line 2823, column 362

That sounds like an error in the XML file - have a look at this particular XML file by hand in a text editor; maybe it's only a partial download, or an HTML error page or something.
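
The check suggested above can also be done in code before handing the file to the BLAST parser. This is only a rough sketch: it assumes the saved output has already been read into a string, it is written for current Python rather than the 2.5 used in this thread, and the function name diagnose_blast_xml is made up for this illustration (it is not part of Biopython). It uses only the standard library's expat parser.

```python
import xml.parsers.expat


def diagnose_blast_xml(text):
    """Return a short description of what the saved BLAST output looks like."""
    head = text.lstrip()[:200].lower()
    # An HTML error page from a web server is a common reason for parse failures.
    if head.startswith("<!doctype html") or head.startswith("<html"):
        return "looks like an HTML page, not BLAST XML"
    parser = xml.parsers.expat.ParserCreate()
    try:
        # True marks this as the final chunk, so truncated input is reported too.
        parser.Parse(text, True)
    except xml.parsers.expat.ExpatError as err:
        return "not well-formed XML: %s" % err
    return "well-formed XML"
```

A truncated download and a mismatched tag are both reported as "not well-formed XML" along with the line and column from expat, which should match the position given in the original traceback.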
Peter From mdehoon at c2b2.columbia.edu Wed Oct 24 00:19:47 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Tue, 23 Oct 2007 20:19:47 -0400 Subject: [BioPython] Syntax error while parsing Blast output References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu><471E6B82.5010700@maubp.freeserve.co.uk> <471E7A3C.5010301@maubp.freeserve.co.uk> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu> > > Interestingly, the code I presented was taken directly from the > > BioPython cookbook (including the "while 1" bit). > > So it is. Michiel - do you fancy tweaking that section of the tutorial? That part of the tutorial is in the section "Deprecated BLAST parsers", which will be removed once the plain-text Blast parser is removed from Biopython. The description of NCBIStandalone.blastall says "This command will generate BLAST output in XML format, ..." So this is being described correctly in the documentation. Nevertheless, it may be a good idea to remove the plain text Blast parser completely from Biopython in the upcoming release (which will probably be done this week), to avoid further confusion. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: biopython-bounces at lists.open-bio.org on behalf of Peter Sent: Tue 10/23/2007 6:48 PM To: David Garfield; biopython at lists.open-bio.org Subject: Re: [BioPython] Syntax error while parsing Blast output David Garfield wrote: > Thanks, Peter. You've found the problem exactly. > > Somewhere in the subsequent versions since that document was released, > the output of NCBIStandalone has changed from text to XML and the > NCBIStandalone Iterators and Parser either no longer seem to work with > the output of NCBIStandalone.blastall or there is an option not > mentioned in the Cookbook to ensure that the output is in text rather > than XML. 
Biopython 1.43 switched the default from text to XML, because we really wanted to encourage people to use the XML output by default as maintaining the text format parser is such an ongoing maintainance effort. The release notes did mention this, but it was bound to catch someone out. There is an option to override this... from Bio.Blast import NCBIStandalone help(NCBIStandalone.blastall) You need the align_view option (what the NCBI refers to as the alignment view), corresponding to the -m command line option of the NCBI blastall tool. Biopython currently defaults to seven to get XML output. alignment view options: 0 = pairwise, 1 = query-anchored showing identities, 2 = query-anchored no identities, 3 = flat query-anchored, show identities, 4 = flat query-anchored, no identities, 5 = query-anchored no identities and blunt ends, 6 = flat query-anchored, no identities and blunt ends, 7 = XML Blast output, 8 = tabular, 9 tabular with comment lines 10 ASN, text 11 ASN, binary [Integer] Peter _______________________________________________ BioPython mailing list - BioPython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From biopython at maubp.freeserve.co.uk Wed Oct 24 08:22:45 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Oct 2007 09:22:45 +0100 Subject: [BioPython] Syntax error while parsing Blast output In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu> References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu> <471E6B82.5010700@maubp.freeserve.co.uk> <471E7A3C.5010301@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu> Message-ID: <320fb6e00710240122q53d099ax6b295f0f7d6f9174@mail.gmail.com> [Sorry you got this twice Michiel, I forgot to set the from/to fields] > That part of the tutorial is in the section "Deprecated BLAST parsers", which > will be removed once the plain-text Blast parser is removed from Biopython. > ... 
> Nevertheless, it may be a good idea to remove the plain text Blast parser > completely from Biopython in the upcoming release (which will probably be > done this week), to avoid further confusion. Removing it sounds too drastic - especially as we have had people on the mailing list using it deliberately fairly recently. If you really do want to remove this code, then adding a deprecation warning to the plain text parser for the next release would be a more gentle route. I think there is still some benefit in having the plain text parser, and that it could be fixed to cope with current multi-query files without too much pain. Maybe I should try this weekend... Anyone want to voice their opinion? Peter From mmokrejs at ribosome.natur.cuni.cz Wed Oct 24 11:01:26 2007 From: mmokrejs at ribosome.natur.cuni.cz (=?UTF-8?B?TWFydGluIE1PS1JFSsWg?=) Date: Wed, 24 Oct 2007 13:01:26 +0200 Subject: [BioPython] Syntax error while parsing Blast output In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu> References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu><471E6B82.5010700@maubp.freeserve.co.uk> <471E7A3C.5010301@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu> Message-ID: <471F2606.8080500@ribosome.natur.cuni.cz> Hi, Michiel De Hoon wrote: >>> Interestingly, the code I presented was taken directly from the >>> BioPython cookbook (including the "while 1" bit). >> So it is. Michiel - do you fancy tweaking that section of the tutorial? > > That part of the tutorial is in the section "Deprecated BLAST parsers", which > will be removed once the plain-text Blast parser is removed from Biopython. > The description of NCBIStandalone.blastall says > > "This command will generate BLAST output in XML format, ..." > > So this is being described correctly in the documentation. 
> > Nevertheless, it may be a good idea to remove the plain text Blast parser > completely from Biopython in the upcoming release (which will probably be > done this week), to avoid further confusion. although I understand your points, are you sure to REMOVE it? What if people need to parse elsewhere generated, maybe even in the past generated BLAST text outputs? If you wanted to say that you will REMOVE the text-based parser because it won't be maintained anymore and probably be usable for one or two NCBI BLAST version only, then it is probably more understandable. Otherwise I guess more people move to bioperl. ;) BTW, what if some people have older BLAST version generating broken XML file formats? Or have to parse such old files again? Martin From winter at biotec.tu-dresden.de Wed Oct 24 12:22:09 2007 From: winter at biotec.tu-dresden.de (Christof Winter) Date: Wed, 24 Oct 2007 14:22:09 +0200 Subject: [BioPython] Syntax error while parsing Blast output In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu> References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu><471E6B82.5010700@maubp.freeserve.co.uk> <471E7A3C.5010301@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu> Message-ID: <471F38F1.1030600@biotec.tu-dresden.de> Michiel De Hoon wrote: > Nevertheless, it may be a good idea to remove the plain text Blast parser > completely from Biopython in the upcoming release (which will probably be > done this week), to avoid further confusion. I agree with Peter and Martin that removing the plain text parser is maybe too much. Although I further agree that there is benefit in having the plain text parser, I am not sure if Biopython should ensure supporting every small format change that NCBI might come up with in the future. I use XML and tabular output only, BTW. 
Cheers, Christof From cjfields at uiuc.edu Wed Oct 24 13:49:09 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 24 Oct 2007 08:49:09 -0500 Subject: [BioPython] Syntax error while parsing Blast output In-Reply-To: <320fb6e00710240122q53d099ax6b295f0f7d6f9174@mail.gmail.com> References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu> <471E6B82.5010700@maubp.freeserve.co.uk> <471E7A3C.5010301@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu> <320fb6e00710240122q53d099ax6b295f0f7d6f9174@mail.gmail.com> Message-ID: <3462123A-662F-4BBC-ADE4-3F5967760F6E@uiuc.edu> On Oct 24, 2007, at 3:22 AM, Peter wrote: > [Sorry you got this twice Michiel, I forgot to set the from/to fields] > >> That part of the tutorial is in the section "Deprecated BLAST >> parsers", which >> will be removed once the plain-text Blast parser is removed from >> Biopython. >> ... >> Nevertheless, it may be a good idea to remove the plain text Blast >> parser >> completely from Biopython in the upcoming release (which will >> probably be >> done this week), to avoid further confusion. > > Removing it sounds too drastic - especially as we have had people on > the mailing list using it deliberately fairly recently. If you > really do want > to remove this code, then adding a deprecation warning to the plain > text > parser for the next release would be a more gentle route. > > I think there is still some benefit in having the plain text > parser, and that > it could be fixed to cope with current multi-query files without > too much > pain. Maybe I should try this weekend... > > Anyone want to voice their opinion? > > Peter We have a similar issue with the bioperl parsers. We basically promote the BLAST XML parser over the text parser, but we have retained both due to demand. In fact, we have two text parsers, a pull and a push parser (we're gluttons for punishment). 
As for maintenance, we never guarantee how long it will take to fix text parsing if it breaks, as the text format is fairly unstable by NCBI's own admission.

Our deprecation cycle is usually: (1) announce it on list to get feedback, (2) if deprecation is planned, add warnings to the module in the next release, (3) remove completely in a later release. It gives everyone time to change over.

chris

From bsantos at biocant.pt Wed Oct 24 16:23:56 2007
From: bsantos at biocant.pt (Bruno Santos)
Date: Wed, 24 Oct 2007 17:23:56 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <471E3A13.5080505@maubp.freeserve.co.uk>
References: <001101c8158d$7d146600$2300a8c0@bsantos> <471E1CBC.30601@maubp.freeserve.co.uk> <001b01c81598$95f7b3b0$2300a8c0@bsantos> <471E3A13.5080505@maubp.freeserve.co.uk>
Message-ID: <001601c8165a$48248600$2300a8c0@bsantos>

Peter wrote:
>Without seeing the XML file I'm having to guess - but this could be
>something to do with trying to read files from disk before the OS has
>finished flushing the data out. Mismatched tags could certainly be
>explained if the parser was only getting part of the data.
>
>You could try inserting a sleep of a few seconds after writing and
>closing the XML file. Also try handle.flush() before the handle.close()
>when you save the XML file to disk.

You were right, I was reading the data before it had been written to the file. Now it's working perfectly.

But now I have another problem: is it possible, instead of making a single request to NCBI BLAST with one sequence, to make the request for all the sequences in a multi-FASTA file?

I'm trying to use threads to do this, but so far without luck.
Thanks in advance, Bruno Santos From biopython at maubp.freeserve.co.uk Wed Oct 24 17:32:52 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Oct 2007 18:32:52 +0100 Subject: [BioPython] Problems with NCBIXML.py In-Reply-To: <001601c8165a$48248600$2300a8c0@bsantos> References: <001101c8158d$7d146600$2300a8c0@bsantos> <471E1CBC.30601@maubp.freeserve.co.uk> <001b01c81598$95f7b3b0$2300a8c0@bsantos> <471E3A13.5080505@maubp.freeserve.co.uk> <001601c8165a$48248600$2300a8c0@bsantos> Message-ID: <320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com> On 10/24/07, Bruno Santos wrote: > You were right I was getting the data before it has been written to the > file. Now it's working perfect. Great. > But know I have another problem it's possible to instead of making a single > request to NCBI_Blast with one sequence, make the request for all the > sequences in a multiFASTA file? > > I'm trying to use threads to do this but until now without luck. I would suggest you install standalone blast, then give it the multi-record FASTA file as input. You should then get multiple blast records back (in the same order). This works fine with the XML output (but currently does not work for plain text output on recent versions of NCBI Blast). If you really want to make multiple blast submissions in parallel online, first check the NCBI's website for any usage restrictions - they don't want their servers to be abused. Peter From biosql at hotmail.com Wed Oct 24 20:53:19 2007 From: biosql at hotmail.com (Jonathan Boulais) Date: Wed, 24 Oct 2007 16:53:19 -0400 Subject: [BioPython] Loading SwissProt to BioSQL Message-ID: Hello, I'm a biologist and quite newb with Biopython. I'm trying to build locally the Swissprot database with BioSQL and I'm having some problems. I have installed the latest version from the CVS and I'm using python 2.5 on a Mac Os 10.4. First, i get this weird problem. 
Since I need to connect with MySQL I started to write a simple script (Biosql.py) with only this (from BioSQL import BioSeqDatabase). When I run this script in the terminal: python Biosql.py, I get this message **ImportError: cannot import name BioSeqDatabase**. But the weird thing is if I start a python session in the terminal by simply invoking python and then manually import BioSeqDatabase, it's working! Is there any reason for that?

Second, I've then decided to continue with the python session since I'm able to import BioSeqDatabase. The connection to MySQL is working fine, but when I'm trying to import the flat file I'm getting this:

Traceback (most recent call last):
  File "", line 1, in
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 414, in load
    db_loader.load_seqrecord(cur_record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 30, in load_seqrecord
    bioentry_id = self._load_bioentry_table(record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 250, in _load_bioentry_table
    version))
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 277, in execute
    self.cursor.execute(sql, args or ())
  File "/sw/lib/python2.5/site-packages/MySQLdb/cursors.py", line 151, in execute
    query = query % db.literal(args)
TypeError: not all arguments converted during string formatting

Here are the lines I'm using:

from BioSQL import BioSeqDatabase
from Bio.SwissProt import SProt
server = BioSeqDatabase.open_database(driver = "MySQLdb", user = "", passwd = "", host = "localhost", db = "bioseqdb")
s_parser = SProt.SequenceParser()
s_iterator = SProt.Iterator(open("path to/uniprot_sprot.dat", "r"), s_parser)
db = server.new_database("Swiss")
db.load(s_iterator)

Does anybody understand this? Many thanks if someone can help!

Jonathan

From biopython at maubp.freeserve.co.uk Wed Oct 24 21:15:10 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Oct 2007 22:15:10 +0100
Subject: [BioPython] Loading SwissProt to BioSQL
In-Reply-To:
References:
Message-ID: <471FB5DE.6080506@maubp.freeserve.co.uk>

Jonathan Boulais wrote:
> Hello,
>
> I'm a biologist and quite newb with Biopython. I'm trying to build
> locally the Swissprot database with BioSQL and I'm having some
> problems. I have installed the latest version from the CVS and I'm
> using python 2.5 on a Mac Os 10.4.
>
> First, i get this weird problem. Since I need to connect with MySQL I
> started to wrote a simple script (Biosql.py) with only this ( from
> BioSQL import BioSeqDatabase). When I run this script in the
> terminal: python Biosql.py, I get this message **ImportError: cannot
> import name BioSeqDatabase**. But the weird thing is if I start a
> python session in the terminal by simply invoking python and then
> manually import BioSeqDatabase, it's working ! Is there any reason
> for that ?

In both cases are you running python from the command prompt? If so then the same environment variables (e.g. paths) should apply. Odd.

My guess is that you shouldn't call your script "Biosql.py"; call it "Biosql_test.py" or something. Python probably thinks the line "from BioSQL import BioSeqDatabase" means importing from the script itself, because that is also called BioSQL.
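
A small pre-flight check for this kind of name clash might look like the sketch below. It rests on an assumption: that a script whose base name matches a package name, ignoring case, is worth flagging. Whether case actually matters depends on the platform and on Python's import rules, so this is deliberately conservative. The helper name shadows_package is invented for this example.

```python
import os


def shadows_package(script_path, package_name):
    """True if the script's base name equals the package name, ignoring case."""
    base = os.path.splitext(os.path.basename(script_path))[0]
    return base.lower() == package_name.lower()


# The two file names discussed in this thread:
print(shadows_package("Biosql.py", "BioSQL"))       # clash, rename the script
print(shadows_package("Biosql_test.py", "BioSQL"))  # no clash
```

The script's own directory is normally searched before site-packages, which is why a local file with a clashing name can win over the installed package.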
Peter From biopython at maubp.freeserve.co.uk Wed Oct 24 21:22:05 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Oct 2007 22:22:05 +0100 Subject: [BioPython] Loading SwissProt to BioSQL In-Reply-To: References: Message-ID: <471FB77D.5060103@maubp.freeserve.co.uk> Jonathan Boulais wrote: > from Bio.SwissProt import SProt > s_parser = SProt.SequenceParser() > s_iterator = SProt.Iterator(open("path to/uniprot_sprot.dat", "r"), s_parser) This won't help with the database issue, but you should also be able to load the SwissProt text file with Bio.SeqIO: from Bio import SeqIO s_iterator = SeqIO.parse(open("path/to/uniprot_sprot.dat"), "swiss") This in fact will call the Bio.SwissProt.SProt module internally, and get it to return SeqRecord objects. The Bio.SeqIO interface is meant to make it easy to switch the input file format (e.g. GenBank or EMBL). Peter From mdehoon at c2b2.columbia.edu Thu Oct 25 00:40:18 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Wed, 24 Oct 2007 20:40:18 -0400 Subject: [BioPython] Syntax error while parsing Blast output References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu> <471E6B82.5010700@maubp.freeserve.co.uk> <471E7A3C.5010301@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu> <320fb6e00710240122q53d099ax6b295f0f7d6f9174@mail.gmail.com> <3462123A-662F-4BBC-ADE4-3F5967760F6E@uiuc.edu> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B642@mail2.exch.c2b2.columbia.edu> >> Nevertheless, it may be a good idea to remove the plain text Blast >> parser >> completely from Biopython in the upcoming release (which will >> probably be >> done this week), to avoid further confusion. > > Removing it sounds too drastic - especially as we have had people on > the mailing list using it deliberately fairly recently. If you > really do want > to remove this code, then adding a deprecation warning to the plain > text > parser for the next release would be a more gentle route. 
> Sorry, I was confused; I was under the impression that the plain text Blast parser was already deprecated (I was getting confused with the blast and blasturl functions in Bio.Blast.NCBIWWW, which are already deprecated in favor of qblast). OK, then let's keep the plain-text Blast parser as is, and maybe think again about this issue after the upcoming release. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From mmayhew at mcb.mcgill.ca Thu Oct 25 04:12:06 2007 From: mmayhew at mcb.mcgill.ca (Michael Mayhew) Date: Thu, 25 Oct 2007 00:12:06 -0400 Subject: [BioPython] Any planned BioPython presence at PyCon 2008? Message-ID: <47201796.2050902@mcb.mcgill.ca> Was planning on going to PyCon 2008 anyway, but would have even more incentive if there is going to be a big BioPython community turnout. Would love to pitch in on a development session or something like that. Michael Mayhew From biosql at hotmail.com Thu Oct 25 14:52:02 2007 From: biosql at hotmail.com (Jonathan Boulais) Date: Thu, 25 Oct 2007 10:52:02 -0400 Subject: [BioPython] Loading SwissProt to BioSQL In-Reply-To: <471FB5DE.6080506@maubp.freeserve.co.uk> References: <471FB5DE.6080506@maubp.freeserve.co.uk> Message-ID: > Date: Wed, 24 Oct 2007 22:15:10 +0100 > From: biopython at maubp.freeserve.co.uk > To: biosql at hotmail.com; biopython at lists.open-bio.org > Subject: Re: [BioPython] Loading SwissProt to BioSQL > > Jonathan Boulais wrote: > > Hello, > > > > I'm a biologist and quite newb with Biopython. I'm trying to build > > locally the Swissprot database with BioSQL and I'm having some > > problems. I have installed the latest version from the CVS and I'm > > using python 2.5 on a Mac Os 10.4. > > > > First, i get this weird problem. Since I need to connect with MySQL I > > started to wrote a simple script (Biosql.py) with only this ( from > > BioSQL import BioSeqDatabase). 
When I run this script in the
> > terminal: python Biosql.py, I get this message **ImportError: cannot
> > import name BioSeqDatabase**. But the weird thing is if I start a
> > python session in the terminal by simply invoking python and then
> > manually import BioSeqDatabase, it's working ! Is there any reason
> > for that ?
>
> In both cases are you running python from the command prompt? If so
> then the same environment variables (e.g. paths) should apply. Odd.
>
> My guess is you shouldn't call your script "Biosql.py", call it
> "Biosql_test.py" or something. Python thinks the line "from BioSQL
> import BioSeqDatabase" means importing from the script itself because
> that is also called BioSQL.
>
> Peter
>

Peter, you were right about the name of the file. Nice call and thank you!

But I still get the same error as before when I'm running it.

Traceback (most recent call last):
  File "DB.py", line 14, in
    db.load(s_iterator)
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 414, in load
    db_loader.load_seqrecord(cur_record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 30, in load_seqrecord
    bioentry_id = self._load_bioentry_table(record)
  File "/sw/lib/python2.5/site-packages/BioSQL/Loader.py", line 250, in _load_bioentry_table
    version))
  File "/sw/lib/python2.5/site-packages/BioSQL/BioSeqDatabase.py", line 277, in execute
    self.cursor.execute(sql, args or ())
  File "/sw/lib/python2.5/site-packages/MySQLdb/cursors.py", line 151, in execute
    query = query % db.literal(args)
TypeError: not all arguments converted during string formatting

Is it the MySQLdb driver, or a bad argument being passed to MySQLdb?

Again, thank you for your time.

Jonathan

From biopython at maubp.freeserve.co.uk Thu Oct 25 17:22:46 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 25 Oct 2007 18:22:46 +0100
Subject: [BioPython] Loading SwissProt to BioSQL
In-Reply-To:
References: <471FB5DE.6080506@maubp.freeserve.co.uk>
Message-ID: <4720D0E6.8000609@maubp.freeserve.co.uk>

Jonathan Boulais wrote:
>> My guess is you shouldn't call your script "Biosql.py", call it
>> "Biosql_test.py" or something. Python thinks the line "from BioSQL
>> import BioSeqDatabase" means importing from the script itself because
>> that is also called BioSQL.
>
> Peter you were right about the name of the file. Nice call and thank you !

Great - I wasn't sure if the case would matter or not.

> But I still get the same error as before when I'm running it.
> ...

I've not used BioSQL myself (yet), but looking at the code you posted earlier, you set up the connection like this:

from BioSQL import BioSeqDatabase
server = BioSeqDatabase.open_database(driver="MySQLdb", user="", passwd="", host="localhost", db="bioseqdb")

I think the driver="MySQLdb" is fine, but don't you need a database username (and perhaps a password)?

Peter

From biopython at maubp.freeserve.co.uk Thu Oct 25 09:44:43 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 25 Oct 2007 10:44:43 +0100
Subject: [BioPython] Any planned BioPython presence at PyCon 2008?
In-Reply-To: <47201796.2050902@mcb.mcgill.ca>
References: <47201796.2050902@mcb.mcgill.ca>
Message-ID: <4720658B.4020103@maubp.freeserve.co.uk>

Michael Mayhew wrote:
> Was planning on going to PyCon 2008 anyway, but would have even more
> incentive if there is going to be a big BioPython community turnout.
>
> Would love to pitch in on a development session or something like that.
> > Michael Mayhew http://us.pycon.org/2008/about/ http://pycon.blogspot.com/2007/10/call-for-talk-tutorial-proposals.html > Proposals for PyCon 2008 talks & tutorials are now being accepted. > The deadline for proposals is November 16. PyCon 2008 will be held > in Chicago, Illinois, USA, from March 13-20. It is remotely possible that I'll be working the USA next year, but I have to say at this point that it looks unlikely that I'll be able to attend. Peter From biopython at maubp.freeserve.co.uk Thu Oct 25 09:57:10 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 25 Oct 2007 10:57:10 +0100 Subject: [BioPython] Syntax error while parsing Blast output In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B642@mail2.exch.c2b2.columbia.edu> References: <072FE6F3-B60B-466D-93E7-81F37D2C4EC2@duke.edu> <471E6B82.5010700@maubp.freeserve.co.uk> <471E7A3C.5010301@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B63F@mail2.exch.c2b2.columbia.edu> <320fb6e00710240122q53d099ax6b295f0f7d6f9174@mail.gmail.com> <3462123A-662F-4BBC-ADE4-3F5967760F6E@uiuc.edu> <6243BAA9F5E0D24DA41B27997D1FD14402B642@mail2.exch.c2b2.columbia.edu> Message-ID: <47206876.9040905@maubp.freeserve.co.uk> Michiel De Hoon wrote: > > Sorry, I was confused; I was under the impression that the plain text Blast > parser was already deprecated (I was getting confused with the blast and > blasturl functions in Bio.Blast.NCBIWWW, which are already deprecated in > favor of qblast). OK, then let's keep the plain-text Blast parser as is, and > maybe think again about this issue after the upcoming release. > Panic averted - but it was good to hear some passionate defence of the plain text BLAST parser, it looks like it still gets quite a bit of use. 
Peter From bsantos at biocant.pt Fri Oct 26 09:13:58 2007 From: bsantos at biocant.pt (Bruno Santos) Date: Fri, 26 Oct 2007 10:13:58 +0100 Subject: [BioPython] Problems with NCBIXML.py In-Reply-To: <320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com> References: <001101c8158d$7d146600$2300a8c0@bsantos> <471E1CBC.30601@maubp.freeserve.co.uk> <001b01c81598$95f7b3b0$2300a8c0@bsantos> <471E3A13.5080505@maubp.freeserve.co.uk> <001601c8165a$48248600$2300a8c0@bsantos> <320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com> Message-ID: <000301c817b0$8c868c10$2300a8c0@bsantos> Peter Said >I would suggest you install standalone blast, then give it the >multi-record FASTA file as input. You should then get multiple blast >records back (in the same order). This works fine with the XML output >(but currently does not work for plain text output on recent versions >of NCBI Blast). > >If you really want to make multiple blast submissions in parallel >online, first check the NCBI's website for any usage restrictions - >they don't want their servers to be abused. > >Peter I have followed your advice and I decide to install standalone blast. As I want to make blast against the nt databases I have downloaded it pre compiled from the ncbi ftp server. And I have created I scrip to do this but for some reason I'm not getting any results, because the programs does not write anything to the XML file. 
Here is my script:

from Bio import SeqIO
from Bio.Blast import NCBIStandalone
from Bio.Blast import NCBIXML
import time
import math

my_blast_db = (r'e:/nt.00')
my_blast_file = r'C:/FASTASeq/Results/well9/assembled_file_well9_V6_DIST.fna'
my_blast_exe = r'C:/BLAST/bin/'
save_file = open(r'C:/FASTASeq/Results/well9/V6_BLAST.xml', 'w')
result_handle, error_info = NCBIStandalone.blastall(my_blast_exe, "blastn",my_blast_db, my_blast_file)
blast_results = result_handle.read() #Catch the results
save_file.write(blast_results) #Write all the information to an XML file
save_file.close()
print time.ctime()

As I have downloaded the files from NCBI, I have a lot of files in the database directory. Is there any way of performing a search against all of them?

Thanks in advance,

Bruno Santos
Unidade de Bioinformática
3060-197 Cantanhede
Tel: 231 410 892
http://bioinformatics.biocant.pt

From biopython at maubp.freeserve.co.uk Fri Oct 26 09:52:34 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Oct 2007 10:52:34 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <000301c817b0$8c868c10$2300a8c0@bsantos>
References: <001101c8158d$7d146600$2300a8c0@bsantos> <471E1CBC.30601@maubp.freeserve.co.uk> <001b01c81598$95f7b3b0$2300a8c0@bsantos> <471E3A13.5080505@maubp.freeserve.co.uk> <001601c8165a$48248600$2300a8c0@bsantos> <320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com> <000301c817b0$8c868c10$2300a8c0@bsantos>
Message-ID: <4721B8E2.2040902@maubp.freeserve.co.uk>

Bruno Santos wrote:
> Peter Said
>> I would suggest you install standalone blast, then give it the
>> multi-record FASTA file as input. You should then get multiple blast
>> records back (in the same order). This works fine with the XML output
>> (but currently does not work for plain text output on recent versions
>> of NCBI Blast).
>> >> If you really want to make multiple blast submissions in parallel >> online, first check the NCBI's website for any usage restrictions - >> they don't want their servers to be abused. >> >> Peter > > I have followed your advice and I decide to install standalone blast. As I > want to make blast against the nt databases I have downloaded it pre > compiled from the ncbi ftp server. And I have created I script to do this but > for some reason I'm not getting any results, because the programs does not > write anything to the XML file. > > Where is my script: > from Bio import SeqIO > from Bio.Blast import NCBIStandalone > from Bio.Blast import NCBIXML > import time > import math You are running on Windows, so the paths should have "\" rather than "/" in them. However, in many cases this isn't essential - and indeed for some Unix programs ported to Windows using "/" is sometimes best! > my_blast_db = (r'e:/nt.00') I'm not sure if that is correct, but its difficult to tell without seeing your setup. > my_blast_file = > r'C:/FASTASeq/Results/well9/assembled_file_well9_V6_DIST.fna' > my_blast_exe = r'C:/BLAST/bin/' That is wrong, try something like: my_blast_exe = r'C:\BLAST\bin\blastall.exe' I would urge you to try running blastall "by hand" at the command line first for a few small examples, to get the hang of it. Because any error messages get printed to the command line, it makes debugging simpler. This will also help with you how to prepare the arguments in Biopython. Within python you would have to have checked what was written to the error_info output handle. > As I have download the files from ncbi I have a lot of files in the database > directory theres is any way of perform a search against all of them? I'm not sure what exactly you are asking. BLAST can make databases from FASTA files, so you might want to build a database from all your FASTA files... check the documentation for the BLAST formatdb program. 
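
As a rough illustration of what running blastall "by hand" involves, the sketch below assembles the command line from the pieces mentioned in this thread. The helper name blastall_command is invented for this example and is not a Biopython function; the paths are the ones posted above and will differ on other machines. The flags -p, -d, -i, -m and -e are the legacy blastall options for program, database, query file, alignment view and expectation cutoff.

```python
def blastall_command(blastall_exe, program, database, query,
                     align_view=7, expectation=None):
    """Build the argument list for a legacy NCBI blastall run."""
    cmd = [
        blastall_exe,          # full path to the executable, not just its folder
        "-p", program,         # e.g. blastn
        "-d", database,        # database name as given to formatdb
        "-i", query,           # FASTA file with one or more query sequences
        "-m", str(align_view), # 7 = XML output
    ]
    if expectation is not None:
        cmd += ["-e", str(expectation)]  # stricter cutoff gives fewer hits
    return cmd


cmd = blastall_command(r"C:\BLAST\bin\blastall.exe", "blastn", "e:/nt.00",
                       r"C:/FASTASeq/Results/well9/assembled_file_well9_V6_DIST.fna")
print(" ".join(cmd))
```

The printed line can be pasted into a command prompt to see any error messages directly; the same list could also be passed to subprocess once it works by hand.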
Peter From bsantos at biocant.pt Fri Oct 26 13:40:40 2007 From: bsantos at biocant.pt (Bruno Santos) Date: Fri, 26 Oct 2007 14:40:40 +0100 Subject: [BioPython] Problems with NCBIXML.py In-Reply-To: <4721B8E2.2040902@maubp.freeserve.co.uk> References: <001101c8158d$7d146600$2300a8c0@bsantos> <471E1CBC.30601@maubp.freeserve.co.uk> <001b01c81598$95f7b3b0$2300a8c0@bsantos> <471E3A13.5080505@maubp.freeserve.co.uk> <001601c8165a$48248600$2300a8c0@bsantos> <320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com> <000301c817b0$8c868c10$2300a8c0@bsantos> <4721B8E2.2040902@maubp.freeserve.co.uk> Message-ID: <000701c817d5$d0e8f4e0$2300a8c0@bsantos> >You are running on Windows, so the paths should have "\" rather than "/" >in them. However, in many cases this isn't essential - and indeed for >some Unix programs ported to Windows using "/" is sometimes best! > > my_blast_db = (r'e:/nt.00') > >I'm not sure if that is correct, but its difficult to tell without >seeing your setup. It's ok to use the "/" because it seems that the python interpreter converts it to the symbol used by the OS. > my_blast_file = > r'C:/FASTASeq/Results/well9/assembled_file_well9_V6_DIST.fna' > my_blast_exe = r'C:/BLAST/bin/' > >That is wrong, try something like: >my_blast_exe = r'C:\BLAST\bin\blastall.exe' You were right about that. It's ok now > As I have download the files from ncbi I have a lot of files in the database > directory theres is any way of perform a search against all of them? >I'm not sure what exactly you are asking. BLAST can make databases from >FASTA files, so you might want to build a database from all your FASTA >files... check the documentation for the BLAST formatdb program. I have downloaded the pre compiled files which mean I have five different files like (nt.00.nhr, nt.01.nhr, nt.02.nhr...) and also the same files with all the others extensions. But I have found I can use them all at the same time by passing it to command line between "". 
So now I have

my_blast_db = (r'\"e:/nt.00 e:/nt.01 e:/nt.02 e:/nt.03 e:/nt.04 e:/nt.05 \"')

But now I'm mailing you with another doubt: is it possible to pass the
result_handle to blast_results line by line or something like that? I'm
having a memory error in the step described below:

result_handle, error_info = NCBIStandalone.blastall(my_blast_exe,
"blastn",my_blast_db, my_blast_file)
blast_results = result_handle.read() #Catch the results

Maybe if I pass one line at a time and write it immediately to the xml file
it will work.

Thanks once more,
Bruno Santos

From biopython at maubp.freeserve.co.uk Fri Oct 26 14:37:45 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Oct 2007 15:37:45 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <000701c817d5$d0e8f4e0$2300a8c0@bsantos>
References: <001101c8158d$7d146600$2300a8c0@bsantos>
 <471E1CBC.30601@maubp.freeserve.co.uk>
 <001b01c81598$95f7b3b0$2300a8c0@bsantos>
 <471E3A13.5080505@maubp.freeserve.co.uk>
 <001601c8165a$48248600$2300a8c0@bsantos>
 <320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com>
 <000301c817b0$8c868c10$2300a8c0@bsantos>
 <4721B8E2.2040902@maubp.freeserve.co.uk>
 <000701c817d5$d0e8f4e0$2300a8c0@bsantos>
Message-ID: <4721FBB9.1040408@maubp.freeserve.co.uk>

> But now I'm mailing you with another doubt it is possible to pass the
> result_handle to blast_results line by line or something like that because
> I'm having a memory error in the step described below
>
> result_handle, error_info = NCBIStandalone.blastall(my_blast_exe,
> "blastn",my_blast_db, my_blast_file)
> blast_results = result_handle.read() #Catch the results
>
> Maybe if I pass one line at a time and write it immediately to the xml file
> it will work.

XML files are big. Lots of query sequences will also make things
bigger. And the default expectation threshold will also give lots of
results - setting this to something harsher will help by giving less
matches.
Unless you want to keep the XML file for other analysis, it might be simpler to parse the output from blast directly with Biopython - avoiding having the large XML file on disk. Keeping the XML intermediate file can be a good idea when working on smaller datasets, where you want to tweak your analysis (without re-running blast each time). Peter From bsantos at biocant.pt Fri Oct 26 15:50:48 2007 From: bsantos at biocant.pt (Bruno Santos) Date: Fri, 26 Oct 2007 16:50:48 +0100 Subject: [BioPython] Problems with NCBIXML.py In-Reply-To: <4721FBB9.1040408@maubp.freeserve.co.uk> References: <001101c8158d$7d146600$2300a8c0@bsantos> <471E1CBC.30601@maubp.freeserve.co.uk> <001b01c81598$95f7b3b0$2300a8c0@bsantos> <471E3A13.5080505@maubp.freeserve.co.uk> <001601c8165a$48248600$2300a8c0@bsantos> <320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com> <000301c817b0$8c868c10$2300a8c0@bsantos> <4721B8E2.2040902@maubp.freeserve.co.uk> <000701c817d5$d0e8f4e0$2300a8c0@bsantos> <4721FBB9.1040408@maubp.freeserve.co.uk> Message-ID: <000801c817e7$fd1bc940$2300a8c0@bsantos> Peter Said: >XML files are big. Lots of query sequences will also make things >bigger. And the default expectation threshold will also give lots of >results - setting this to something harsher will help by giving less >matches. > >Unless you want to keep the XML file for other analysis, it might be >simpler to parse the output from blast directly with Biopython - >avoiding having the large XML file on disk. > >Keeping the XML intermediate file can be a good idea when working on >smaller datasets, where you want to tweak your analysis (without >re-running blast each time). But if even I don't want to save the results to an XML I still have to do the step right? And my problem is in this step not in writing to the file. Or I can use the result_handle directly, because I was reading the biopython documentation but it's not very clear. 
From biopython at maubp.freeserve.co.uk Fri Oct 26 16:04:40 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Oct 2007 17:04:40 +0100
Subject: [BioPython] Problems with NCBIXML.py
In-Reply-To: <000801c817e7$fd1bc940$2300a8c0@bsantos>
References: <001101c8158d$7d146600$2300a8c0@bsantos>
 <471E1CBC.30601@maubp.freeserve.co.uk>
 <001b01c81598$95f7b3b0$2300a8c0@bsantos>
 <471E3A13.5080505@maubp.freeserve.co.uk>
 <001601c8165a$48248600$2300a8c0@bsantos>
 <320fb6e00710241032t651a5207ub2bf57285caf9cb9@mail.gmail.com>
 <000301c817b0$8c868c10$2300a8c0@bsantos>
 <4721B8E2.2040902@maubp.freeserve.co.uk>
 <000701c817d5$d0e8f4e0$2300a8c0@bsantos>
 <4721FBB9.1040408@maubp.freeserve.co.uk>
 <000801c817e7$fd1bc940$2300a8c0@bsantos>
Message-ID: <47221018.9090104@maubp.freeserve.co.uk>

Bruno Santos wrote:
> Peter Said:
>> Unless you want to keep the XML file for other analysis, it might be
>> simpler to parse the output from blast directly with Biopython -
>> avoiding having the large XML file on disk.
>
> But if even I don't want to save the results to an XML I still have to do
> the step right?
> And my problem is in this step not in writing to the file.
> Or I can use the result_handle directly, because I was reading the biopython
> documentation but it's not very clear.

The intention is something like this:

result_handle, error_handle = NCBIStandalone.blastall(my_blast_exe,
                                  "blastn", my_blast_db, my_blast_file)
blast_records = NCBIXML.parse(result_handle)
for record in blast_records:
    #do stuff

The bit about saving the results to a file and loading that to give a
new handle is optional, but very handy if you need to look at the raw
file by hand.

Perhaps that section of the tutorial could be a little clearer ...
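
If the raw output is still wanted on disk, the handle can be copied across in chunks instead of with a single read() call, which is one way to avoid holding all of the XML in memory at once. A minimal sketch using only the standard library; the two StringIO objects are stand-ins (an assumption for illustration) for the BLAST result handle and the open output file, and copy_in_chunks is a hypothetical helper name.

```python
import io
import shutil

def copy_in_chunks(src_handle, dst_handle, chunk_size=64 * 1024):
    """Copy src_handle to dst_handle one chunk at a time."""
    shutil.copyfileobj(src_handle, dst_handle, chunk_size)

# Stand-ins for the BLAST result handle and the XML file opened for writing.
fake_result_handle = io.StringIO("<BlastOutput>...</BlastOutput>\n" * 1000)
fake_save_file = io.StringIO()

copy_in_chunks(fake_result_handle, fake_save_file)
```

Only one chunk is in memory at any moment, so the size of the output stops mattering for the copy step.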
Peter From mdehoon at c2b2.columbia.edu Sun Oct 28 06:32:40 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Sun, 28 Oct 2007 02:32:40 -0400 Subject: [BioPython] Biopython release 1.44 ready Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B645@mail2.exch.c2b2.columbia.edu> Hi everybody, Biopython release 1.44 is now available for download from the Biopython website at http://biopython.org. This release includes lots of code improvements and fixes in the Blast interface and parsers, sequence input/output, the SwissProt parser, the clustering routines, as well as a brand new module for population genetics. For reasons of compatibility, some radical changes were necessary in some parts of the code; please let us know if you find some functionality missing. My thanks to all code contributers who made this new release possible. --Michiel on behalf of the Biopython developers Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From tiagoantao at gmail.com Sun Oct 28 21:31:58 2007 From: tiagoantao at gmail.com (Tiago Antao) Date: Sun, 28 Oct 2007 21:31:58 +0000 Subject: [BioPython] Biopython citation Message-ID: <4724FFCE.20103@gmail.com> Hello, I am submitting a paper regarding a Jython selection detection program that we have done, and I would like to cite biopython. What is really the best, most recent, citation? Tiago -- tiagoantao at gmail.com http://tiago.org/ps From biopython at maubp.freeserve.co.uk Sun Oct 28 20:52:05 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 28 Oct 2007 20:52:05 +0000 Subject: [BioPython] Biopython citation In-Reply-To: <4724FFCE.20103@gmail.com> References: <4724FFCE.20103@gmail.com> Message-ID: <4724F675.8030902@maubp.freeserve.co.uk> Tiago Antao wrote: > I am submitting a paper regarding a Jython selection detection program > that we have done, and I would like to cite biopython. What is really > the best, most recent, citation? 
> > Tiago For a general project reference, I think the most recent is Brad & Jeff's 2000 newsletter article: Chapman, B. and Chang, J. (2000) Biopython: python tools for computational biology. ACM SIG-BIO Newsletter, 20, 15-19. However, I confess I only cited the www.biopython.org website in my last paper. Peter P.S. There are specific papers for some modules, e.g. Bio.PDB and Bio.Cluster From skhadar at gmail.com Mon Oct 29 13:15:30 2007 From: skhadar at gmail.com (Shameer Khadar) Date: Mon, 29 Oct 2007 18:45:30 +0530 Subject: [BioPython] Biopython citation In-Reply-To: <4724F675.8030902@maubp.freeserve.co.uk> References: <4724FFCE.20103@gmail.com> <4724F675.8030902@maubp.freeserve.co.uk> Message-ID: Hi Peter, I am interested to look at it. We dont have access to ACM. If you have a copy of that paper. Thanks, Shameer On 10/29/07, Peter wrote: > > Tiago Antao wrote: > > I am submitting a paper regarding a Jython selection detection program > > that we have done, and I would like to cite biopython. What is really > > the best, most recent, citation? > > > > Tiago > > For a general project reference, I think the most recent is Brad & > Jeff's 2000 newsletter article: > > Chapman, B. and Chang, J. (2000) Biopython: python tools for > computational biology. ACM SIG-BIO Newsletter, 20, 15-19. > > However, I confess I only cited the www.biopython.org website in my last > paper. > > Peter > > P.S. There are specific papers for some modules, e.g. 
Bio.PDB and > Bio.Cluster > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From skhadar at gmail.com Mon Oct 29 14:11:41 2007 From: skhadar at gmail.com (Shameer Khadar) Date: Mon, 29 Oct 2007 19:41:41 +0530 Subject: [BioPython] Biopython citation In-Reply-To: <4725E655.8080608@maubp.freeserve.co.uk> References: <4724FFCE.20103@gmail.com> <4724F675.8030902@maubp.freeserve.co.uk> <4725E655.8080608@maubp.freeserve.co.uk> Message-ID: Hi , Thanks for that !!! -- Shameer On 10/29/07, Peter wrote: > > Shameer Khadar wrote: > > Hi Peter, > > > > I am interested to look at it. We dont have access to ACM. If you > > have a copy of that paper. > > > > Thanks, Shameer > > Its not actually very informative, especial as of the examples are now > rather dated. Anyway, I believe the new-letter article was the same as > the document available on our website: > > http://biopython.org/DIST/docs/acm/ACMbiopy.html > http://biopython.org/DIST/docs/acm/ACMbiopy.pdf > > Chapman, B. and Chang, J. (2000) Biopython: python tools for > computational biology. ACM SIG-BIO Newsletter, 20, 15-19. > > Peter > From biopython at maubp.freeserve.co.uk Mon Oct 29 13:55:33 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 29 Oct 2007 13:55:33 +0000 Subject: [BioPython] Biopython citation In-Reply-To: References: <4724FFCE.20103@gmail.com> <4724F675.8030902@maubp.freeserve.co.uk> Message-ID: <4725E655.8080608@maubp.freeserve.co.uk> Shameer Khadar wrote: > Hi Peter, > > I am interested to look at it. We dont have access to ACM. If you > have a copy of that paper. > > Thanks, Shameer Its not actually very informative, especial as of the examples are now rather dated. Anyway, I believe the new-letter article was the same as the document available on our website: http://biopython.org/DIST/docs/acm/ACMbiopy.html http://biopython.org/DIST/docs/acm/ACMbiopy.pdf Chapman, B. 
and Chang, J. (2000) Biopython: python tools for computational biology. ACM SIG-BIO Newsletter, 20, 15-19. Peter From biopython at maubp.freeserve.co.uk Mon Oct 29 19:22:20 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 29 Oct 2007 19:22:20 +0000 Subject: [BioPython] Loading SwissProt to BioSQL In-Reply-To: <4720D0E6.8000609@maubp.freeserve.co.uk> References: <471FB5DE.6080506@maubp.freeserve.co.uk> <4720D0E6.8000609@maubp.freeserve.co.uk> Message-ID: <320fb6e00710291222l1a5746e9m3bbc5c4c9fd03921@mail.gmail.com> Jonathan Boulais wrote: > But I still get the same error as before when I'm running it. > ... For anyone wanting to track this issue, Jonathan has filled Bug 2390 - Error importing Swiss Prot in BioSQL http://bugzilla.open-bio.org/show_bug.cgi?id=2390 Peter From anaryin at gmail.com Tue Oct 30 01:28:21 2007 From: anaryin at gmail.com (=?ISO-8859-1?Q?Jo=E3o_Rodrigues?=) Date: Tue, 30 Oct 2007 01:28:21 +0000 Subject: [BioPython] Fwd: Scripts cannot connect In-Reply-To: References: <471C9C34.7000006@maubp.freeserve.co.uk> <320fb6e00710220658x12866cb6w63f7ff96f5bcd2b0@mail.gmail.com> Message-ID: I've checked all my connection settings, tested an awful lot of possibilities and I came to this conclusion. When using a webservice, I can't connect to the internet. In the same script, I can get for instance, the google page, but the lines regarding the webservice itself, they won't connect. I've tried to set environment proxy (through export http_proxy='blabla:yyyy') in the script itself and nothing. I've set os.environ[blabla] and it's doesn't work. So, does anyone has an idea of why this is happening? Shouldn't the webservice, if using http protocol (as it does), work just like any other command (let's say, urllib.urlopen)? I know this falls out of the BioPython theme but I consider it quite relevant for my BioPython work :) Thank you all in advance! 
From biopython at maubp.freeserve.co.uk Tue Oct 30 08:53:14 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 30 Oct 2007 08:53:14 +0000
Subject: [BioPython] Fwd: Scripts cannot connect
In-Reply-To:
References: <471C9C34.7000006@maubp.freeserve.co.uk>
 <320fb6e00710220658x12866cb6w63f7ff96f5bcd2b0@mail.gmail.com>
Message-ID: <4726F0FA.6000209@maubp.freeserve.co.uk>

João Rodrigues wrote:
> I've checked all my connection settings, tested an awful lot of
> possibilities and I came to this conclusion. When using a webservice, I
> can't connect to the internet. In the same script, I can get for instance,
> the google page, but the lines regarding the webservice itself, they won't
> connect.

Are you still finding things work on Windows, but fail on Linux? If so,
are you running the same version of python (and Biopython) on both?

> I've tried to set environment proxy (through export
> http_proxy='blabla:yyyy') in the script itself and nothing. I've set
> os.environ[blabla] and it's doesn't work.

When you say "it doesn't work", do you mean (a) the environment variable
isn't set, or (b) the environment variable is set but has no effect?

> So, does anyone has an idea of why this is happening? Shouldn't the
> webservice, if using http protocol (as it does), work just like any other
> command (let's say, urllib.urlopen)?

Are you saying there is a difference depending on the URL type (plain
page versus web-service)?

Or, are you saying there is a difference depending on what python
library you use (e.g. urllib or something else)?

> I know this falls out of the BioPython theme but I consider it quite
> relevant for my BioPython work :)
>
> Thank you all in advance!

This must be very frustrating for you. Have you been able to find your
University's official documentation for the proxy?
Peter

From biopython at maubp.freeserve.co.uk Tue Oct 30 12:32:10 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 30 Oct 2007 12:32:10 +0000
Subject: [BioPython] Question about Seq.count()
In-Reply-To: <320fb6e00710190750n6b1752bcga0846159e32cf02c@mail.gmail.com>
References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>
 <471686C7.6050305@maubp.freeserve.co.uk>
 <5aa3b3570710190638h23665c4cpb8d53a8cb64c7322@mail.gmail.com>
 <320fb6e00710190750n6b1752bcga0846159e32cf02c@mail.gmail.com>
Message-ID: <4727244A.4010705@maubp.freeserve.co.uk>

Peter wrote:
>> I've found the bug!
>>
>> The code for Bio.Seq.count is:
>>
>>     def count(self, item):
>>         return len([x for x in self.data if x == item])
>
> Yeah - by design this (and the functionally similar version for the
> MutableSeq) both expect the count argument to be a single letter. The
> simple fix for the Seq object is to use the string method internally:
>
>     def count(self, item):
>         return self.data.count(item)
>
> For the MutableSeq things are not so straight forward, but supporting
> multiple character arguments can be done.

Bug 2386 and proposed patch here:
http://bugzilla.open-bio.org/show_bug.cgi?id=2386

This also lets the count methods take Seq or MutableSeq objects as
arguments - in addition to plain strings.

Note there is room for improvement in my patch: For the case of the
MutableSeq, we might want to investigate counting from the array of
characters directly, rather than taking the lazy option of turning it
into a string and counting that way.
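
The difference between the two count implementations quoted above can be seen with a small stand-in class. TinySeq and its method names are hypothetical, written only for this illustration; it is not Biopython's Seq class and says nothing about how the actual patch is written.

```python
class TinySeq:
    """Hypothetical stand-in that just wraps a string in .data."""

    def __init__(self, data):
        self.data = data

    def count_per_letter(self, item):
        # Compares each single character with item, as in the quoted code,
        # so a multi-letter item can never match.
        return len([x for x in self.data if x == item])

    def count_substring(self, item):
        # Delegates to str.count, which handles items of any length
        # (non-overlapping occurrences).
        return self.data.count(item)


seq = TinySeq("ATGATGCC")
single_letter = seq.count_per_letter("A")
triplet_old = seq.count_per_letter("ATG")
triplet_new = seq.count_substring("ATG")
```

With this input the per-letter version counts single letters correctly but reports zero for "ATG", while the string-method version finds both occurrences.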
Peter

From anaryin at gmail.com Tue Oct 30 16:29:00 2007
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 30 Oct 2007 16:29:00 +0000
Subject: [BioPython] Fwd: Scripts cannot connect
In-Reply-To: <4726F0FA.6000209@maubp.freeserve.co.uk>
References: <471C9C34.7000006@maubp.freeserve.co.uk>
 <320fb6e00710220658x12866cb6w63f7ff96f5bcd2b0@mail.gmail.com>
 <4726F0FA.6000209@maubp.freeserve.co.uk>
Message-ID:

> Are you still finding things work on Windows, but fail on Linux? If so,
> are you running the same version of python (and Biopython) on both?

It is the same version on all operating systems. I'm using XP (one 32 bits,
the other 64) on the Windows machines (one at home, another at "work") and
Ubuntu 7.10 on both my laptop and the workstation at the University (it's
dual-booted). Regarding Biopython, it's the same version on all but my
laptop, which has the latest upgrade of the 28th October (but still, it never
worked before). But since I'm not using any modules, it should not have
anything to do with it.

> When you say "it doesn't work", do you mean the (a) environment variable
> isn't set, or (b) the environment variable is set but has not effect.

An example: I start a new session on my laptop and open the console. I type
"export http_proxy='blabla'" to set the variable. I then type "env" and it
returns a list of all env variables *including* the http_proxy one. I run
"aptitude update" and it works. If I do the same in a Python script, it
doesn't (at least when connecting to a webservice). I believe, then, that
the variable is set but it doesn't work somehow.

> Are you saying there is a difference depending on the URL type (plain
> page versus web-service?)

I *think*, or suppose, that somehow the two "types" of connection, despite
using HTTP and the same proxy env. variable, are working differently.

> Or, are you saying there is a difference depending on what python
> library you use (e.g. urllib or something else).

Which other libraries can I try out?
Other than urllib?

> This must be very frustrating for you. Have you been able to find your
> University's official documentation for the proxy?

It's a dilemma. On the one hand, I have a perfectly set up Windows system that
can access the internet through the scripts I write. However, there is no
ZSI for it (or at least, I can't install it). As such, no SOAP support, no
API I can get to work.

On the other hand, GNU/Linux. It works perfectly, the *.deb packages exist
and are quite easy to install, so I have ZSI and SOAP support to work with
the API. However, I can't access the web with the ZSI module.

I'll try to talk to the University Informatics Service to see if they can
figure it out. Really hope they can; otherwise, I guess I'll just have to
work from home since it works there.. :)

Again, very thankful!

João Rodrigues
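
One more thing that may be worth trying, offered as a sketch rather than a diagnosis: handing the proxy to the opener explicitly instead of relying on the http_proxy environment variable. The example uses Python 3's urllib.request (the same ProxyHandler and build_opener names live in urllib2 in the Python 2 series). The proxy address is a made-up placeholder, build_proxy_opener is a hypothetical helper, and whether the web-service library in question goes through urllib's opener at all is not something this thread establishes.

```python
import urllib.request

# Hypothetical proxy address, for illustration only.
PROXY_URL = "http://proxy.example.org:8080"

def build_proxy_opener(proxy_url):
    """Return an opener that sends http requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)

opener = build_proxy_opener(PROXY_URL)

# Confirm which proxy settings the opener actually carries.
proxy_handlers = [h for h in opener.handlers
                  if isinstance(h, urllib.request.ProxyHandler)]

# opener.open("http://...") would now route http requests via PROXY_URL;
# no request is made here so the sketch runs without network access.
```

Inspecting the opener's handlers like this separates "the proxy setting never reached the library" from "the setting is there but the connection still fails".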