From ytu888 at hotmail.com Mon Oct 1 07:39:50 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Mon, 1 Oct 2007 06:39:50 -0500
Subject: [BioPython] Error for running of ReportLab test on Mac OS X
In-Reply-To: <46FD5927.3000207@maubp.freeserve.co.uk>
References: <46FCF325.4040002@maubp.freeserve.co.uk>
 <46FD2BAC.80401@maubp.freeserve.co.uk>
 <46FD5927.3000207@maubp.freeserve.co.uk>
Message-ID:

Thanks Peter,

However, I still haven't installed the mxText module on my Mac yet. Also,
could you tell me how to run the test file of ReportLab when I launch
Python and then import the test file into Python? Thanks.

> Date: Fri, 28 Sep 2007 20:42:31 +0100
> From: biopython at maubp.freeserve.co.uk
> To: ytu888 at hotmail.com
> CC: biopython at lists.open-bio.org
> Subject: Re: [BioPython] Error for running of ReportLab test on Mac OS X
>
> Y Tu wrote:
> > Thank you, Peter for the prompt answer.
> >
> > I did install the PIL already and tested with the commands "from PIL
> > import Image", then "import _imaging". Both commands succeeded.
> > That's why I don't understand why the test won't work. I used the
> > command "python test_pdfgen_general.py" under the shell prompt, which
> > generated the error. Since I installed PIL and succeeded in importing
> > the module of PIL, I thought maybe I can solve the problem by running
> > the test under Python.
>
> Looking in more detail at the original stack trace,
>
> > File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/ImageFile.py", line 180, in load
> > d = Image._getdecoder(self.mode, d, a, self.decoderconfig)
> > File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/PIL/Image.py", line 375, in _getdecoder
> > raise IOError("decoder %s not available" % decoder_name)
> > IOError: decoder jpeg not available
>
> It's possible that PIL needs some optional JPEG library, which ReportLab
> wants to use.
> I suggest you search the ReportLab website & user's
> mailing list, and if you can't work out what is wrong sign up to their
> mailing list and ask them, http://www.reportlab.org/
>
> Very little of Biopython needs ReportLab, you should be able to install
> Biopython without it.
>
> Peter

From ytu888 at hotmail.com Mon Oct 1 13:54:00 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Mon, 1 Oct 2007 12:54:00 -0500
Subject: [BioPython] Error for installation of MySALdb on Mac OS X
In-Reply-To: <46FD5927.3000207@maubp.freeserve.co.uk>
References: <46FCF325.4040002@maubp.freeserve.co.uk>
 <46FD2BAC.80401@maubp.freeserve.co.uk>
 <46FD5927.3000207@maubp.freeserve.co.uk>
Message-ID:

I downloaded mysql-5.0.45-osx10.4-i686.dmg from mysql web and
installed it. Then I tried to install MySQL-python-1.2.2 but got
the following error. How to create the mysql_config.path file?
Thank you very much.

leesComputer:/applications/Python_Bio/MySQL-python-1.2.2 lee$ python setup.py build
sh: line 1: mysql_config: command not found
Traceback (most recent call last):
  File "setup.py", line 16, in <module>
    metadata, options = get_config()
  File "/Applications/Python_Bio/MySQL-python-1.2.2/setup_posix.py", line 43, in get_config
    libs = mysql_config("libs_r")
  File "/Applications/Python_Bio/MySQL-python-1.2.2/setup_posix.py", line 24, in mysql_config
    raise EnvironmentError, "%s not found" % mysql_config.path
EnvironmentError: mysql_config not found
From lists.steve at arachnedesign.net Mon Oct 1 16:18:04 2007
From: lists.steve at arachnedesign.net (Steve Lianoglou)
Date: Mon, 1 Oct 2007 16:18:04 -0400
Subject: [BioPython] Error for installation of MySALdb on Mac OS X
In-Reply-To:
References: <46FCF325.4040002@maubp.freeserve.co.uk>
 <46FD2BAC.80401@maubp.freeserve.co.uk>
 <46FD5927.3000207@maubp.freeserve.co.uk>
Message-ID: <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>

> I downloaded mysql-5.0.45-osx10.4-i686.dmg from mysql web and
> installed it. Then I tried to install MySQL-python-1.2.2 but got
> the following error. How to create the mysql_config.path file?
> Thank you very much.
>
> leesComputer:/applications/Python_Bio/MySQL-python-1.2.2 lee$
> python setup.py build
> sh: line 1: mysql_config: command not found

It seems as if you need to have the `mysql_config` command in your
PATH variable and it's not there.

Look for where mysql was installed (maybe /usr/local/mysql/...) and
add its bin directory to your PATH environment variable. Or maybe it
installed some binaries/symlinks into your /usr/local/bin directory?

I think that'll do it for you.

-steve

From biopython at maubp.freeserve.co.uk Mon Oct 1 17:06:37 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 1 Oct 2007 22:06:37 +0100
Subject: [BioPython] Error for running of ReportLab test on Mac OS X
In-Reply-To:
References: <46FCF325.4040002@maubp.freeserve.co.uk>
 <46FD2BAC.80401@maubp.freeserve.co.uk>
 <46FD5927.3000207@maubp.freeserve.co.uk>
Message-ID: <320fb6e00710011406o3c4d4049q7b5345d18381362e@mail.gmail.com>

On 10/1/07, Y Tu wrote:
>
> Thanks Peter,
>
> However, I still haven't installed mxText module in my Mac yet.

I see you've signed up to the eGenix mailing list - I hope they can
solve your mxTextTools installation problems.

> Also could you tell me how to run the test file of ReportLab, when I
> launch Python and then import the test file into the python. Thanks.
In general I think most tests are designed to be run from the command
line, not by running python, typing an import statement, and typing
another command. You should check the ReportLab documentation to see
what they recommend.

To run a specific Biopython unit test, such as the general graphics
unit test, you would do this:

python run_tests.py test_GraphicsGeneral.py

That would run the test, and check the output matched the expected
results. Alternatively, you can do:

python test_GraphicsGeneral.py

I hope that helps.

Peter

From ULNJUJERYDIX at spammotel.com Tue Oct 2 02:52:53 2007
From: ULNJUJERYDIX at spammotel.com (Kevin Lam)
Date: Tue, 2 Oct 2007 14:52:53 +0800
Subject: [BioPython] Fwd: **Fwd: [Bioperl-l] divide and blast blastunsplit blast subsequence
In-Reply-To: <5b6410e0710012321h4320d804p6c6262860eff2463@mail.gmail.com>
References: <5b6410e0710012321h4320d804p6c6262860eff2463@mail.gmail.com>
Message-ID: <5b6410e0710012352s520b537bj7374dd874dc93104@mail.gmail.com>

Hi!

I am trying to annotate a 200kb sequence by doing blastx to find the
protein seq location. I need to split the sequence up so that I get the
best hits for each region (the top blast hits will mask the smaller
proteins if I do it as a whole sequence).

If I were to do it manually, I can set the subsequence in the web GUI
for NCBI's blast. This way, the blast hit coords are based on the whole
200kb. But I can't find this option in blast or a straightforward way
to do it in bioperl.

I found similar solutions like
http://www.bio.davidson.edu/projects/DAB/DAB.html
divide and blast (but I want to specify coords rather than fixed
intervals).

There is also this from the bioperl archives:
http://bioinformatics.org/pipermail/bioclusters/2002-August/000375.html

But isn't there an easier way, like I can specify blast subsequence
200-900 of the fasta file and it will return the blastx hits in coords
in terms of the whole 200kb?
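
The coordinate bookkeeping asked about above can be sketched in plain
Python, independent of any BLAST wrapper. This is only a sketch under
stated assumptions: coordinates are taken to be 1-based and inclusive,
and `subsequence` and `to_full_coords` are hypothetical helper names
made up for illustration, not Biopython or BioPerl functions.

```python
def subsequence(seq, start, end):
    """Return seq[start..end] using 1-based, inclusive coordinates."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("coordinates outside the sequence")
    return seq[start - 1:end]


def to_full_coords(region_start, hit_start, hit_end):
    """Map hit coordinates reported on a subsequence back to the full sequence.

    region_start is the 1-based position in the full sequence at which the
    subsequence begins; hit_start and hit_end are 1-based positions on the
    subsequence, as a report for that subsequence alone would give them.
    """
    offset = region_start - 1
    return hit_start + offset, hit_end + offset


# Toy stand-in for the 200kb query (assumption: any string will do here).
full = "ACGT" * 300
region = subsequence(full, 200, 900)  # the piece that would be sent to blastx

# A hit at 10..250 on the piece corresponds to 209..449 on the full sequence.
print(to_full_coords(200, 10, 250))
```

The same offset applies whichever strand the hit is on, since only the
start of the region matters; running the search itself is left out.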
From mdehoon at c2b2.columbia.edu Tue Oct 2 05:06:54 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 2 Oct 2007 05:06:54 -0400
Subject: [BioPython] Bio.MultiProc
References: <46E6A845.3030601@c2b2.columbia.edu>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>

Hi everybody,

Since no users of Bio.MultiProc came forward, I deprecated it for the
upcoming release.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

-----Original Message-----
From: biopython-bounces at lists.open-bio.org on behalf of Michiel De Hoon
Sent: Tue 9/11/2007 10:37 AM
To: BioPython Developers List; biopython at biopython.org
Subject: [BioPython] Bio.MultiProc

Hi everybody,

In preparation for the upcoming release, I was running the Biopython
test suite and found that test_copen.py hangs on Cygwin. It doesn't
fail, it just sits there forever. This may be related to the use of
fork() instead of select() in Bio/MultiProc/copen.py. Anyway, while it
is probably possible to fix this, I'd have to dig fairly deep into the
code, and I am not sure if it is worth it. It looks like the copen
functions are used only in Bio/config, which is needed for Bio.db. A
description of the functionality of this module can be found in the
tutorial section 4.7.2.

Now, I don't remember users asking about this module on the mailing
list. From the tutorial documentation, it seems to be a nice piece of
code, but I doubt that it is being used often in practice.

So I was wondering:
1) Is anybody on this list using this code?
2) If not, can I mark it as deprecated for the upcoming release?
Hopefully, people who are using this code will notice, and let us know
that they need it.

--Michiel.
_______________________________________________
BioPython mailing list - BioPython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython

From ytu888 at hotmail.com Tue Oct 2 07:36:58 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Tue, 2 Oct 2007 06:36:58 -0500
Subject: [BioPython] Error for running of ReportLab test on Mac OS X
In-Reply-To: <320fb6e00710011406o3c4d4049q7b5345d18381362e@mail.gmail.com>
References: <46FCF325.4040002@maubp.freeserve.co.uk>
 <46FD2BAC.80401@maubp.freeserve.co.uk>
 <46FD5927.3000207@maubp.freeserve.co.uk>
 <320fb6e00710011406o3c4d4049q7b5345d18381362e@mail.gmail.com>
Message-ID:

Thank you very much, Peter.

> Date: Mon, 1 Oct 2007 22:06:37 +0100
> From: biopython at maubp.freeserve.co.uk
> To: ytu888 at hotmail.com
> Subject: Re: [BioPython] Error for running of ReportLab test on Mac OS X
> CC: biopython at lists.open-bio.org
>
> On 10/1/07, Y Tu wrote:
> >
> > Thanks Peter,
> >
> > However, I still haven't installed mxText module in my Mac yet.
>
> I see you've signed up to the eGenix mailing list - I hope they can
> solve your mxTextTools installation problems.
>
> > Also could you tell me how to run the test file of ReportLab, when I
> > launch Python and then import the test file into the python. Thanks.
>
> In general I think most tests are designed to be run from the command
> line, not by running python, typing an import statement, and typing
> another command. You should check the ReportLab documentation to see
> what they recommend.
>
> To run a specific Biopython unit test, such as the general graphics
> unit test, you would do this:
>
> python run_tests.py test_GraphicsGeneral.py
>
> That would run the test, and check the output matched the expected
> results. Alternatively, you can do:
>
> python test_GraphicsGeneral.py
>
> I hope that helps.
>
> Peter
From ytu888 at hotmail.com Tue Oct 2 08:29:46 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Tue, 2 Oct 2007 07:29:46 -0500
Subject: [BioPython] Error for installation of MySALdb on Mac OS X
In-Reply-To: <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
References: <46FCF325.4040002@maubp.freeserve.co.uk>
 <46FD2BAC.80401@maubp.freeserve.co.uk>
 <46FD5927.3000207@maubp.freeserve.co.uk>
 <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
Message-ID:

Hi Steve,

I checked the PATH and added /usr/local/mysql/bin into it. But I still
got the same error message when running the setup.py. Thanks.

> CC: biopython at lists.open-bio.org
> From: lists.steve at arachnedesign.net
> Subject: Re: [BioPython] Error for installation of MySALdb on Mac OS X
> Date: Mon, 1 Oct 2007 16:18:04 -0400
> To: ytu888 at hotmail.com
>
> > I downloaded mysql-5.0.45-osx10.4-i686.dmg from mysql web and
> > installed it. Then I tried to install MySQL-python-1.2.2 but got
> > the following error. How to create the mysql_config.path file?
> > Thank you very much.
> >
> > leesComputer:/applications/Python_Bio/MySQL-python-1.2.2 lee$
> > python setup.py build
> > sh: line 1: mysql_config: command not found
>
> It seems as if you need to have the `mysql_config` command in your
> PATH variable and it's not there.
>
> Look for where mysql was installed (maybe /usr/local/mysql/...) and
> add its bin directory to your PATH environment variable. Or maybe it
> installed some binaries/symlinks into your /usr/local/bin directory?
>
> I think that'll do it for you.
>
> -steve
>
From idoerg at gmail.com Tue Oct 2 12:00:41 2007
From: idoerg at gmail.com (Iddo Friedberg)
Date: Tue, 2 Oct 2007 09:00:41 -0700
Subject: [BioPython] [Biopython-dev] Bio.MultiProc
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>
References: <46E6A845.3030601@c2b2.columbia.edu>
 <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>
Message-ID:

Would it be possible to include the module, comment out the unworkable
source code and print a deprecation warning when it is imported? That
way we:

1) Don't have a clunky module BUT
2) we warn anyone who uses it (but didn't happen to read your post) that
it is deprecated when they install a new biopython version AND
3) Leave an option of fixing and commenting the code back in (i.e. it is
not lost forever).

Also, is it possible to track down the original author?

./I

On 10/2/07, Michiel De Hoon wrote:
>
> Hi everybody,
>
> Since no users of Bio.MultiProc came forward, I deprecated it for the
> upcoming release.
>
> --Michiel.
>
> Michiel de Hoon
> Center for Computational Biology and Bioinformatics
> Columbia University
> 1150 St Nicholas Avenue
> New York, NY 10032
>
>
>
> -----Original Message-----
> From: biopython-bounces at lists.open-bio.org on behalf of Michiel De Hoon
> Sent: Tue 9/11/2007 10:37 AM
> To: BioPython Developers List; biopython at biopython.org
> Subject: [BioPython] Bio.MultiProc
>
> Hi everybody,
>
> In preparation for the upcoming release, I was running the Biopython
> test suite and found that test_copen.py hangs on Cygwin. It doesn't
> fail, it just sits there forever. This may be related to the use of
> fork() instead of select() in Bio/MultiProc/copen.py. Anyway, while it
> is probably possible to fix this, I'd have to dig fairly deep into the
> code, and I am not sure if it is worth it. It looks like the copen
> functions are used only in Bio/config, which is needed for Bio.db.
> A description of the functionality of this module can be found in the
> tutorial section 4.7.2.
>
> Now, I don't remember users asking about this module on the mailing
> list. From the tutorial documentation, it seems to be a nice piece of
> code, but I doubt that it is being used often in practice.
>
> So I was wondering:
> 1) Is anybody on this list using this code?
> 2) If not, can I mark it as deprecated for the upcoming release?
> Hopefully, people who are using this code will notice, and let us know
> that they need it.
>
> --Michiel.
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>

--
I. Friedberg
"The only problem with troubleshooting is that sometimes trouble shoots back."

From mdehoon at c2b2.columbia.edu Tue Oct 2 20:18:59 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 2 Oct 2007 20:18:59 -0400
Subject: [BioPython] [Biopython-dev] Bio.MultiProc
References: <46E6A845.3030601@c2b2.columbia.edu>
 <6243BAA9F5E0D24DA41B27997D1FD14402B62B@mail2.exch.c2b2.columbia.edu>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B62D@mail2.exch.c2b2.columbia.edu>

> Would it be possible to include the module, comment out the unworkable
> source code and print a deprecation warning when it is imported?

That is what I did.

> 3) Leave an option of fixing and commenting the code back in (i.e. it is not
> lost forever).

Even after removing the code in some future release, the code will not
be lost forever. It can always be retrieved from CVS and from older
Biopython releases.

> Also, is it possible to track down the original author?

That would be Jeff Chang.

--Michiel.
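
For readers unfamiliar with the approach discussed above (warn on
import rather than delete the module outright), a minimal sketch
follows. It is not the actual Bio.MultiProc code; the helper name
`warn_deprecated` and the wording of the message are assumptions made
for illustration only.

```python
import warnings


def warn_deprecated(module_name):
    """Emit a DeprecationWarning naming the deprecated module."""
    warnings.warn(
        "%s is deprecated and may be removed in a future release; "
        "please contact the mailing list if you still need it." % module_name,
        DeprecationWarning,
        stacklevel=2,
    )


# In a deprecated module the call would sit at the top of the file, so that
# it runs once at import time. Here it is exercised directly; the filter is
# relaxed because DeprecationWarning is often hidden by default.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_deprecated("Bio.MultiProc")

print(caught[0].category.__name__, "-", caught[0].message)
```

A warning of this kind leaves existing scripts running while still
reaching users who never saw the announcement on the list.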
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

-----Original Message-----
From: Iddo Friedberg [mailto:idoerg at gmail.com]
Sent: Tue 10/2/2007 12:00 PM
To: Michiel De Hoon
Cc: BioPython Developers List; biopython at biopython.org
Subject: Re: [Biopython-dev] [BioPython] Bio.MultiProc

Would it be possible to include the module, comment out the unworkable
source code and print a deprecation warning when it is imported? That
way we:

1) Don't have a clunky module BUT
2) we warn anyone who uses it (but didn't happen to read your post) that
it is deprecated when they install a new biopython version AND
3) Leave an option of fixing and commenting the code back in (i.e. it is
not lost forever).

Also, is it possible to track down the original author?

./I

On 10/2/07, Michiel De Hoon wrote:
>
> Hi everybody,
>
> Since no users of Bio.MultiProc came forward, I deprecated it for the
> upcoming release.
>
> --Michiel.
>
> Michiel de Hoon
> Center for Computational Biology and Bioinformatics
> Columbia University
> 1150 St Nicholas Avenue
> New York, NY 10032
>
>
>
> -----Original Message-----
> From: biopython-bounces at lists.open-bio.org on behalf of Michiel De Hoon
> Sent: Tue 9/11/2007 10:37 AM
> To: BioPython Developers List; biopython at biopython.org
> Subject: [BioPython] Bio.MultiProc
>
> Hi everybody,
>
> In preparation for the upcoming release, I was running the Biopython
> test suite and found that test_copen.py hangs on Cygwin. It doesn't
> fail, it just sits there forever. This may be related to the use of
> fork() instead of select() in Bio/MultiProc/copen.py. Anyway, while it
> is probably possible to fix this, I'd have to dig fairly deep into the
> code, and I am not sure if it is worth it. It looks like the copen
> functions are used only in Bio/config, which is needed for Bio.db. A
> description of the functionality of this module can be found in the
> tutorial section 4.7.2.
>
> Now, I don't remember users asking about this module on the mailing
> list. From the tutorial documentation, it seems to be a nice piece of
> code, but I doubt that it is being used often in practice.
>
> So I was wondering:
> 1) Is anybody on this list using this code?
> 2) If not, can I mark it as deprecated for the upcoming release?
> Hopefully, people who are using this code will notice, and let us know
> that they need it.
>
> --Michiel.
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>

--
I. Friedberg
"The only problem with troubleshooting is that sometimes trouble shoots back."

From ytu888 at hotmail.com Wed Oct 3 08:44:32 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Wed, 3 Oct 2007 07:44:32 -0500
Subject: [BioPython] Error for installation of MySALdb on Mac OS X
In-Reply-To: <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
References: <46FCF325.4040002@maubp.freeserve.co.uk>
 <46FD2BAC.80401@maubp.freeserve.co.uk>
 <46FD5927.3000207@maubp.freeserve.co.uk>
 <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
Message-ID:

Here is the copy of the output in the Terminal. Please help me to find
out what's wrong. Thanks.

Last login: Wed Oct 3 08:28:38 on ttyp4
Welcome to Darwin!
LeesComputer:~ Lee$ echo $PATH
/Library/Frameworks/Python.framework/Versions/Current/bin:/usr/local/bin:.:/usr/local/mysql:/bin:/sbin:/usr/bin:/usr/sbin
LeesComputer:~ Lee$ cd /applications/python_bio/MySQL-python-1.2.2
LeesComputer:/applications/python_bio/MySQL-python-1.2.2 Lee$ python setup.py build
sh: line 1: mysql_config: command not found
Traceback (most recent call last):
  File "setup.py", line 16, in <module>
    metadata, options = get_config()
  File "/Applications/Python_Bio/MySQL-python-1.2.2/setup_posix.py", line 43, in get_config
    libs = mysql_config("libs_r")
  File "/Applications/Python_Bio/MySQL-python-1.2.2/setup_posix.py", line 24, in mysql_config
    raise EnvironmentError, "%s not found" % mysql_config.path
EnvironmentError: mysql_config not found
LeesComputer:/applications/python_bio/MySQL-python-1.2.2 Lee$ cd /usr/local
LeesComputer:/usr/local Lee$ ls -al
total 8
drwxr-xr-x   8 root wheel 272 Oct 1 13:02 .
drwxr-xr-x  10 root wheel 340 Sep 26 11:30 ..
drwxr-xr-x   8 root admin 272 Aug 6 04:00 ActivePerl-5.8
drwxr-xr-x  15 root wheel 510 Oct 2 03:52 bin
drwxr-xr-x   6 root wheel 204 Sep 27 05:22 include
drwxr-xr-x  12 root wheel 408 Sep 27 05:21 lib
lrwxr-xr-x   1 root wheel  25 Oct 1 13:02 mysql -> mysql-5.0.45-osx10.4-i686
drwxr-xr-x  19 root wheel 646 Jul 4 13:54 mysql-5.0.45-osx10.4-i686

> CC: biopython at lists.open-bio.org
> From: lists.steve at arachnedesign.net
> Subject: Re: [BioPython] Error for installation of MySALdb on Mac OS X
> Date: Mon, 1 Oct 2007 16:18:04 -0400
> To: ytu888 at hotmail.com
>
> > I downloaded mysql-5.0.45-osx10.4-i686.dmg from mysql web and
> > installed it. Then I tried to install MySQL-python-1.2.2 but got
> > the following error. How to create the mysql_config.path file?
> > Thank you very much.
> >
> > leesComputer:/applications/Python_Bio/MySQL-python-1.2.2 lee$
> > python setup.py build
> > sh: line 1: mysql_config: command not found
>
> It seems as if you need to have the `mysql_config` command in your
> PATH variable and it's not there.
>
> Look for where mysql was installed (maybe /usr/local/mysql/...) and
> add its bin directory to your PATH environment variable. Or maybe it
> installed some binaries/symlinks into your /usr/local/bin directory?
>
> I think that'll do it for you.
>
> -steve
>

From lists.steve at arachnedesign.net Wed Oct 3 09:01:09 2007
From: lists.steve at arachnedesign.net (Steve Lianoglou)
Date: Wed, 3 Oct 2007 09:01:09 -0400
Subject: [BioPython] Error for installation of MySALdb on Mac OS X
In-Reply-To:
References: <46FCF325.4040002@maubp.freeserve.co.uk>
 <46FD2BAC.80401@maubp.freeserve.co.uk>
 <46FD5927.3000207@maubp.freeserve.co.uk>
 <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
Message-ID: <38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net>

Hi,

On Oct 3, 2007, at 8:44 AM, Y Tu wrote:

> Here is the copy of the output in the Terminal. Please help me to
> find out what's wrong. Thanks.
>
> Last login: Wed Oct 3 08:28:38 on ttyp4
> Welcome to Darwin!
> LeesComputer:~ Lee$ echo $PATH
> /Library/Frameworks/Python.framework/Versions/Current/bin:/usr/
> local/bin:.:/usr/local/mysql:/bin:/sbin:/usr/bin:/usr/sbin

It still looks like your PATH is screwed up, /usr/local/mysql/bin
isn't in there, you have:

/usr/local/mysql:/bin

Here's a test. Open up a terminal and type:

$ which mysql_config

If you don't get an answer back that indicates that the system can
find the binary, then your script won't either.

For instance, this is how it looks for me:

$ which mysql_config
/Library/MySQL/bin/mysql_config

(I have an older version of mysql which was installed into
/Library/MySQL)

Yours should say:

$ which mysql_config
/usr/local/mysql/bin/mysql_config

Or something like that. Try that and see ...
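
The `which mysql_config` check above can also be sketched in Python.
This is a rough illustration under stated assumptions: it needs a
current Python 3 (`shutil.which` does not exist in the Python 2.5 used
in this thread), and `locate` and `near_misses` are hypothetical helper
names, not part of MySQL-python or its setup script.

```python
import os
import shutil


def locate(command, path_value):
    """Return the full path of `command` as found via `path_value`, or None."""
    return shutil.which(command, path=path_value)


def near_misses(command, path_value):
    """List copies of `command` sitting in a bin/ subdirectory of a PATH entry.

    A non-empty result suggests that an entry such as /usr/local/mysql was
    meant to be /usr/local/mysql/bin.
    """
    hits = []
    for entry in path_value.split(os.pathsep):
        candidate = os.path.join(entry, "bin", command)
        if entry and os.path.isfile(candidate):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    path_value = os.environ.get("PATH", "")
    print("mysql_config found at:", locate("mysql_config", path_value))
    print("possible missing /bin:", near_misses("mysql_config", path_value))
```

If the first line prints None and the second prints a path, the PATH
entry is most likely one directory too high.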
-steve

From lists.steve at arachnedesign.net Wed Oct 3 10:47:41 2007
From: lists.steve at arachnedesign.net (Steve Lianoglou)
Date: Wed, 3 Oct 2007 10:47:41 -0400
Subject: [BioPython] Error for installation of MySALdb on Mac OS X
In-Reply-To:
References: <46FCF325.4040002@maubp.freeserve.co.uk>
 <46FD2BAC.80401@maubp.freeserve.co.uk>
 <46FD5927.3000207@maubp.freeserve.co.uk>
 <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net>
 <38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net>
Message-ID: <14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net>

> Steve, thank you very much. It fixed the problem and I got through
> the build and install step. But when I tested inside the python for
> the installation I got following error. Please help me about it.
> Thanks.
>
> >>> import MySQLdb
> /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/
> site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/
> _mysql.py:3: UserWarning: Module _mysql was already imported from /
> Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-
> packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.pyc,
> but /Applications/Python_Bio/MySQL-python-1.2.2 is being added to
> sys.path
> import sys, pkg_resources, imp
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "MySQLdb/__init__.py", line 19, in <module>
> import _mysql
> File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 7, in <module>
> File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 6, in
> __bootstrap__
> ImportError: dlopen(/Users/lizhexu/.python-eggs/MySQL_python-1.2.2-
> py2.5-macosx-10.3-fat.egg-tmp/_mysql.so, 2): Library not loaded: /
> usr/local/mysql/lib/mysql/libmysqlclient_r.15.dylib
> Referenced from: /Users/lizhexu/.python-eggs/MySQL_python-1.2.2-
> py2.5-macosx-10.3-fat.egg-tmp/_mysql.so
> Reason: image not found

Sorry, don't know exactly what's happening here. Is this from a
"fresh" python prompt?

How did you install MySQLdb, did you use easy_install?
If so, try to install from the sourceforge download.

Try to remove it, remove the "build" directory from your mysqldb
download and redo the whole python setup.py build / python setup.py
install process.

To remove it, nuke this:

/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg

And try to reinstall?

Perhaps someone who knows what the problem is here can give you a
better idea on what to do.

-steve

From sbassi at gmail.com Thu Oct 4 02:47:44 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Thu, 4 Oct 2007 03:47:44 -0300
Subject: [BioPython] Problem with blast xml
Message-ID:

I am having a problem that does not originate in Biopython, but it is
affecting the Biopython (1.43) xml blast parser.
I have two xml files, one can be parsed and the other can't.
Here are the commands I run to get the xml files:

sbassi at xubuntu:~/blast-2.2.16/bin$ ./blastall -p blastn -d /media/vic300/BLASTdb/ecoli.nt -i /media/vic300/INTA/mitofragsB2-TAB.fasta -e 0.0001 -m 7 -o TABB2.xml

sbassi at xubuntu:~/blast-2.2.16/bin$ ./blastall -p blastn -d /media/vic300/BLASTdb/ecoli.nt -i /media/vic300/INTA/mitofragsB2-TABv2.fasta -e 0.0001 -m 7 -o TABB2v2.xml

The relevant difference is the input file, the sequences are
different, but the output file should have the same format (shouldn't
it?). When I am parsing the files, I find that this is not true.
This is the file that can be parsed without problem:

>>> bout=open('bioinfo/INTA/TABB2.xml')
>>> b_records=NCBIXML.parse(bout)
>>> x=b_records.next()
>>> y=b_records.next()
>>> x.query
u'fragment 31'
>>> y.query
u'fragment 67'
>>> x.alignments
[]
>>> y.alignments
[, , , , , , ]

Let's see what seems to be a malformed?
xml file:

>>> bout=open('bioinfo/INTA/TABB2v2.xml')
>>> b_records=NCBIXML.parse(bout)
>>> x=b_records.next()
>>> y=b_records.next()
>>> x.query
u'fragment 1'
>>> y.query
u'fragment 57'
>>> x.alignments
[]
>>> y.alignments
[]

There is a record with an empty list.
Here is a fragment of the "normal" one (TABB2.xml):

2 F 31 lcl|31_0 fragment 31 1174 1 gi|1788520|gb|AE000309.1|AE000309 Escherichia coli K-12 MG1655 section 199 of 400 of the complete genome AE000309 13453 1

Here is a fragment of the "malformed" one (TABB2v2.xml):

2 F 1 400 4662239 0 0 0.710603 1.37406 1.30725 57

Why is this happening? Is this an expected behavior?
I uploaded the xml files here:

http://www.bioinformatica.info/TABB2.xml
http://www.bioinformatica.info/TABB2v2.xml

--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From ytu888 at hotmail.com Thu Oct 4 08:24:18 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Thu, 4 Oct 2007 07:24:18 -0500
Subject: [BioPython] Error generated by Clustalw example in Tutorial
Message-ID:

Hi,

I'm reading the Biopython tutorial and running the example of clustalw.
But it generates the following error. What's wrong? Thanks.

>>> from Bio import Clustalw
>>> cline = Clustalw.MultipleAlignCL(os.path.join(os.curdir, "opuntia.fasta"))
>>> cline.set_output("result.aln")
>>> print cline
clustalw .\opuntia.fasta -OUTFILE=result.aln
>>> alignment = Clustalw.do_alignment(cline)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python25\Lib\site-packages\Bio\Clustalw\__init__.py", line 117, in do_alignment
    % (out_file, command_line))
IOError: Output .aln file result.aln not produced, commandline: clustalw .\opuntia.fasta -OUTFILE=result.aln
From sbassi at gmail.com Thu Oct 4 12:19:22 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Thu, 4 Oct 2007 13:19:22 -0300
Subject: [BioPython] Error generated by Clustalw example in Tutorial
In-Reply-To:
References:
Message-ID:

On 10/4/07, Y Tu wrote:
> >>> print cline
> clustalw .\opuntia.fasta -OUTFILE=result.aln

I am not sure if this command is properly formatted. The slash should
not be there, but I don't have a windows box to try this.

--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From mdehoon at c2b2.columbia.edu Thu Oct 4 21:01:59 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Thu, 4 Oct 2007 21:01:59 -0400
Subject: [BioPython] Problem with blast xml
References:
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B62F@mail2.exch.c2b2.columbia.edu>

Can you create two minimal XML files that demonstrate the problem?
For example, by removing records from the two files you have and
checking if parsing still works for one and fails for the other.
By doing so, you may be able to identify exactly what the essential
difference between the two files is.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

-----Original Message-----
From: biopython-bounces at lists.open-bio.org on behalf of Sebastian Bassi
Sent: Thu 10/4/2007 2:47 AM
To: biopython at biopython.org
Subject: [BioPython] Problem with blast xml

I am having a problem that does not originate in Biopython, but it is
affecting the Biopython (1.43) xml blast parser.
I have two xml files, one can be parsed and the other can't.
Here are the commands I run to get the xml files: sbassi at xubuntu:~/blast-2.2.16/bin$ ./blastall -p blastn -d /media/vic300/BLASTdb/ecoli.nt -i /media/vic300/INTA/mitofragsB2-TAB.fasta -e 0.0001 -m 7 -o TABB2.xml sbassi at xubuntu:~/blast-2.2.16/bin$ ./blastall -p blastn -d /media/vic300/BLASTdb/ecoli.nt -i /media/vic300/INTA/mitofragsB2-TABv2.fasta -e 0.0001 -m 7 -o TABB2v2.xml The relevant difference is the input file, the sequences are different, but the output file should have the same format (shouldn't it?). When I am parsing the files, I find that this is not true. This is the file that can be parsed without problem: >>> bout=open('bioinfo/INTA/TABB2.xml') >>> b_records=NCBIXML.parse(bout) >>> x=b_records.next() >>> y=b_records.next() >>> x.query u'fragment 31' >>> y.query u'fragment 67' >>> x.alignments [] >>> y.alignments [, , , , , , ] Let's see what seems to be a malformed? xml file: >>> bout=open('bioinfo/INTA/TABB2v2.xml') >>> b_records=NCBIXML.parse(bout) >>> x=b_records.next() >>> y=b_records.next() >>> x.query u'fragment 1' >>> y.query u'fragment 57' >>> x.alignments [] >>> y.alignments [] There is a record with an empty list. Here is a fragment of the "normal" one (TABB2.xml): 2 F 31 lcl|31_0 fragment 31 1174 1 gi|1788520|gb|AE000309.1|AE000309 Escherichia coli K-12 MG1655 section 199 of 400 of the complete genome AE000309 13453 1 Here is a fragment of the "malformed" one (TABB2v2.xml): 2 F 1 400 4662239 0 0 0.710603 1.37406 1.30725 57 Why is this happening? Is this a expected behavior? 
I uploaded the xml files here: http://www.bioinformatica.info/TABB2.xml http://www.bioinformatica.info/TABB2v2.xml -- Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6 Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 _______________________________________________ BioPython mailing list - BioPython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From sbassi at gmail.com Fri Oct 5 01:39:44 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Fri, 5 Oct 2007 02:39:44 -0300 Subject: [BioPython] Problem with blast xml In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B62F@mail2.exch.c2b2.columbia.edu> References: <6243BAA9F5E0D24DA41B27997D1FD14402B62F@mail2.exch.c2b2.columbia.edu> Message-ID: On 10/4/07, Michiel De Hoon wrote: > Can you create two minimal XML files that demonstrate the problem? > For example, by removing records from the two files you have and checking if > parsing still works for one and fails for the other. > By doing so, you may be able to identify exactly what the essential > difference between the two files is. After some tests, I found two minimal XML files with this issue: http://www.bioinformatica.info/mitoA.xml http://www.bioinformatica.info/mitoB.xml (only 3.5 kb each). -- Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6 Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From mdehoon at c2b2.columbia.edu Fri Oct 5 02:34:56 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Fri, 5 Oct 2007 02:34:56 -0400 Subject: [BioPython] Problem with blast xml References: <6243BAA9F5E0D24DA41B27997D1FD14402B62F@mail2.exch.c2b2.columbia.edu> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B631@mail2.exch.c2b2.columbia.edu> >From looking at the XML files, it seems that the Biopython Blast XML parser is doing the right thing. Isn't it? --Michiel. 
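
One way to double-check the point above (that an empty alignments list may simply reflect what is in the file) is to count the hits per query directly in the raw XML, without going through Biopython. This is only a minimal sketch, assuming the usual NCBI BLAST XML element names (Iteration, Iteration_query-def, Iteration_iter-num, Iteration_hits, Hit); the function name hits_per_iteration is made up for illustration, and the code targets a current Python 3 rather than the Python 2 sessions shown in this thread:

```python
import xml.etree.ElementTree as ET


def hits_per_iteration(xml_text):
    """Return a list of (query label, number of hits) for each Iteration.

    Assumes NCBI BLAST XML element names; this is an illustration,
    not part of Biopython.
    """
    root = ET.fromstring(xml_text)
    counts = []
    for iteration in root.iter("Iteration"):
        # Prefer the query description; fall back to the iteration number
        # when the description element is missing.
        label = (iteration.findtext("Iteration_query-def")
                 or iteration.findtext("Iteration_iter-num"))
        hits = iteration.findall("Iteration_hits/Hit")
        counts.append((label, len(hits)))
    return counts
```

For a file on disk, the text can be read with open("mitoA.xml").read() and passed in. An iteration reported here with zero hits would be expected to show up in the parser as a record with an empty alignments list.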
Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: Sebastian Bassi [mailto:sbassi at gmail.com] Sent: Fri 10/5/2007 1:39 AM To: Michiel De Hoon Cc: biopython at biopython.org Subject: Re: [BioPython] Problem with blast xml On 10/4/07, Michiel De Hoon wrote: > Can you create two minimal XML files that demonstrate the problem? > For example, by removing records from the two files you have and checking if > parsing still works for one and fails for the other. > By doing so, you may be able to identify exactly what the essential > difference between the two files is. After some tests, I found two minimal XML files with this issue: http://www.bioinformatica.info/mitoA.xml http://www.bioinformatica.info/mitoB.xml (only 3.5 kb each). -- Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6 Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From biopython at maubp.freeserve.co.uk Fri Oct 5 05:26:06 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 05 Oct 2007 10:26:06 +0100 Subject: [BioPython] Error generated by Clustalw example in Tutorial In-Reply-To: References: Message-ID: <4706032E.1020703@maubp.freeserve.co.uk> Y Tu wrote: > Hi, > > I'm reading the Biopython tutorial and running the example of clustalw. But it generate the following error. What's wrong? Thanks. > >>>> from Bio import Clustalw >>>> cline = Clustalw.MultipleAlignCL(os.path.join(os.curdir, "opuntia.fasta")) >>>> cline.set_output("result.aln") >>>> print cline > clustalw .\opuntia.fasta -OUTFILE=result.aln The Windows version of ClustalW is very fussy. 
To experiment, try running this by hand at the Windows command prompt (note that I'm not at my Windows machine, so I haven't double-checked this):

clustalw .\opuntia.fasta -OUTFILE=result.aln

or,

clustalw opuntia.fasta -OUTFILE=result.aln

Any error messages would be helpful.

I suggest you try this in Biopython:

from Bio import Clustalw
cline = Clustalw.MultipleAlignCL("opuntia.fasta")
cline.set_output("result.aln")
print cline

Also, we have made a few tweaks to this code since Biopython 1.43 was released (see emails with Emanuel Hey in July 2007). If you like, you can try updating this module to the CVS version. Simply back up the existing C:\Python25\Lib\site-packages\Bio\Clustalw\__init__.py and replace it with the latest code from here:

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/*checkout*/biopython/Bio/Clustalw/__init__.py?rev=HEAD&cvsroot=biopython&content-type=text/x-python

Peter

From ytu888 at hotmail.com Fri Oct 5 12:32:05 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Fri, 5 Oct 2007 11:32:05 -0500
Subject: [BioPython] Error generated by Clustalw example in Tutorial
In-Reply-To: <4706032E.1020703@maubp.freeserve.co.uk>
References: <4706032E.1020703@maubp.freeserve.co.uk>
Message-ID:

I tested both commands at the Windows prompt. Initially both generated an error because Windows didn't know where clustalw was. Once I gave the correct path to clustalw, both generated alignment results without any error. BTW, I used the one inside BioEdit; I did not find clustalw included with Biopython. It looks like Python uses the online program at ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalW/. Am I right?
Then I replaced the old __init__.py with the new one, but there is a new error message similar to the old one:

>>> alignment = Clustalw.do_alignment(cline)
Traceback (most recent call last):
  File "", line 1, in
  File "C:\Python25\Lib\site-packages\Bio\Clustalw\__init__.py", line 117, in do_alignment
    # check if the outfile exists before parsing
IOError: Output .aln file result1.aln not produced, commandline: clustalw opuntia.fasta -OUTFILE=result1.aln

I also tested the example on OS X, and the same error was generated:

>>> alignment = Clustalw.do_alignment(cline)
sh: line 1: clustalw: command not found
Traceback (most recent call last):
  File "", line 1, in
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/Clustalw/__init__.py", line 117, in do_alignment
    % (out_file, command_line))
IOError: Output .aln file result1.aln not produced, commandline: clustalw ./opuntia.fasta -OUTFILE=result1.aln

It seems the problem is not linked to the OS. What else could be wrong? Thanks.

From biopython at maubp.freeserve.co.uk Fri Oct 5 14:35:05 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 05 Oct 2007 19:35:05 +0100
Subject: [BioPython] Error generated by Clustalw example in Tutorial
In-Reply-To:
References: <4706032E.1020703@maubp.freeserve.co.uk>
Message-ID: <470683D9.90808@maubp.freeserve.co.uk>

Y Tu wrote:
> I tested both commands under window prompt, initially both generated
> error because window don't know clustalw.

This is expected. You must either supply the full path of the clustalw executable, or have it on the system path. Otherwise Windows doesn't know how to find the clustalw program.

> Once I give the correct path of the clustalw, both generated
> alignment results without any error. BTW, I used the one inside
> BioEdit, I did not find clustalw coming with Biopython. It looks like
> python use online program at
> ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalW/. Am I right?

Clustalw is a standalone program (completely separate from Biopython) which you must install separately if you want to use it. It is available from several servers - the one you chose looks fine.
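
The point about the system path can be checked from Python before anything tries to launch clustalw. What follows is only a minimal sketch, assuming a current Python 3 (shutil.which did not exist in the Python 2.5 used in this thread); the helper name find_executable_or_explain and its extra_dirs parameter are made up for illustration and are not part of Biopython:

```python
import os
import shutil


def find_executable_or_explain(name, extra_dirs=()):
    """Return the full path to `name`, or raise OSError with a PATH hint."""
    # Look on the normal system path first.
    found = shutil.which(name)
    if found:
        return found
    # Then try any extra directories the caller knows about,
    # for example the folder of a program that bundles the tool.
    for directory in extra_dirs:
        found = shutil.which(name, path=directory)
        if found:
            return found
    raise OSError(
        "%r was not found on the system path (PATH=%s); "
        "install it or supply its full path"
        % (name, os.environ.get("PATH", ""))
    )
```

Calling find_executable_or_explain("clustalw") either returns the full path that the shell would use or fails with a message that names the real cause, which is easier to act on than an "output file not produced" error further down the line.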
> Then I replace the old _ini_with the new one, but there is a new
> error message similar to the old one:
>
>>>> alignment = Clustalw.do_alignment(cline)
> Traceback (most recent call last): File "", line
> 1, in File
> "C:\Python25\Lib\site-packages\Bio\Clustalw\__init__.py", line 117,
> in do_alignment # check if the outfile exists before parsing IOError:
> Output .aln file result1.aln not produced, commandline: clustalw
> opuntia.fasta -OUTFILE=result1.aln
>
> Also I tested the example on OS X, the same error was generated:
>
>>>> alignment = Clustalw.do_alignment(cline)
> sh: line 1: clustalw: command not found Traceback (most recent call
> last): File "", line 1, in File
> "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/Clustalw/__init__.py",
> line 117, in do_alignment % (out_file, command_line)) IOError:
> Output .aln file result1.aln not produced, commandline: clustalw
> ./opuntia.fasta -OUTFILE=result1.aln
>
> It seems like the problem is not linked to OS. What other things
> could be wrong? Thanks.

In both cases, you are not explicitly providing the path to clustalw - so for this to work the clustalw executable must be on the system path.

The other obvious thing to check is the location of the files versus the working directory. Is your python script in the same folder as the opuntia.fasta file?

What happens if you try those exact command lines (which Biopython says it is trying to run) at the command prompt in the directory where your python script is located? i.e.

Windows:
clustalw opuntia.fasta -OUTFILE=result1.aln

Mac:
clustalw ./opuntia.fasta -OUTFILE=result1.aln

Peter

From meesters at uni-mainz.de Mon Oct 8 11:07:54 2007
From: meesters at uni-mainz.de (Christian Meesters)
Date: Mon, 8 Oct 2007 17:07:54 +0200
Subject: [BioPython] Reassigning parent ids in Bio.PDB-structures?
Message-ID: <1191856074.5425.24.camel@cmeesters>

Hi,

I'm trying to 'split' a structure in several pieces, e.g.
a former chain 'A' should be split into 'A' and 'B', 'B' into 'C' and 'D', and so on. Now, whatever I do, I only get chains 'C', 'F', 'H', 'I', 'K', 'L' ...

Perhaps some code explains better what I'm trying to achieve:

breakpoints = [1254, 5444,
               6690, 10888,
               10889, 16332,
               16333, 21776,
               21776, 27220,
               27221, 32665]

def split_chain(structure, breakpoints, outname = 'split.pdb'):
    chains = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
              'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O',
              'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
              'X', 'Y', 'Z']

    chain = chains.pop(0)
    for atom in structure.get_atoms():
        number = atom.get_serial_number()
        if breaks and number == breaks[0]:
            breaks.pop(0)
            chain = chains.pop(0)
        atom.parent.parent.id = chain # assign new chain

    iostream = PDBIO()
    try:
        outfile = open(outname, 'w')
        iostream.set_structure(structure.structure)
        iostream.save(outfile)
    except IOError, msg:
        raise IOError(msg)

So, chain 'A' should stay 'A' from atom 1 to 1254 and 'B' from 1254 to 5444. Instead the written PDB file contains all atoms, but with the wrong chain ids (see above). (Please don't tell me how unpythonic the code reads; the point is that I've tried so many different things that I first need to understand my logic mistake.)

Any ideas where my mistake is?

Thanks,
Christian

From meesters at uni-mainz.de Mon Oct 8 11:54:32 2007
From: meesters at uni-mainz.de (Christian Meesters)
Date: Mon, 8 Oct 2007 17:54:32 +0200
Subject: [BioPython] Reassigning parent ids in Bio.PDB-structures?
In-Reply-To: <470A508C.4060803@maubp.freeserve.co.uk>
References: <1191856074.5425.24.camel@cmeesters> <470A508C.4060803@maubp.freeserve.co.uk>
Message-ID: <1191858872.5425.32.camel@cmeesters>

> > breakpoints = [1254, 5444,
> >                6690, 10888,
> >                10889, 16332,
> >                16333, 21776,
> >                21776, 27220,
> >                27221, 32665]
>
> I'm assuming this is "breaks" later on.

Absolutely - that's the pain with copy & paste for demos ... sorry.
> As the reason, I think this is what is happening: Given an atom, then
> atom.parent will be a residue object, and atom.parent.parent will be a
> chain object. Note all the atoms in a single amino acid residue will
> share share the same .parent, and all the atoms in a single chain will
> share the same .parent.parent
>
> i.e. You have renamed Chain "A" to "A", and then later renamed this
> chain to "B", and then again to "C". You didn't ever split up the chain
> into sub chains.

Mh, makes sense.

> To be honest, I would be tempted to write a quick and dirty script which
> parsed the raw PDB file, and rewrote the chain field based on the atom
> sequence number - without the overhead of the PDB parser.

Yes, would have been too easy ;-). I only wanted to add this functionality to a larger application and make it easy to use. There is no strict need to do so, but it would have been nice.

However, thanks for the input.
Christian

From biopython at maubp.freeserve.co.uk Mon Oct 8 11:45:16 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 08 Oct 2007 16:45:16 +0100
Subject: [BioPython] Reassigning parent ids in Bio.PDB-structures?
In-Reply-To: <1191856074.5425.24.camel@cmeesters>
References: <1191856074.5425.24.camel@cmeesters>
Message-ID: <470A508C.4060803@maubp.freeserve.co.uk>

Christian Meesters wrote:
> Hi,
>
> I'm trying to 'split' a structure in several pieces, e.g. a former chain
> 'A' should be splitted in 'A' and 'B', 'B' in 'C' and 'D' and so on.
> Now, whatever I do I only get chains 'C', 'F', 'H', 'I', 'K', 'L' ...
>
> Perhaps some code explains better what I'm trying to achieve:
>
> breakpoints = [1254, 5444,
>                6690, 10888,
>                10889, 16332,
>                16333, 21776,
>                21776, 27220,
>                27221, 32665]

I'm assuming this is "breaks" later on.

> def split_chain(structure, breakpoints, outname = 'split.pdb'):
>     chains = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
>               'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O',
>               'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
>               'X', 'Y', 'Z']
>
>     chain = chains.pop(0)
>     for atom in structure.get_atoms():
>         number = atom.get_serial_number()
>         if breaks and number == breaks[0]:
>             breaks.pop(0)
>             chain = chains.pop(0)
>         atom.parent.parent.id = chain # assign new chain
>
>     iostream = PDBIO()
>     try:
>         outfile = open(outname, 'w')
>         iostream.set_structure(structure.structure)
>         iostream.save(outfile)
>     except IOError, msg:
>         raise IOError(msg)
>
> So, chain 'A' should stay 'A' from atom 1 to 1254 and 'B' from 1254 to
> 5444. Instead the written pdb-file contains all atoms, but with the
> wrong chain ids (see above). (Please don't tell my how unpythonic the
> code reads, point is that I've tried so many different things that I
> first need to understand my logic mistake.)
>
> Any ideas, where my mistake is?

As for the reason, I think this is what is happening: given an atom, atom.parent will be a residue object, and atom.parent.parent will be a chain object. Note that all the atoms in a single amino acid residue share the same .parent, and all the atoms in a single chain share the same .parent.parent.

i.e. you have renamed chain "A" to "A", then later renamed this same chain to "B", and then again to "C". You never split the chain up into sub-chains.

I think you need to create new chain objects instead... but I'm not sure offhand how best to do this with Bio.PDB.

To be honest, I would be tempted to write a quick and dirty script which parsed the raw PDB file and rewrote the chain field based on the atom sequence number - without the overhead of the PDB parser.

Peter

From bbrazelton at gmail.com Mon Oct 8 20:33:03 2007
From: bbrazelton at gmail.com (B. Brazelton)
Date: Mon, 8 Oct 2007 17:33:03 -0700
Subject: [BioPython] BLAST XML parser trouble
Message-ID:

I tried to follow the BLAST XML parser example in the tutorial, but I always get the following error when attempting to iterate through the records:

Traceback (most recent call last):
  File "BlastXML_Parser.py", line 10, in ?
    for blast_record in blast_records:
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/Bio/Blast/NCBIXML.py", line 572, in parse
    expat_parser.Parse(text, False)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/Bio/Blast/NCBIXML.py", line 98, in endElement
    eval("self.%s()" % method)
  File "<string>", line 0, in ?
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/Bio/Blast/NCBIXML.py", line 215, in _end_BlastOutput_version
    self._header.version = self._value.split()[1]
IndexError: list index out of range

All I did was:

result_handle = open('NifH_Blast.xml')
from Bio.Blast import NCBIXML
blast_records = NCBIXML.parse(result_handle)
for blast_record in blast_records:
    ... etc

I put my script and xml file here:
http://www.staff.washington.edu/braz/files

I'm using Biopython 1.43, and I get the same error on both Python 2.3.5 and Python 5.

It seems like my commands are exactly what is in the tutorial, so I'm confused. My best guess is that there is a difference in the XML format, but it's NCBI XML. Thanks for any help,

Bill Brazelton

From sbassi at gmail.com Mon Oct 8 20:48:50 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Mon, 8 Oct 2007 21:48:50 -0300
Subject: [BioPython] BLAST XML parser trouble
In-Reply-To:
References:
Message-ID:

On 10/8/07, B. Brazelton wrote:
> I tried to follow the BLAST XML parser example in the tutorial, but I
> always get the following error when attempting to iterate through the
> records:

I got the same result as you. Could you please tell me the URL of the tutorial where you saw this?
From mdehoon at c2b2.columbia.edu Mon Oct 8 22:55:21 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Mon, 8 Oct 2007 22:55:21 -0400
Subject: [BioPython] BLAST XML parser trouble
References:
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B633@mail2.exch.c2b2.columbia.edu>

How did you produce the XML file? In particular, which Blast version did you use?
The Blast XML parser trips over the following line in your XML file:

<BlastOutput_version>unspecified</BlastOutput_version>

This is supposed to be:

<BlastOutput_version>BLASTP 2.2.12 [Aug-07-2005]</BlastOutput_version>

, of course depending on which Blast version you are using.

--Michiel

From kbaa at novonordisk.com Tue Oct 9 08:26:14 2007
From: kbaa at novonordisk.com (KBAA (Kent Bondensgaard))
Date: Tue, 9 Oct 2007 14:26:14 +0200
Subject: [BioPython] FW: Parsing sequence information in patents
Message-ID: <48A8D64F1030744983C6747C790164BD05E322FC@EXDKBA023.corp.novocorp.net>

Does anyone know how to parse protein sequence information in patents with Biopython?

BR,
Kent Bondensgaard

Kent Bondensgaard
Research Scientist
Protein Structure and Biophysics
Novo Nordisk A/S
Novo Nordisk Park
DK-2760 Måløv
Denmark
+45 4443 4510 (direct)
+45 3075 4510 (mobile)
+45 4466 3450 (fax)
kbaa at novonordisk.com

From sbassi at gmail.com Tue Oct 9 09:04:51 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Tue, 9 Oct 2007 10:04:51 -0300
Subject: [BioPython] FW: Parsing sequence information in patents
In-Reply-To: <48A8D64F1030744983C6747C790164BD05E322FC@EXDKBA023.corp.novocorp.net>
References: <48A8D64F1030744983C6747C790164BD05E322FC@EXDKBA023.corp.novocorp.net>
Message-ID:

On 10/9/07, KBAA (Kent Bondensgaard) wrote:
>
> Does anyone know how to parse protein sequence information in patents with Biopython?

What about using patAA and patNT from NCBI? Both are available as BLAST-ready databases, so you could retrieve the FASTA file using fastacmd.

From bbrazelton at gmail.com Tue Oct 9 16:24:58 2007
From: bbrazelton at gmail.com (B. Brazelton)
Date: Tue, 9 Oct 2007 13:24:58 -0700
Subject: [BioPython] BLAST XML parser trouble
In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B633@mail2.exch.c2b2.columbia.edu>
References: <6243BAA9F5E0D24DA41B27997D1FD14402B633@mail2.exch.c2b2.columbia.edu>
Message-ID:

I put in 'tblastx 2.2.15 [Oct-15-2006]' and it worked fine. Thanks for your help, and sorry for the newbie question. (FYI, I was using results generated from the CAMERA database (http://camera.calit2.net/), and I was using the main Biopython tutorial and cookbook from biopython.org.)

thanks again,
BB

On 10/8/07, Michiel De Hoon wrote:
> How did you produce the XML file? In particular, which Blast version did you
> use?
From sbassi at gmail.com Tue Oct 9 17:09:09 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Tue, 9 Oct 2007 18:09:09 -0300
Subject: [BioPython] Getting Qv using Python?
Message-ID:

Is there an automated way to get Quality Values (QV) from an ab1 file? I wrap Abiview [1] to get the sequence, but now I need the QV.

[1] http://bioweb.pasteur.fr/docs/EMBOSS/abiview.html

From prashanth at ibioinformatics.org Wed Oct 10 08:17:26 2007
From: prashanth at ibioinformatics.org (Prashantha Hebbar Kiradi)
Date: Wed, 10 Oct 2007 17:47:26 +0530
Subject: [BioPython] where is SeqIO.parse()?
Message-ID: <470CC2D6.1090504@ibioinformatics.org>

Hi everybody,

While trying the example of 'Parsing sequence file formats' from section 2.4 of the Biopython tutorial:

-------------------------------------------------
from Bio import SeqIO
handle = open("ls_orchid.fasta")
for seq_record in SeqIO.parse(handle, "fasta") :
    print seq_record.id
    print seq_record.seq
    print len(seq_record.seq)
handle.close()
-------------------------------------------------

I get this error:

-------------------------------------------------
Traceback (most recent call last):
  File "fastEx.py", line 5, in <module>
    for seq_record in SeqIO.parse(handle, "fasta") :
AttributeError: 'module' object has no attribute 'parse'
-------------------------------------------------

Importing SeqIO doesn't raise any error, and the ls_orchid.fasta file I'm using opens correctly. The API documentation reports that the 'parse' function is there. What am I doing wrong?
I'm using Biopython 1.42 installed from the Ubuntu repository and Python 2.5.1.

Thanks in advance,
Prashantha Hebbar
Institute of Bioinformatics
ITPL, Bangalore, INDIA

From fennan at gmail.com Wed Oct 10 08:20:56 2007
From: fennan at gmail.com (Fernando)
Date: Wed, 10 Oct 2007 14:20:56 +0200
Subject: [BioPython] Code publications
Message-ID: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com>

Hi everybody,

This might be off-topic, or maybe not:

I've been working with Biopython for a while and I am curious about what the authors get from all the exceptional work they are doing... I know it won't have anything to do with money, but in terms of publications / copyrights etc., what are the advantages of having your code in Biopython? Is there a journal / conference where the authors publish their work so that it can be referenced, or something like that?

Thanks,
Fernando

From mdehoon at c2b2.columbia.edu Wed Oct 10 08:24:33 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Wed, 10 Oct 2007 08:24:33 -0400
Subject: [BioPython] where is SeqIO.parse()?
References: <470CC2D6.1090504@ibioinformatics.org>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B635@mail2.exch.c2b2.columbia.edu>

> I'm using biopython 1.42 installed from Ubuntu repository and python 2.5.1.

Use Biopython 1.43.

--Michiel.

From cjfields at uiuc.edu Wed Oct 10 10:14:48 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 10 Oct 2007 09:14:48 -0500
Subject: [BioPython] Code publications
In-Reply-To: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com>
References: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com>
Message-ID: <865EDEE7-08D4-4058-9DD9-C4E790AFD327@uiuc.edu>

This is a question that could be posed for any open-source project. It differs per person, in my opinion. For instance, I donate time and code to BioPerl based on several factors. Not reinventing the wheel, giving back to the community, access to the code base, and the joy of programming (believe it or not) are among them, but they aren't the only ones. Publications don't hurt but they aren't my primary motivation.
It generally isn't the focus of my research, only a means to an end (to parse or generate data). I don't see anything wrong with it being someone else's primary drive to donate as long as they continue support their code post-publication, an issue that unfortunately pops up quite frequently. chris On Oct 10, 2007, at 7:20 AM, Fernando wrote: > Hi everybody, > > This might be off-topic, or maybe not: > > I've been working with biopython for a while and I am curious about > what the > authors get from all the exceptional work they are doing... I know > it won't > have to do anything with money, but in terms of publication / > copyrihts etc, > what are the adventages of having your code in biopython? Is there > a journey > / conference where the author publish their works and likewise they > can be > referenced or something like that? > > Thanks, > Fernando > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From biopython at maubp.freeserve.co.uk Wed Oct 10 08:42:01 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 10 Oct 2007 13:42:01 +0100 Subject: [BioPython] Code publications In-Reply-To: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com> References: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com> Message-ID: <470CC899.6080802@maubp.freeserve.co.uk> Fernando wrote: > Hi everybody, > > This might be off-topic, or maybe not: > > I've been working with biopython for a while and I am curious about what the > authors get from all the exceptional work they are doing... I know it won't > have to do anything with money, but in terms of publication / copyrihts etc, > what are the adventages of having your code in biopython? 
Is there a journey > / conference where the author publish their works and likewise they can be > referenced or something like that? Pride? Looks good on a CV? Although I must say working on BioPerl would have been a better choice from the point of view of job hunting ;) Some of the specific modules have associated publications which get cited (e.g. Bio.PDB and Bio.Cluster - although the later is also available independently of Biopython). The closest to a general Biopython paper is currently Chapman and Chang 2000. In terms of talks, most recently I gave a talk at BOSC 2007 in July, the "Biopython Project Update". Which reminds me, I have a few photos and the slides (sadly in PowerPoint - my initial attempt to convert them into PDF wasn't great, font issues leading to content getting cropped). Peter From tiagoantao at gmail.com Wed Oct 10 12:59:56 2007 From: tiagoantao at gmail.com (Tiago Antao) Date: Wed, 10 Oct 2007 17:59:56 +0100 Subject: [BioPython] Code publications In-Reply-To: <865EDEE7-08D4-4058-9DD9-C4E790AFD327@uiuc.edu> References: <7b13e61d0710100520j1845d5dar833924de6a92bb3f@mail.gmail.com> <865EDEE7-08D4-4058-9DD9-C4E790AFD327@uiuc.edu> Message-ID: <470D050C.7060500@gmail.com> I am currently submitting my populations genetics' code into biopython and I can talk about my motivations. Most of the code that I am submitting was used in something that I have done in the past (sometimes published). I figured, that if I have the code sitting here, I could as well donate it. This has one interesting advantage for me: all the code that I know I will try to submit to biopython is designed with care, all the code that is a one off is really a big mess. For me making code public is a motivator to maintain clean code. It is also a way to get to know people that are interested in this type of problems, and I think that, as with all things in life, knowing more people is a good thing. 
Maybe, in 12/18 months time I might think in suggesting to other people writing an article on the popgen work in biopython. Lets face it, that is also a good motivator. But, if it is the only one, I would agree that is not good (as Chris says, maintenance after publication...) Last, but not least: ethical and moral issues. Having spent some time outside of science I do think most scientific work is done in a very closed fashion (it was a shock to me, really). From my personal point of view open science and free software are arguments to which I connect moral value. Tiago Chris Fields wrote: > This is a question that could be posed for any open-source project. > > It differs per person in my opinion. For instance, I donate time and > code to BioPerl based on several factors. Not reinventing the wheel, > giving back to the community, access to the code base, and the joy of > programming (believe it or not) are among them, but they aren't the > only ones. > > Publications don't hurt but they aren't my primary motivation. It > generally isn't the focus of my research, only a means to an end (to > parse or generate data). I don't see anything wrong with it being > someone else's primary drive to donate as long as they continue > support their code post-publication, an issue that unfortunately pops > up quite frequently. > > chris > > On Oct 10, 2007, at 7:20 AM, Fernando wrote: > > >> Hi everybody, >> >> This might be off-topic, or maybe not: >> >> I've been working with biopython for a while and I am curious about >> what the >> authors get from all the exceptional work they are doing... I know >> it won't >> have to do anything with money, but in terms of publication / >> copyrihts etc, >> what are the adventages of having your code in biopython? Is there >> a journey >> / conference where the author publish their works and likewise they >> can be >> referenced or something like that? 
>>
>> Thanks,
>> Fernando
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign

From rebekah.rogers at gmail.com Thu Oct 11 14:57:21 2007
From: rebekah.rogers at gmail.com (Rebekah Rogers)
Date: Thu, 11 Oct 2007 14:57:21 -0400
Subject: [BioPython] running PAML in python
Message-ID: <79def59f0710111157h7483d5b5m6e6cdb3b86266750@mail.gmail.com>

Hello:

Does anyone know of an existing library that can run aligned sequences
through PAML and then pull out the dN/dS values?

Thanks!

-Rebekah

From The_Polymorph at rocketmail.com Sun Oct 14 13:04:48 2007
From: The_Polymorph at rocketmail.com (Caitlin)
Date: Sun, 14 Oct 2007 10:04:48 -0700 (PDT)
Subject: [BioPython] Performing sequence alignments, etc.
Message-ID: <311410.84366.qm@web50801.mail.re2.yahoo.com>

Hi all.

I'm relatively new to the field of bioinformatics and I'm trying to
perform a multiple sequence alignment on 5-6 sequences (fasta format -
dna sequences). I'd like the output to be formatted in the following
manner (clustalw standalone output):

accession_number1: atctcgatatcgggcgctcta...
accession_number2: atctctattctctggatctct...
...

When one or more nucleotide columns are identical, clustalw displays
an asterisk. If not, a blank space is displayed. Is this a standard
feature of BioPython?

Also, I'm evaluating several sequences but I'd like to obtain the most
recent complete genomes possible from various countries. Is there a
convenient source to use (GenBank?) if I don't know the accession
numbers?

Thanks,

~Caitlin

From biopython at maubp.freeserve.co.uk Sun Oct 14 13:38:32 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 14 Oct 2007 18:38:32 +0100
Subject: [BioPython] Performing sequence alignments, etc.
In-Reply-To: <311410.84366.qm@web50801.mail.re2.yahoo.com>
References: <311410.84366.qm@web50801.mail.re2.yahoo.com>
Message-ID: <47125418.5020009@maubp.freeserve.co.uk>

Caitlin wrote:
> Hi all.
>
> I'm relatively new to the field of bioinformatics and I'm trying to
> perform a multiple sequence alignment on 5-6 sequences (fasta format -
> dna sequences). I'd like the output to be formatted in the following
> manner (clustalw standalone output):

For reading and writing Clustalw alignment files, you could either use
Bio.SeqIO (format name "clustal") or the Bio.Clustalw module.
http://biopython.org/wiki/SeqIO

> When one or more nucleotide columns are identical, clustalw displays
> an asterisk. If not, a blank space is displayed. Is this a standard
> feature of BioPython?

There is an example of Clustalw output online here - note there can also
be a column of numbers on the right hand side (not shown here):
http://www.bioperl.org/wiki/ClustalW_multiple_alignment_format

It sounds like you are describing the simple consensus string which
clustalw outputs under the alignment (using *:. and space).

Biopython has a SummaryInfo object which can calculate simple consensus
sequences (see the tutorial). Perhaps this would be close to what you
want to do.

> Also, I'm evaluating several sequences but I'd like to obtain the most
> recent complete genomes possible from various countries. Is there a
> convenient source to use (GenBank?) if I don't know the accession
> numbers?

What sort of genomes? Bacteria? Vertebrates?
You could start by having a look at any of the EMBL, NCBI/GenBank or the Japanese DDBJ (these three are kept in sync with each other). Biopython has quite a nice interface for searching and downloading sequences from GenBank (again, see the tutorial) so that would be my first suggestion. Peter From The_Polymorph at rocketmail.com Sun Oct 14 22:13:24 2007 From: The_Polymorph at rocketmail.com (Caitlin) Date: Sun, 14 Oct 2007 19:13:24 -0700 (PDT) Subject: [BioPython] Performing sequence alignments, etc. In-Reply-To: <47125418.5020009@maubp.freeserve.co.uk> Message-ID: <129586.66498.qm@web50807.mail.re2.yahoo.com> Thanks Peter. The genomes are viral. I'll definitely read that tutorial. Your help is very appreciated. ~Caitlin --- Peter wrote: > Caitlin wrote: > > Hi all. > > > > I'm relatively new to the field of bioinformatics and I'm trying to > > perform a multiple sequence alignment on 5-6 sequences (fasta > format - > > dna sequences). I'd like the output to be formatted in the > following > > manner (clustalw standalone output): > > For reading and writing Clustalw alignment files, you could either > use > Bio.SeqIO (format name "clustal") or the Bio.Clustalw module. > http://biopython.org/wiki/SeqIO > > > When one more more nucleotides columns are identical, clustalw > displays > > an asterisk. If not, a blank space is displayed. Is this a standard > > feature of BioPython? > > There is an example of Clustalw output online here - note there can > also > be a column of numbers on the right hand side (not shown here): > http://www.bioperl.org/wiki/ClustalW_multiple_alignment_format > > It sounds like you are describing the simple consensus string which > clustalw outputs under the alignment (using *:. and space). > > Biopython has a SummaryInfo object which can calculate simple > consensus > sequences (see the tutorial). Perhaps this would be close to what you > > want to do. 
> > > Also, I'm evaluating several sequences but I'd like to obtain the > most > > recent complete genomes possible from various countries. Is there a > > convenient source to use (GenBank?) if I don't know the accession > > numbers? > > What sort of Genomes? Bacteria? Vertebrates? You could start by > having > a look at any of the EMBL, NCBI/GenBank or the Japanese DDBJ (these > three are kept in sync with each other). > > Biopython has quite a nice interface for searching and downloading > sequences from GenBank (again, see the tutorial) so that would be my > first suggestion. > > Peter > > > > "Be who you are and say what you feel because those who mind don't matter and those who matter don't mind." - Dr. Seuss, "Oh the Places You'll Go" ____________________________________________________________________________________ Don't let your dream ride pass you by. Make it a reality with Yahoo! Autos. http://autos.yahoo.com/index.html From fredgca at hotmail.com Mon Oct 15 09:02:27 2007 From: fredgca at hotmail.com (Frederico Arnoldi) Date: Mon, 15 Oct 2007 13:02:27 +0000 Subject: [BioPython] where is SeqIO.parse()? In-Reply-To: References: Message-ID: Dear Kiradi, Concerning your subject question: where is SeqIO.parse()? >>> from Bio import SeqIO >>> SeqIO So, in my system, it is at /usr/lib/python2.4/site-packages/Bio/SeqIO/__init__.py. Try the same command in your python console and see where it is in yours. Concerning your problem: Try >>> from Bio import SeqIO >>> dir() ['SeqIO', '__builtins__', '__doc__', '__name__'] >>> dir(SeqIO) ['Alignment', 'ClustalIO', 'FastaIO', 'InsdcIO', 'Interfaces', 'NexusIO', 'PhylipIO', 'Seq', 'SeqRecord', 'StockholmIO', 'StringIO', 'SwissIO', '_FormatToIterator', '_FormatToWriter', '__builtins__', '__doc__', '__file__', '__name__', '__path__', 'generic_alphabet', 'generic_protein', 'os', 'parse', 'to_alignment', 'to_dict', 'write'] Do you get the same result? See that "parse" is in my SeqIO. Is it in yours? 
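
The interactive check above can also be scripted. What follows is only a minimal sketch, written for a current Python 3 interpreter rather than the Python 2.4/2.5 used in this thread; `describe_module` is a hypothetical helper name, not something Biopython provides. It reports where a module was loaded from and which public names it exposes, which is enough to tell a working Bio.SeqIO from one with an empty `__init__.py`.

```python
import importlib


def describe_module(name):
    """Return (file path, sorted public names) for a module, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    path = getattr(module, "__file__", None)
    public_names = sorted(n for n in dir(module) if not n.startswith("_"))
    return path, public_names


# Bio.SeqIO is the module discussed in this thread; any importable name works.
info = describe_module("Bio.SeqIO")
if info is None:
    print("Bio.SeqIO could not be imported")
else:
    path, public_names = info
    print("Loaded from:", path)
    print("parse available:", "parse" in public_names)
```

If "parse available" comes out False, the installed package is probably missing the SeqIO code rather than the script being wrong.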
I noticed that when installing Biopython via apt on Ubuntu, the __init__.py
in Bio/SeqIO was empty. That may be the source of your problem. If I am
right, typing dir(SeqIO) on your system will give

['__builtins__', '__doc__', '__file__', '__name__', '__path__']

confirming that your __init__.py is empty. Check it. If this is your
problem, try installing Biopython from the tar.gz file available on the
Biopython home page.

Good luck,
Fred

----------------------------------------------------------------------
> Message: 1
> Date: Wed, 10 Oct 2007 17:47:26 +0530
> From: Prashantha Hebbar Kiradi
> Subject: [BioPython] where is SeqIO.parse()?
> To: biopython at biopython.org
> Message-ID:
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hi everybody,
>
> While trying the example of 'Parsing sequence file formats' from section
> 2.4 of Biopython tutorial:
> -------------------------------------------------
> from Bio import SeqIO
> handle = open("ls_orchid.fasta")
> for seq_record in SeqIO.parse(handle, "fasta") :
>     print seq_record.id
>     print seq_record.seq
>     print len(seq_record.seq)
> handle.close()
> -------------------------------------------------
>
> I get this error:
> -------------------------------------------------
> Traceback (most recent call last):
>   File "fastEx.py", line 5, in <module>
>     for seq_record in SeqIO.parse(handle, "fasta") :
> AttributeError: 'module' object has no attribute 'parse'
> -------------------------------------------------
>
> Importing SeqIO doesn't raise any error and the ls_orchid.fasta file I'm
> using is opening correctly.
>
> API documentation reports that the 'parse' function is there. What am I
> doing wrong?
>
> I'm using biopython 1.42 installed from Ubuntu repository and python 2.5.1.
>
> Thanks in advance,
>
> Prashantha Hebbar
> Institute of Bioinformatics
> ITPL,

From ytu888 at hotmail.com Mon Oct 15 12:19:47 2007
From: ytu888 at hotmail.com (Y Tu)
Date: Mon, 15 Oct 2007 11:19:47 -0500
Subject: [BioPython] Error for installation of MySALdb on Mac OS X
In-Reply-To: <14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net>
References: <46FCF325.4040002@maubp.freeserve.co.uk> <46FD2BAC.80401@maubp.freeserve.co.uk> <46FD5927.3000207@maubp.freeserve.co.uk> <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net> <38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net> <14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net>
Message-ID:

Hi Steve,

Thank you for your email. I was away for a week.
What do you mean by a "fresh" python prompt?
I installed MySQL using MYSQL-5.0.45-osx10.4-i686.dmg, downloaded online.
I guess you want me to reinstall MySQL_python_1.2.2, not MySQLdb, am I right?

Once again, thank you very much for your help.

> CC: biopython at lists.open-bio.org
> From: lists.steve at arachnedesign.net
> Subject: Re: [BioPython] Error for installation of MySALdb on Mac OS X
> Date: Wed, 3 Oct 2007 10:47:41 -0400
> To: ytu888 at hotmail.com
>
> > Steve, thank you very much. It fixed the problem and I got through
> > the build and install step. But when I tested inside the python for
> > the installation I got following error. Please help me about it.
> > Thanks.
> > > > >>> import MySQLdb > > /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/ > > site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/ > > _mysql.py:3: UserWarning: Module _mysql was already imported from / > > Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site- > > packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.pyc, > > but /Applications/Python_Bio/MySQL-python-1.2.2 is being added to > > sys.path > > import sys, pkg_resources, imp > > Traceback (most recent call last): > > File "", line 1, in > > File "MySQLdb/__init__.py", line 19, in > > import _mysql > > File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 7, in > > > > File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 6, in > > __bootstrap__ > > ImportError: dlopen(/Users/lizhexu/.python-eggs/MySQL_python-1.2.2- > > py2.5-macosx-10.3-fat.egg-tmp/_mysql.so, 2): Library not loaded: / > > usr/local/mysql/lib/mysql/libmysqlclient_r.15.dylib > > Referenced from: /Users/lizhexu/.python-eggs/MySQL_python-1.2.2- > > py2.5-macosx-10.3-fat.egg-tmp/_mysql.so > > Reason: image not found > > > Sorry, don't know exactly what's happening here. Is this from a > "fresh" python prompt? > > How did you install MySQLdb, did you use easy_install? If so, try to > install from the sourceforge download. > > Try to remove it, remove the "build" directory from your mysqldb > download and redo the whole > python setup.py build / python setup.py install process > > To remove it, nuke this: > /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site- > packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg > > And try to reinstall? > > Perhaps someone who knows what the problem is here can give you a > better idea on what to do. > > -steve _________________________________________________________________ Windows Live Hotmail and Microsoft Office Outlook ? together at last. ?Get it now. 
http://office.microsoft.com/en-us/outlook/HA102225181033.aspx?pid=CL100626971033 From lists.steve at arachnedesign.net Mon Oct 15 12:30:21 2007 From: lists.steve at arachnedesign.net (Steve Lianoglou) Date: Mon, 15 Oct 2007 12:30:21 -0400 Subject: [BioPython] Error for installation of MySALdb on Mac OS X In-Reply-To: References: <46FCF325.4040002@maubp.freeserve.co.uk> <46FD2BAC.80401@maubp.freeserve.co.uk> <46FD5927.3000207@maubp.freeserve.co.uk> <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net> <38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net> <14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net> Message-ID: <908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net> Hi, > Thank you for your email. I was away for a week. > What do you mean "fresh" python prompt? > I installed MySQL by using MYSQL-5.0.45-osx10.4-i686.dmg downloaded > online. > I guess you want me to reinstall MySQL_python_1.2.2, not MySQLdb, > am I right? I'm not sure, exactly. Last time I checked, the only thing you needed to use mysql from python was: (a) A working mysql install (the client/server) (b) The mysqldb package from: http://sourceforge.net/projects/mysql- python I'm assuming (a) is installed correctly since you are using the .mpkg from mysql.org, so I'd just try to fix (b). You try do so by doing the following: (1) Remove your original attempt at installing the python mysqldb library. From the looks of your error messages, it seems to be installed here: Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site- packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/ (2) remove the build directory in your mysqldb directory (the one you are installing from) by cd-ing into your mysqldb download, and removing the build directory you find there. 
(3) reinstall mysqldb by doing the usual `pythong setup.py build` and `sudo python setup.py install` dance For the record, I'm not sure what you are talking about when you are distinguishing between "MySQL_python_1.2.2, not MySQLdb" are you trying to install two python libraries to access mysql? -steve From ytu888 at hotmail.com Mon Oct 15 13:18:42 2007 From: ytu888 at hotmail.com (Y Tu) Date: Mon, 15 Oct 2007 12:18:42 -0500 Subject: [BioPython] Error for installation of MySALdb on Mac OS X In-Reply-To: <908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net> References: <46FCF325.4040002@maubp.freeserve.co.uk> <46FD2BAC.80401@maubp.freeserve.co.uk> <46FD5927.3000207@maubp.freeserve.co.uk> <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net> <38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net> <14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net> <908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net> Message-ID: What I said: "MySQL_python_1.2.2, not MySQLdb" means to uninstall MySQL_python not the mysql client/server installed with the mpkg. I just deleted the MYSQL....fat.egg file and downloaded the MySAL-python-1.2.2.tar. I repeated the installation process. However, when I run import MySQLdb, I got the same error message. Is there any other things I should take a look? Thank you very much. CC: biopython at lists.open-bio.org > From: lists.steve at arachnedesign.net > Subject: Re: [BioPython] Error for installation of MySALdb on Mac OS X > Date: Mon, 15 Oct 2007 12:30:21 -0400 > To: ytu888 at hotmail.com > > Hi, > > > Thank you for your email. I was away for a week. > > What do you mean "fresh" python prompt? > > I installed MySQL by using MYSQL-5.0.45-osx10.4-i686.dmg downloaded > > online. > > I guess you want me to reinstall MySQL_python_1.2.2, not MySQLdb, > > am I right? > > I'm not sure, exactly. 
> > Last time I checked, the only thing you needed to use mysql from > python was: > > (a) A working mysql install (the client/server) > (b) The mysqldb package from: http://sourceforge.net/projects/mysql- > python > > I'm assuming (a) is installed correctly since you are using the .mpkg > from mysql.org, so I'd just try to fix (b). > > You try do so by doing the following: > > (1) Remove your original attempt at installing the python mysqldb > library. From the looks of your error messages, it seems to be > installed here: > > Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site- > packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/ > > (2) remove the build directory in your mysqldb directory (the one you > are installing from) by cd-ing into your mysqldb download, and > removing the build directory you find there. > > (3) reinstall mysqldb by doing the usual `pythong setup.py build` and > `sudo python setup.py install` dance > > For the record, I'm not sure what you are talking about when you are > distinguishing between "MySQL_python_1.2.2, not MySQLdb" > > are you trying to install two python libraries to access mysql? > > -steve > _________________________________________________________________ Boo!?Scare away worms, viruses and so much more! Try Windows Live OneCare! 
http://onecare.live.com/standard/en-us/purchase/trial.aspx?s_cid=wl_hotmailnews From ytu888 at hotmail.com Tue Oct 16 13:06:36 2007 From: ytu888 at hotmail.com (Y Tu) Date: Tue, 16 Oct 2007 12:06:36 -0500 Subject: [BioPython] Error for installation of MySALdb on Mac OS X In-Reply-To: <908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net> References: <46FCF325.4040002@maubp.freeserve.co.uk> <46FD2BAC.80401@maubp.freeserve.co.uk> <46FD5927.3000207@maubp.freeserve.co.uk> <374A1E10-E0B6-4B21-A00C-0B11F34BBFD0@arachnedesign.net> <38EF94F2-7EB8-438C-BCA5-0E48818A6974@arachnedesign.net> <14D13653-0A67-4AE0-9C80-43B58158CFB7@arachnedesign.net> <908975AE-B215-451E-8EBF-C374B6EE3C38@arachnedesign.net> Message-ID: Hi, I reinstalled everything and checked every step. I found that there are had some warnings in 'build" step (underlined) . I wonder if they are the reason why I got the error messages when running "import MySQLdb" under the python prompt and how to fix the problem. Thank you very much. LeesComputer:/Applications/Python_Bio/MySQL-python-1.2.2 Lee$ python setup.py build running build running build_py ... ... /usr/bin/ld: for architecture ppc /usr/bin/ld: warning build/temp.macosx-10.3-fat-2.5/_mysql.o cputype (7, architecture i386) does not match cputype (18) for specified -arch flag: ppc (file not loaded) /usr/bin/ld: warning /usr/local/mysql/lib/libmysqlclient_r.dylib cputype (7, architecture i386) does not match cputype (18) for specified -arch flag: ppc (file not loaded) LeesComputer:/Applications/Python_Bio/MySQL-python-1.2.2 Lee$ sudo python setup.py install Password: running install ... ... 
Adding MySQL-python 1.2.2 to easy-install.pth file Installed /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg Processing dependencies for MySQL-python==1.2.2 LeesComputer:/Applications/Python_Bio/MySQL-python-1.2.2 Lee$ python Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04) [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import MySQLdb /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.py:3: UserWarning: Module _mysql was already imported from /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/_mysql.pyc, but /Applications/Python_Bio/MySQL-python-1.2.2 is being added to sys.path import sys, pkg_resources, imp Traceback (most recent call last): File "", line 1, in File "MySQLdb/__init__.py", line 19, in import _mysql File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 7, in File "build/bdist.macosx-10.3-fat/egg/_mysql.py", line 6, in __bootstrap__ ImportError: dlopen(/Users/Lee/.python-eggs/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg-tmp/_mysql.so, 2): Library not loaded: /usr/local/mysql/lib/mysql/libmysqlclient_r.15.dylib Referenced from: /Users/Lee/.python-eggs/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg-tmp/_mysql.so Reason: image not found > CC: biopython at lists.open-bio.org > From: lists.steve at arachnedesign.net > Subject: Re: [BioPython] Error for installation of MySALdb on Mac OS X > Date: Mon, 15 Oct 2007 12:30:21 -0400 > To: ytu888 at hotmail.com > > Hi, > > > Thank you for your email. I was away for a week. > > What do you mean "fresh" python prompt? > > I installed MySQL by using MYSQL-5.0.45-osx10.4-i686.dmg downloaded > > online. > > I guess you want me to reinstall MySQL_python_1.2.2, not MySQLdb, > > am I right? > > I'm not sure, exactly. 
> > Last time I checked, the only thing you needed to use mysql from > python was: > > (a) A working mysql install (the client/server) > (b) The mysqldb package from: http://sourceforge.net/projects/mysql- > python > > I'm assuming (a) is installed correctly since you are using the .mpkg > from mysql.org, so I'd just try to fix (b). > > You try do so by doing the following: > > (1) Remove your original attempt at installing the python mysqldb > library. From the looks of your error messages, it seems to be > installed here: > > Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site- > packages/MySQL_python-1.2.2-py2.5-macosx-10.3-fat.egg/ > > (2) remove the build directory in your mysqldb directory (the one you > are installing from) by cd-ing into your mysqldb download, and > removing the build directory you find there. > > (3) reinstall mysqldb by doing the usual `pythong setup.py build` and > `sudo python setup.py install` dance > > For the record, I'm not sure what you are talking about when you are > distinguishing between "MySQL_python_1.2.2, not MySQLdb" > > are you trying to install two python libraries to access mysql? > > -steve > _________________________________________________________________ Peek-a-boo FREE Tricks & Treats for You! http://www.reallivemoms.com?ocid=TXT_TAGHM&loc=us From fennan at gmail.com Tue Oct 16 13:51:30 2007 From: fennan at gmail.com (Fernando) Date: Tue, 16 Oct 2007 19:51:30 +0200 Subject: [BioPython] Precompute database information Message-ID: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com> Hi everybody, I am thinking in including some algorithms that I work with into biopython. My first concern is that I'm using a local image of the Gene Ontology database to perform several operations. In order to avoid such database accesses I could precompute the information I need and load it once the module is called. How should I do it? 
Is there a guideline style to load external variables or something like that? Any other ideas/suggestions? Thanks From fennan at gmail.com Tue Oct 16 14:55:54 2007 From: fennan at gmail.com (Fernando) Date: Tue, 16 Oct 2007 20:55:54 +0200 Subject: [BioPython] Precompute database information In-Reply-To: <4714FD13.2020708@maubp.freeserve.co.uk> References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com> <4714FD13.2020708@maubp.freeserve.co.uk> Message-ID: <7b13e61d0710161155o2e933f13jf448fe2097f6a184@mail.gmail.com> Hi Peter, >How big would your pre-computed data be? If its some sort of table or >other simple data you could perhaps use a simple text file; Another idea > for complicated objects is to use python's pickle module. It would be big... I an dealing with pairwise terms comparisons and I want to consider different species as well. >How often would the pre-computed data need to be updated? Every time >there is a new Gene Ontology release? It might be better have the >module download and cache the latest version on request (rather than >shipping an out of date dataset with Biopython). Yes, I could do that... It would be OK in Biopython to use mysql? If so the module could download the last GO version on request, install it and work with that version until the users decides to update it. On 10/16/07, Peter wrote: > > Fernando wrote: > > Hi everybody, > > > > I am thinking in including some algorithms that I work with into > biopython. > > My first concern is that I'm using a local image of the Gene Ontology > > database to perform several operations. In order to avoid such database > > accesses I could precompute the information I need and load it once the > > module is called. How should I do it? Is there a guideline style to load > > external variables or something like that? Any other ideas/suggestions? > > I think you need to go into more detail. > > How big would your pre-computed data be? 
If its some sort of table or > other simple data you could perhaps use a simple text file; Another idea > for complicated objects is to use python's pickle module. > > How often would the pre-computed data need to be updated? Every time > there is a new Gene Ontology release? It might be better have the > module download and cache the latest version on request (rather than > shipping an out of date dataset with Biopython). > > I don't think we have anything in Biopython that requires regular > updates. Things like genomes and sequence databases are left up to the > user. > > Peter > > From sdavis2 at mail.nih.gov Tue Oct 16 15:26:18 2007 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Tue, 16 Oct 2007 15:26:18 -0400 Subject: [BioPython] Precompute database information In-Reply-To: <7b13e61d0710161155o2e933f13jf448fe2097f6a184@mail.gmail.com> References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com> <4714FD13.2020708@maubp.freeserve.co.uk> <7b13e61d0710161155o2e933f13jf448fe2097f6a184@mail.gmail.com> Message-ID: <4715105A.30705@mail.nih.gov> Fernando wrote: > Hi Peter, > >> How big would your pre-computed data be? If its some sort of table or >> other simple data you could perhaps use a simple text file; Another idea >> for complicated objects is to use python's pickle module. > > It would be big... I an dealing with pairwise terms comparisons and I want > to consider different species as well. > >> How often would the pre-computed data need to be updated? Every time >> there is a new Gene Ontology release? It might be better have the >> module download and cache the latest version on request (rather than >> shipping an out of date dataset with Biopython). > > Yes, I could do that... It would be OK in Biopython to use mysql? If so the > module could download the last GO version on request, install it and work > with that version until the users decides to update it. Asking users to use MySQL to do updates might be a bit much. 
Could this be done from the .obo files?

Sean

From biopython at maubp.freeserve.co.uk Tue Oct 16 14:04:03 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 16 Oct 2007 19:04:03 +0100
Subject: [BioPython] Precompute database information
In-Reply-To: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com>
References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com>
Message-ID: <4714FD13.2020708@maubp.freeserve.co.uk>

Fernando wrote:
> Hi everybody,
>
> I am thinking of including some algorithms that I work with into biopython.
> My first concern is that I'm using a local image of the Gene Ontology
> database to perform several operations. In order to avoid such database
> accesses I could precompute the information I need and load it once the
> module is called. How should I do it? Is there a guideline style to load
> external variables or something like that? Any other ideas/suggestions?

I think you need to go into more detail.

How big would your pre-computed data be? If it's some sort of table or
other simple data you could perhaps use a simple text file; another idea
for complicated objects is to use python's pickle module.

How often would the pre-computed data need to be updated? Every time
there is a new Gene Ontology release? It might be better to have the
module download and cache the latest version on request (rather than
shipping an out of date dataset with Biopython).

I don't think we have anything in Biopython that requires regular
updates. Things like genomes and sequence databases are left up to the
user.
Peter

From fennan at gmail.com Wed Oct 17 07:12:36 2007
From: fennan at gmail.com (Fernando)
Date: Wed, 17 Oct 2007 07:12:36 -0400
Subject: [BioPython] Precompute database information
In-Reply-To: <4715105A.30705@mail.nih.gov>
References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com>
 <4714FD13.2020708@maubp.freeserve.co.uk>
 <7b13e61d0710161155o2e933f13jf448fe2097f6a184@mail.gmail.com>
 <4715105A.30705@mail.nih.gov>
Message-ID: <7b13e61d0710170412t76f92271h99834607dc9c0063@mail.gmail.com>

>Asking users to use MySQL to do updates might be a bit much. Could this
>be done from the .obo files?

I think that's probably the best solution... Is there any python module
for working with OBO / OWL formats? I've been searching, but people seem
to use BioPerl for this.

From sdavis2 at mail.nih.gov Wed Oct 17 11:34:17 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 17 Oct 2007 11:34:17 -0400
Subject: [BioPython] Precompute database information
In-Reply-To: <7b13e61d0710170412t76f92271h99834607dc9c0063@mail.gmail.com>
References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com>
 <4714FD13.2020708@maubp.freeserve.co.uk>
 <7b13e61d0710161155o2e933f13jf448fe2097f6a184@mail.gmail.com>
 <4715105A.30705@mail.nih.gov>
 <7b13e61d0710170412t76f92271h99834607dc9c0063@mail.gmail.com>
Message-ID: <47162B79.8080204@mail.nih.gov>

Fernando wrote:
>>Asking users to use MySQL to do updates might be a bit much. Could this
>>be done from the .obo files?
>
> I think that's probably the best solution... Is there any python module
> for working with OBO / OWL formats? I've been searching, but people seem
> to use BioPerl for this.

In a way, it seems silly to reimplement the Bio::OntologyIO stuff in
python, but I (and others, after a quick google search) would probably
benefit from such a thing. I'm not able to devote much time right this
minute to the project, but I think that, given the huge number of files
available, particularly in OBO format, there would be a use for such
parsers and tools in biopython. How much interest/need is there for a
Bio.OntologyIO-like thing? Has anyone made any attempts at creating one?
For a list of available biological ontologies (to see what we are missing),
see here:

http://obofoundry.org/

Sean

From luca.beltrame at unimi.it Wed Oct 17 11:59:47 2007
From: luca.beltrame at unimi.it (Luca Beltrame)
Date: Wed, 17 Oct 2007 17:59:47 +0200
Subject: [BioPython] Precompute database information
In-Reply-To: <47162B79.8080204@mail.nih.gov>
References: <7b13e61d0710161051k20d07deco79178f0a0dd61f59@mail.gmail.com>
 <7b13e61d0710170412t76f92271h99834607dc9c0063@mail.gmail.com>
 <47162B79.8080204@mail.nih.gov>
Message-ID: <200710171759.48595.luca.beltrame@unimi.it>

On Wednesday 17 October 2007 17:34:17, Sean Davis wrote:

> In a way, it seems silly to reimplement the Bio::OntologyIO stuff in

It depends on the perspective, since for some people learning yet another
programming language would be a drawback.

> parsers and tools in biopython. How much interest/need is there for a
> Bio.OntologyIO like thing? Has anyone made any attempts at creating one?

Personally speaking, I would love it. No time (and skill) to even think
about doing something like that, though.

--
Luca Beltrame, MSc. - Molecular Medicine PhD Student
Dipartimento di Scienze e Tecnologie Biomediche - UniMI
CNR - Institute of Biomedical Technologies Research Fellow
E-mail: luca dot beltrame [at] unimi dot it - Phone: +39-02-50320924

From jimmy.musselwhite at gmail.com Wed Oct 17 17:20:41 2007
From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite)
Date: Wed, 17 Oct 2007 17:20:41 -0400
Subject: [BioPython] Question about Seq.count()
Message-ID: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>

Hello all,

I have a script that runs through a list of about 250,000 sequence
records and counts how many times substrings of 3-5 nucleotides occur
in each one.

Here is some example code:

search = 'ATTCG'
#use SeqIO to get a big list of records
sequences = list(SeqIO.parse(file, "fasta"))
for record in sequences :

Now the code I want to run is
record.seq.count(search)

but what I am forced to do is
record.seq.tostring().count(search)

The problem here is that when I am forced to use .tostring() on every single
seq object it devastates my memory usage in a BIG way. It eats up about
1.2 GB and then crashes. If I remove the .tostring() and just tell it to
search for 'A', it will run fine and use memory at about 1/100th the rate.

So my question boils down to: is there any way to make .count() able to
search for strings and not just characters? Otherwise my work is going to
grind to a halt here.

Thanks!
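
A minimal sketch of the counting loop described above, assuming the memory
growth comes mainly from holding all ~250,000 records in a list rather than
from the count itself (an assumption, not something confirmed in this
thread). It streams records one at a time and counts the motif on plain
Python strings. The helper names iter_fasta and count_motif are
hypothetical, and the reader uses only the standard library so that it runs
without Biopython; with Biopython, looping directly over
SeqIO.parse(handle, "fasta") instead of wrapping it in list() gives the same
one-record-at-a-time behaviour.

```python
import io


def iter_fasta(handle):
    """Yield (title, sequence) pairs one record at a time from a FASTA handle.

    Hypothetical stand-in for looping over SeqIO.parse(handle, "fasta").
    """
    title, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        else:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


def count_motif(handle, motif):
    """Sum non-overlapping occurrences of motif over all records."""
    total = 0
    for _title, sequence in iter_fasta(handle):
        # Only one record's sequence is alive at any time.
        total += sequence.count(motif)
    return total


# Tiny in-memory example; a real run would pass an open file instead.
example = io.StringIO(">seq1\nAAACACACGGTTTT\n>seq2\nATTCGATTCG\nATTCG\n")
print(count_motif(example, "ATTCG"))  # prints 3
```

Note that str.count() counts non-overlapping occurrences, so
"AAAA".count("AA") is 2 rather than 3; overlapping matches would need a
different approach.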
From biopython at maubp.freeserve.co.uk Wed Oct 17 18:03:51 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 17 Oct 2007 23:03:51 +0100
Subject: [BioPython] Question about Seq.count()
In-Reply-To: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>
References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>
Message-ID: <471686C7.6050305@maubp.freeserve.co.uk>

Jimmy Musselwhite wrote:
> Now the code I want to run is
> record.seq.count(search)
>
> but what I am forced to do is
> record.seq.tostring().count(search)
>
> The problem here is that when I am forced to use .tostring() on every single
> seq object it devastates my memory usage in a BIG way. It eats up about
> 1.2 GB and then crashes. If I remove the .tostring() and just tell it to
> search for 'A', it will run fine and use memory at about 1/100th the rate.

In the short term, try record.seq.data.count(search), which is what the
tostring() method is doing anyway (the Seq object stores the sequence
internally as a string). Does that help?

We might be tweaking the Seq object after the next release to act a bit
more like a string - at which point the .data property might go away.

> So my question boils down to: is there any way to make .count() able to
> search for strings and not just characters?

I'd never noticed that - I would call it a bug...

>>> from Bio.Seq import Seq
>>> my_seq = Seq("AAACACACGGTTTT")
>>> my_seq.data.count("GG")
1
>>> my_seq.data.count("G")
2
>>> my_seq.tostring().count("G")
2
>>> my_seq.tostring().count("GG")
1
>>> my_seq.count("G")
2
>>> my_seq.count("GG")
0

Peter

From jimmy.musselwhite at gmail.com Wed Oct 17 18:48:09 2007
From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite)
Date: Wed, 17 Oct 2007 18:48:09 -0400
Subject: [BioPython] Question about Seq.count()
In-Reply-To: <471686C7.6050305@maubp.freeserve.co.uk>
References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>
 <471686C7.6050305@maubp.freeserve.co.uk>
Message-ID: <86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com>

Thanks guys! That worked great.

From jimmy.musselwhite at gmail.com Wed Oct 17 18:52:07 2007
From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite)
Date: Wed, 17 Oct 2007 18:52:07 -0400
Subject: [BioPython] Question about Seq.count()
In-Reply-To: <86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com>
References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>
 <471686C7.6050305@maubp.freeserve.co.uk>
 <86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com>
Message-ID: <86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com>

Just kidding, it didn't work great. It only "fixed" it because I was
printing out the output of count(), so it was just executing 100 times
slower and thus eating RAM 100 times slower :(

It doesn't seem like there is a good way for me to fix this.

From jimmy.musselwhite at gmail.com Wed Oct 17 19:04:26 2007
From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite)
Date: Wed, 17 Oct 2007 19:04:26 -0400
Subject: [BioPython] Question about Seq.count()
In-Reply-To: <86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com>
References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>
 <471686C7.6050305@maubp.freeserve.co.uk>
 <86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com>
 <86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com>
Message-ID: <86e5e8970710171604p612f5583v6ef32f90eca86861@mail.gmail.com>

In response to the first reply you gave me, where you said this:

> I'd never noticed that - I would call it a bug...
>
> >>> from Bio.Seq import Seq
> >>> my_seq = Seq("AAACACACGGTTTT")
> >>> my_seq.data.count("GG")
> 1
> >>> my_seq.data.count("G")
> 2
> >>> my_seq.tostring().count("G")
> 2
> >>> my_seq.tostring().count("GG")
> 1
> >>> my_seq.count("G")
> 2
> >>> my_seq.count("GG")
> 0

I've tried that many, many times and I always get 0 when I do
my_seq.count("GG").
I just rebuilt biopython from the latest CVS tarball and it still does not
work. I have no idea why yours works and mine doesn't.

From jimmy.musselwhite at gmail.com Wed Oct 17 19:06:03 2007
From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite)
Date: Wed, 17 Oct 2007 19:06:03 -0400
Subject: [BioPython] Question about Seq.count()
In-Reply-To: <86e5e8970710171604p612f5583v6ef32f90eca86861@mail.gmail.com>
References: <86e5e8970710171420k6ffbde67j6a28eae2a8363521@mail.gmail.com>
 <471686C7.6050305@maubp.freeserve.co.uk>
 <86e5e8970710171548k68c78bf5n16a6056883c25b67@mail.gmail.com>
 <86e5e8970710171552j7e638cc0xae177e5ed5845f3f@mail.gmail.com>
 <86e5e8970710171604p612f5583v6ef32f90eca86861@mail.gmail.com>
Message-ID: <86e5e8970710171606x4ac9b3feg23f2409a4385d237@mail.gmail.com>

Man, I'm sorry, I didn't read that well enough. It doesn't work for you
either. I'm gonna stop responding to this e-mail now :) I'm clearly tired or
something.