From Paul.Czodrowski at merck.de Tue May 3 06:56:10 2011
From: Paul.Czodrowski at merck.de (Paul.Czodrowski at merck.de)
Date: Tue, 3 May 2011 12:56:10 +0200
Subject: [Biopython] installation as non-administrator
Message-ID:

Dear folks,

I'm struggling around with the biopython installation.
As non-administrator, the manual states the following:
http://biopython.org/DIST/docs/install/Installation.html#htoc30

However, the setup.py (version 1.57) does not contain any entry "
include_dirs=["Bio/Cluster", "your_dir/include/python"]
", but rather only "Bio" entries.

(See attached file: setup.py)

Or do I oversee anything?

Regards,
Paul

This message and any attachment are confidential and may be privileged or
otherwise protected from disclosure. If you are not the intended recipient,
you must not copy this message or attachment or disclose the contents to
any other person. If you have received this transmission in error, please
notify the sender immediately and delete the message and any attachment
from your system. Merck KGaA, Darmstadt, Germany and any of its
subsidiaries do not accept liability for any omissions or errors in this
message which may arise as a result of E-Mail-transmission or for damages
resulting from any unauthorized changes of the content of this message and
any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
subsidiaries do not guarantee that this message is free of viruses and does
not accept liability for any damages caused by any virus transmitted
therewith.

Click http://disclaimer.merck.de to access the German, French, Spanish and
Portuguese versions of this disclaimer.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: setup.py
Type: application/octet-stream
Size: 11597 bytes
Desc: not available
URL:

From p.j.a.cock at googlemail.com Tue May 3 07:31:31 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 3 May 2011 12:31:31 +0100
Subject: [Biopython] installation as non-administrator
In-Reply-To:
References:
Message-ID:

On Tue, May 3, 2011 at 11:56 AM, wrote:
>
> Dear folks,
>
> I'm struggling around with the biopython installation.
> As non-administrator, the manual states the following:
> http://biopython.org/DIST/docs/install/Installation.html#htoc30
>
> However, the setup.py (version 1.57) does not contain any entry "
> include_dirs=["Bio/Cluster", "your_dir/include/python"]
> ", but rather only "Bio" entries.
>
> (See attached file: setup.py)

You didn't really need to attach a whole file, you could have
linked to our repository or quoted the bit of interest.

> Or do I oversee anything?

What OS are you using? Some flavour of Linux?

What version of NumPy do you have, and how was it installed?

What command did you use to attempt the install, and what
error message did you get.

Have you tried the --prefix argument?

e.g.

python setup.py build
python setup.py test
python setup.py install --prefix=$HOME

Peter

From anaryin at gmail.com Tue May 3 07:32:05 2011
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 3 May 2011 13:32:05 +0200
Subject: [Biopython] installation as non-administrator
In-Reply-To:
References:
Message-ID:

Hey Paul,

I usually keep a copy of biopython in my home directory either by supplying
the keyword --home=/my/home/directory or just by running "python setup.py
build" and then adding the temp/libxxx/ directory to my PYTHONPATH.

Hope it helps,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao

On Tue, May 3, 2011 at 12:56 PM, wrote:

>
> Dear folks,
>
> I'm struggling around with the biopython installation.
> As non-administrator, the manual states the following:
> http://biopython.org/DIST/docs/install/Installation.html#htoc30
>
> However, the setup.py (version 1.57) does not contain any entry "
> include_dirs=["Bio/Cluster", "your_dir/include/python"]
> ", but rather only "Bio" entries.
>
> (See attached file: setup.py)
>
> Or do I oversee anything?
>
>
> Regards,
> Paul
>
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

From anaryin at gmail.com Tue May 3 07:32:47 2011
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 3 May 2011 13:32:47 +0200
Subject: [Biopython] installation as non-administrator
In-Reply-To:
References:
Message-ID:

Sorry, --prefix, not --home.
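
Editorial sketch: the --prefix and PYTHONPATH suggestions above can be checked from Python itself. This is only a minimal illustration, assuming the usual Unix distutils layout (PREFIX/lib/pythonX.Y/site-packages; some distributions use lib64 instead). The helper names prefix_site_packages and is_importable_from are hypothetical and not part of Biopython or distutils.

```python
import os
import sys


def prefix_site_packages(prefix, version_info=None):
    """Return the directory where 'setup.py install --prefix=PREFIX' is
    assumed to place packages on Unix (hypothetical helper; the real
    location can differ, for example lib64 on some distributions)."""
    if version_info is None:
        version_info = sys.version_info
    version = "python%d.%d" % (version_info[0], version_info[1])
    return os.path.join(prefix, "lib", version, "site-packages")


def is_importable_from(path):
    """True if 'path' is already on sys.path, for example via PYTHONPATH."""
    target = os.path.realpath(path)
    return any(os.path.realpath(p) == target for p in sys.path if p)


if __name__ == "__main__":
    site_dir = prefix_site_packages(os.path.expanduser("~"))
    print(site_dir)
    if not is_importable_from(site_dir):
        # Python will not find the package until this directory is added.
        print("Not on sys.path; try: export PYTHONPATH=%s" % site_dir)
```

If the directory is reported as missing from sys.path, a package installed with --prefix=$HOME would build and install without error but still fail at "import Bio".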
From mmokrejs at fold.natur.cuni.cz Tue May 3 08:22:38 2011
From: mmokrejs at fold.natur.cuni.cz (Martin Mokrejs)
Date: Tue, 03 May 2011 14:22:38 +0200
Subject: [Biopython] How to optimize ACE file alignment (from newbler)
Message-ID: <4DBFF38E.7050406@fold.natur.cuni.cz>

Hi,
  I would like to ask you how can I optimize the ACE alignment with files
produced by newbler. I see only the high-quality region is aligned while
the rest is not. I typically ask newbler to place into the ace files untrimmed
reads so the low-quality sequence is present, you can see it could have been
included in the alignment and contribute the consensus quite well.
  I found a new feature of consed-20 being able to re-align the reads
but that seemed to be too slow for me and had to kill re-processing of one
contig.
  Is there a way to direct some program that I want to re-align just some
columns since some position? That should first align to the consensus already
defined and afterwards continue with de novo alignment as long as it is possible.
  Alternatively, how do you edit ACE alignments (I mean manually adjust gaps,
move columns back and forth, re-order rows) and do you re-calculate the
consensus?
  This is some sort of a follow-up to "Newbler ACE file to SAM?"
posted to biopython-developers list at http://web.archiveorange.com/archive/v/5dAwXxUKZDTmQdM80MqQ
;)
Martin

From p.j.a.cock at googlemail.com Tue May 3 09:46:25 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 3 May 2011 14:46:25 +0100
Subject: [Biopython] How to optimize ACE file alignment (from newbler)
In-Reply-To: <4DBFF38E.7050406@fold.natur.cuni.cz>
References: <4DBFF38E.7050406@fold.natur.cuni.cz>
Message-ID:

On Tue, May 3, 2011 at 1:22 PM, Martin Mokrejs wrote:
> Hi,
>  I would like to ask you how can I optimize the ACE alignment with files
> produced by newbler. I see only the high-quality region is aligned while
> the rest is not. I typically ask newbler to place into the ace files untrimmed
> reads so the low-quality sequence is present, you can see it could have been
> included in the alignment and contribute the consensus quite well.
>  I found a new feature of consed-20 being able to re-align the reads
> but that seemed to be too slow for me and had to kill re-processing of one
> contig.
>  Is there a way to direct some program that I want to re-align just some
> columns since some position? That should first align to the consensus already
> defined and afterwards continue with de novo alignment as long as it is possible.
>  Alternatively, how do you edit ACE alignments (I mean manually adjust gaps,
> move columns back and forth, re-order rows) and do you re-calculate the
> consensus?
>  This is some sort of a follow-up to "Newbler ACE file to SAM?"
> posted to biopython-developers list at http://web.archiveorange.com/archive/v/5dAwXxUKZDTmQdM80MqQ
> ;)
> Martin

Hi Martin,

Biopython only has an ACE parser, with no support for writing ACE files.
So, even if you did manipulate the parsed ACE file in Biopython, you'd
have to write your own output code (or use a simpler file format).

Regarding assembly editors, have you looked at Gap4 or Gap5?

This might be a good question to ask on the http://seqanswers.com
forum.

Peter

From Paul.Czodrowski at merck.de Tue May 3 10:38:25 2011
From: Paul.Czodrowski at merck.de (Paul.Czodrowski at merck.de)
Date: Tue, 3 May 2011 16:38:25 +0200
Subject: [Biopython] Antwort: Re: installation as non-administrator
In-Reply-To:
Message-ID:

Dear Peter,

> >
> > Dear folks,
> >
> > I'm struggling around with the biopython installation.
> > As non-administrator, the manual states the following:
> > http://biopython.org/DIST/docs/install/Installation.html#htoc30
> >
> > However, the setup.py (version 1.57) does not contain any entry "
> > include_dirs=["Bio/Cluster", "your_dir/include/python"]
> > ", but rather only "Bio" entries.
> >
> > (See attached file: setup.py)
>
> You didn't really need to attach a whole file, you could have
> linked to our repository or quoted the bit of interest.

I'm sorry for this!

>
> > Or do I oversee anything?
>
> What OS are you using? Some flavour of Linux?

OpenSuse 11.3

>
> What version of NumPy do you have, and how was it installed?

NumPy version 1.3.0, installed locally by the built-in python routines.

>
> What command did you use to attempt the install, and what
> error message did you get.

python setup.py build
==> ERROR MESSAGE
"
running build
running build_py
running build_ext
building 'Bio.Cluster.cluster' extension
gcc -pthread -fno-strict-aliasing -DNDEBUG -fomit-frame-pointer
-fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector
-funwind-tables -fasynchronous-unwind-tables -g -fPIC
-I/usr/lib/python2.6/site-packages/numpy/core/include
-I/usr/include/python2.6 -c Bio/Cluster/clustermodule.c -o
build/temp.linux-i686-2.6/Bio/Cluster/clustermodule.o
Bio/Cluster/clustermodule.c:2:31: fatal error: numpy/arrayobject.h: No such
file or directory
compilation terminated.
error: command 'gcc' failed with exit status 1
"

>
> Have you tried the --prefix argument?
>
> e.g.
>
> python setup.py build
> python setup.py test
> python setup.py install --prefix=$HOME
>
> Peter

python setup.py test
==> ERROR MESSAGE
"
python setup.py test
running test
Python version: 2.6.5 (r265:79063, Oct 28 2010, 20:56:56)
[GCC 4.5.0 20100604 [gcc-4_5-branch revision 160292]]
Operating system: posix linux2
test_Ace ... ok
test_AlignIO ... ok
test_AlignIO_convert ... ok
test_BioSQL ... /xyz: UserWarning: order location operators are not fully supported
  % feature.location_operator)
ok
test_BioSQL_SeqIO ... ERROR
test_CAPS ... ok
test_Clustalw ... ok
test_Clustalw_tool ... skipping. Install clustalw or clustalw2 if you want to use Bio.Clustalw.
test_Cluster ... skipping. If you want to use Bio.Cluster, install NumPy first and then reinstall Biopython
test_CodonTable ... ok
test_CodonUsage ... ok
test_Compass ... ok
test_Crystal ... ok
test_Dialign_tool ... skipping. Install DIALIGN2-2 if you want to use the Bio.Align.Applications wrapper.
test_DocSQL ... ok
test_Emboss ... skipping. Install EMBOSS if you want to use Bio.Emboss.
test_EmbossPhylipNew ... skipping. Install the Emboss package 'PhylipNew' if you want to use the Bio.Emboss.Applications wrappers for phylogenetic tools.
test_EmbossPrimer ... ok
test_Entrez ... Segmentation fault (core dumped)
"

python setup.py install --prefix=$HOME
==> the same ERROR MESSAGE as from "python setup.py build"

Cheers & thanks in advance,
Paul
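
Editorial sketch: the gcc error above means that numpy/arrayobject.h was not found under the directory passed with -I. A quick way to check this from Python is shown below. It relies on numpy.get_include(), which is also used later in this thread; the helper names has_numpy_headers and find_numpy_include are hypothetical, chosen only for illustration.

```python
import os


def has_numpy_headers(include_dir):
    """True if include_dir contains numpy/arrayobject.h, the header that
    the gcc error reports as missing."""
    return os.path.isfile(os.path.join(include_dir, "numpy", "arrayobject.h"))


def find_numpy_include():
    """Return (path, headers_present) for the NumPy that this Python
    imports, or (None, False) if NumPy cannot be imported at all."""
    try:
        import numpy
    except ImportError:
        return None, False
    include_dir = numpy.get_include()
    return include_dir, has_numpy_headers(include_dir)


if __name__ == "__main__":
    path, headers_present = find_numpy_include()
    if path is None:
        print("NumPy is not importable; check PYTHONPATH")
    elif headers_present:
        print("NumPy headers found in %s" % path)
    else:
        print("NumPy found, but no arrayobject.h under %s" % path)
```

If the last message is printed, the NumPy that Python picks up first is probably an incomplete install, and another copy would need to come earlier on PYTHONPATH before building.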
From Paul.Czodrowski at merck.de Tue May 3 10:47:00 2011
From: Paul.Czodrowski at merck.de (Paul.Czodrowski at merck.de)
Date: Tue, 3 May 2011 16:47:00 +0200
Subject: [Biopython] Antwort: Re: installation as non-administrator
In-Reply-To:
Message-ID:

Dear Peter,

maybe as an additional question/issue:
numpy is not located in
"/usr/lib/python2.6/site-packages/numpy/core/include"
but in another, rather global, python-lib-directory.

As stated in my previous email, python setup.py build gives
"gcc -pthread -fno-strict-aliasing -DNDEBUG -fomit-frame-pointer
-fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector
-funwind-tables -fasynchronous-unwind-tables -g -fPIC
-I/usr/lib/python2.6/site-packages/numpy/core/include
-I/usr/include/python2.6 -c Bio/Cluster/clustermodule.c -o
build/temp.linux-i686-2.6/Bio/Cluster/clustermodule.o
Bio/Cluster/clustermodule.c:2:31: fatal error: numpy/arrayobject.h: No such
file or directory"

and I would like to change the
"-I/usr/lib/python2.6/site-packages/numpy/core/include"
option so that it points to the directory where the headers are actually
located.

Cheers & thanks,
Paul

> >
> > Dear folks,
> >
> > I'm struggling around with the biopython installation.
> > As non-administrator, the manual states the following:
> > http://biopython.org/DIST/docs/install/Installation.html#htoc30
> >
> > However, the setup.py (version 1.57) does not contain any entry "
> > include_dirs=["Bio/Cluster", "your_dir/include/python"]
> > ", but rather only "Bio" entries.
> >
> > (See attached file: setup.py)
>
> You didn't really need to attach a whole file, you could have
> linked to our repository or quoted the bit of interest.
>
> > Or do I oversee anything?
>
> What OS are you using? Some flavour of Linux?
>
> What version of NumPy do you have, and how was it installed?
>
> What command did you use to attempt the install, and what
> error message did you get.
>
> Have you tried the --prefix argument?
>
> e.g.
>
> python setup.py build
> python setup.py test
> python setup.py install --prefix=$HOME
>
> Peter

From p.j.a.cock at googlemail.com Tue May 3 11:10:30 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 3 May 2011 16:10:30 +0100
Subject: [Biopython] Antwort: Re: installation as non-administrator
In-Reply-To:
References:
Message-ID:

On Tue, May 3, 2011 at 3:38 PM, wrote:
> Dear Peter,
>
>
>> >
>> > Dear folks,
>> >
>> > I'm struggling around with the biopython installation.
>> > As non-administrator, the manual states the following:
>> > http://biopython.org/DIST/docs/install/Installation.html#htoc30
>> >
>> > However, the setup.py (version 1.57) does not contain any entry "
>> > include_dirs=["Bio/Cluster", "your_dir/include/python"]
>> > ", but rather only "Bio" entries.
>> >
>> > (See attached file: setup.py)
>>
>> You didn't really need to attach a whole file, you could have
>> linked to our repository or quoted the bit of interest.
>
> I'm sorry for this!
Don't worry too much, it's a fairly small file, otherwise I wouldn't
have let it through the moderation queue.

>> > Or do I oversee anything?
>>
>> What OS are you using? Some flavour of Linux?
>
> OpenSuse 11.3

Should be fine.

>>
>> What version of NumPy do you have, and how was it installed?
>
> NumPy version 1.3.0, installed locally by the built-in python routines.
>

Any reason for installing such an old version? I'm just curious.

Does NumPy work properly? At the very least, if you run python
does "import numpy" work or give an error? What happens if you
try and do this:

$ python
>>> import numpy
>>> numpy.get_include()
'/usr/local/lib/python2.6/site-packages/numpy/core/include'

(That's the output on one of our Linux machines)

If that doesn't work, perhaps your PYTHONPATH needs setting.
How/where did you install NumPy? e.g. python setup.py install --prefix=$HOME

>> What command did you use to attempt the install, and what
>> error message did you get.
> python setup.py build
> ==> ERROR MESSAGE
> "
> running build
> running build_py
> running build_ext
> building 'Bio.Cluster.cluster' extension
> gcc -pthread -fno-strict-aliasing -DNDEBUG -fomit-frame-pointer
> -fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector
> -funwind-tables -fasynchronous-unwind-tables -g -fPIC
> -I/usr/lib/python2.6/site-packages/numpy/core/include
> -I/usr/include/python2.6 -c Bio/Cluster/clustermodule.c -o
> build/temp.linux-i686-2.6/Bio/Cluster/clustermodule.o
> Bio/Cluster/clustermodule.c:2:31: fatal error: numpy/arrayobject.h: No such
> file or directory
> compilation terminated.
> error: command 'gcc' failed with exit status 1
> "

OK, it isn't finding the numpy header files. I'd guess from your
next email the file is
/usr/lib/python2.6/site-packages/numpy/core/include/numpy/arrayobject.h

The hack suggested in the installation document is to edit our setup.py
file to point to the path explicitly.
There is probably a more elegant way,
right now my guess is that NumPy is not on the python path (see above).

---

From the test results,

> python setup.py test
> running test
> Python version: 2.6.5 (r265:79063, Oct 28 2010, 20:56:56)
> [GCC 4.5.0 20100604 [gcc-4_5-branch revision 160292]]
> Operating system: posix linux2
> test_Ace ... ok
> ...
> test_Entrez ... Segmentation fault (core dumped)

Oh, nasty! That should *not* happen, and is probably a separate
issue to the NumPy header install issue.

Peter

From mmokrejs at fold.natur.cuni.cz Tue May 3 19:20:13 2011
From: mmokrejs at fold.natur.cuni.cz (Martin Mokrejs)
Date: Wed, 04 May 2011 01:20:13 +0200
Subject: [Biopython] How to optimize ACE file alignment (from newbler)
In-Reply-To:
References: <4DBFF38E.7050406@fold.natur.cuni.cz>
Message-ID: <4DC08DAD.9000100@fold.natur.cuni.cz>

Hi Peter,
  no I haven't played with gap5 yet, so far only with consed and tablet.
Thanks for noting biopython has no write support for ACE.
Martin

Peter Cock wrote:
> On Tue, May 3, 2011 at 1:22 PM, Martin Mokrejs
> wrote:
>> Hi,
>> I would like to ask you how can I optimize the ACE alignment with files
>> produced by newbler. I see only the high-quality region is aligned while
>> the rest is not. I typically ask newbler to place into the ace files untrimmed
>> reads so the low-quality sequence is present, you can see it could have been
>> included in the alignment and contribute the consensus quite well.
>> I found a new feature of consed-20 being able to re-align the reads
>> but that seemed to be too slow for me and had to kill re-processing of one
>> contig.
>> Is there a way to direct some program that I want to re-align just some
>> columns since some position? That should first align to the consensus already
>> defined and afterwards continue with de novo alignment as long as it is possible.
>> Alternatively, how do you edit ACE alignments (I mean manually adjust gaps,
>> move columns back and forth, re-order rows) and do you re-calculate the
>> consensus?
>> This is some sort of a follow-up to "Newbler ACE file to SAM?"
>> posted to biopython-developers list at http://web.archiveorange.com/archive/v/5dAwXxUKZDTmQdM80MqQ
>> ;)
>> Martin
>
> Hi Martin,
>
> Biopython only has an ACE parser, with no support for writing ACE files.
> So, even if you did manipulate the parsed ACE file in Biopython, you'd
> have to write your own output code (or use a simpler file format).
>
> Regarding assembly editors, have you looked at Gap4 or Gap5?
>
> This might be a good question to ask on the http://seqanswers.com
> forum.
>
> Peter
>
>

From Paul.Czodrowski at merck.de Wed May 4 04:47:14 2011
From: Paul.Czodrowski at merck.de (Paul.Czodrowski at merck.de)
Date: Wed, 4 May 2011 10:47:14 +0200
Subject: [Biopython] Antwort: Re: Antwort: Re: installation as non-administrator
In-Reply-To:
Message-ID:

Dear Peter,

> > Dear Peter,
> >
> >
> >> >
> >> > Dear folks,
> >> >
> >> > I'm struggling around with the biopython installation.
> >> > As non-administrator, the manual states the following:
> >> > http://biopython.org/DIST/docs/install/Installation.html#htoc30
> >> >
> >> > However, the setup.py (version 1.57) does not contain any entry "
> >> > include_dirs=["Bio/Cluster", "your_dir/include/python"]
> >> > ", but rather only "Bio" entries.
> >> >
> >> > (See attached file: setup.py)
> >>
> >> You didn't really need to attach a whole file, you could have
> >> linked to our repository or quoted the bit of interest.
> >
> > I'm sorry for this!
>
> Don't worry too much, it's a fairly small file, otherwise I wouldn't
> have let it through the moderation queue.
>
> >> > Or do I oversee anything?
> >>
> >> What OS are you using? Some flavour of Linux?
> >
> > OpenSuse 11.3
>
> Should be fine.
>
> >>
> >> What version of NumPy do you have, and how was it installed?
> >
> > NumPy version 1.3.0, installed locally by the built-in python routines.
> >
>
> Any reason for installing such an old version? I'm just curious.

No logical reason... :)

>
> Does NumPy work properly? At the very least, if you run python
> does "import numpy" work or give an error? What happens if you
> try and do this:
>
> $ python
> >>> import numpy
> >>> numpy.get_include()
> '/usr/local/lib/python2.6/site-packages/numpy/core/include'
>
> (That's the output on one of our Linux machines)

We have the same output:
>>>
>>>
>>> numpy.get_include()
'/usr/lib/python2.6/site-packages/numpy/core/include'

>
> If that doesn't work, perhaps your PYTHONPATH needs setting.
> How/where did you install NumPy? e.g. python setup.py install --prefix=$HOME

The /usr/lib python is installed via the yast OpenSuse.
But it seems to me that this installation did not work properly,
since there are only 2 files in the directory
"/usr/lib/python2.6/site-packages/numpy/core/include/numpy/":
- ufunc_api.txt
- multiarray_api.txt

However, we have another installation of NumPy which is located here:
"/SW/python/lib/python2.6/site-packages/lib/python2.6/site-packages/numpy"

And yes, there is a mix-up of the directories... :)

> >> What command did you use to attempt the install, and what
> >> error message did you get.
> > python setup.py build
> > ==> ERROR MESSAGE
> > "
> > running build
> > running build_py
> > running build_ext
> > building 'Bio.Cluster.cluster' extension
> > gcc -pthread -fno-strict-aliasing -DNDEBUG -fomit-frame-pointer
> > -fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector
> > -funwind-tables -fasynchronous-unwind-tables -g -fPIC
> > -I/usr/lib/python2.6/site-packages/numpy/core/include
> > -I/usr/include/python2.6 -c Bio/Cluster/clustermodule.c -o
> > build/temp.linux-i686-2.6/Bio/Cluster/clustermodule.o
> > Bio/Cluster/clustermodule.c:2:31: fatal error: numpy/arrayobject.h: No such
> > file or directory
> > compilation terminated.
> > error: command 'gcc' failed with exit status 1
> > "
>
> OK, it isn't finding the numpy header files. I'd guess from your
> next email the file is
> /usr/lib/python2.6/site-packages/numpy/core/include/numpy/arrayobject.h

You are wrong about this. The header file is located here:
"/SW/python/lib/python2.6/site-packages/lib/python2.6/site-packages/numpy/core/include/numpy/"

By appropriately setting the PYTHONPATH, it works properly.

>
> The hack suggested in the installation document is to edit our setup.py
> file to point to the path explicitly. There is probably a more elegant way,
> right now my guess is that NumPy is not on the python path (see above).
>
> ---
>
> From the test results,
>
> > python setup.py test
> > running test
> > Python version: 2.6.5 (r265:79063, Oct 28 2010, 20:56:56)
> > [GCC 4.5.0 20100604 [gcc-4_5-branch revision 160292]]
> > Operating system: posix linux2
> > test_Ace ... ok
> > ...
> > test_Entrez ... Segmentation fault (core dumped)
>
> Oh, nasty! That should *not* happen, and is probably a separate
> issue to the NumPy header install issue.

python setup.py install --prefix=$HOME works fine now.

Should the segmentation fault still be considered?

Cheers & thanks,
Paul

>
> Peter

From p.j.a.cock at googlemail.com Wed May 4 05:06:11 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 4 May 2011 10:06:11 +0100
Subject: [Biopython] Antwort: Re: Antwort: Re: installation as non-administrator
In-Reply-To:
References:
Message-ID:

On Wed, May 4, 2011 at 9:47 AM, wrote:
> Dear Peter,
>
>>
>> Does NumPy work properly? At the very least, if you run python
>> does "import numpy" work or give an error? What happens if you
>> try and do this:
>>
>> $ python
>> >>> import numpy
>> >>> numpy.get_include()
>> '/usr/local/lib/python2.6/site-packages/numpy/core/include'
>>
>> (That's the output on one of our Linux machines)
>
> We have the same output:
>>>>
>>>>
>>>> numpy.get_include()
> '/usr/lib/python2.6/site-packages/numpy/core/include'
>
>
>>
>> If that doesn't work, perhaps your PYTHONPATH needs setting.
>> How/where did you install NumPy? e.g. python setup.py install --prefix=$HOME
>
> The /usr/lib python is installed via the yast OpenSuse.
> But it seems to me that this installation did not work properly,
> since, there are only 2 files in the directory
> "/usr/lib/python2.6/site-packages/numpy/core/include/numpy/":
> - ufunc_api.txt
> - multiarray_api.txt
>
> However, we have another installation of NumPy which is located here:
> "/SW/python/lib/python2.6/site-packages/lib/python2.6/site-packages/numpy"
>
> And yes, there is a mix-up of the directories... :)

I think that explains why the Biopython install didn't work
originally, it found the broken NumPy under /usr/lib rather
than your good one installed under /SW/

You might want to try and remove the broken NumPy, as it
may cause you problems installing other python libraries.
>
> By appropriately setting the PYTHONPATH, it works properly.
>

OK, good.

>> From the test results,
>>
>> > python setup.py test
>> > running test
>> > Python version: 2.6.5 (r265:79063, Oct 28 2010, 20:56:56)
>> > [GCC 4.5.0 20100604 [gcc-4_5-branch revision 160292]]
>> > Operating system: posix linux2
>> > test_Ace ... ok
>> > ...
>> > test_Entrez ... Segmentation fault (core dumped)
>>
>> Oh, nasty! That should *not* happen, and is probably a separate
>> issue to the NumPy header install issue.
>
> python setup.py install --prefix=$HOME works fine now.
>
> Should the segmentation fault still be considered?

Yes please. I assume it still breaks?

Can you try changing to the Tests subdirectory from the Biopython
source, and doing:

python test_Entrez.py

That should run just the Entrez tests, and hopefully give a bit
more information about what/when the segmentation fault occurs.
I suspect a problem in one of the Python C libraries that Biopython
is using (since as far as I can recall, all the Bio.Entrez code is
pure python).

Peter

From mictadlo at gmail.com Wed May 4 05:59:13 2011
From: mictadlo at gmail.com (Michal)
Date: Wed, 04 May 2011 19:59:13 +1000
Subject: [Biopython] [BioRuby] Interesting BLAST 2.2.25+ XML behaviour
In-Reply-To: <398303E2-1195-4CC2-8B73-09C6C1117892@illinois.edu>
References: <398303E2-1195-4CC2-8B73-09C6C1117892@illinois.edu>
Message-ID: <4DC12371.3040204@gmail.com>

Hi Peter,
Do you have the script which reads

https://bitbucket.org/galaxy/galaxy-central/src/8eaf07a46623/test-data/blastp_four_human_vs_rhodopsin.xml

and what would be the correct output?

Thank you in advance.

Cheers,
Michal

On 05/03/2011 11:31 PM, Chris Fields wrote:
> Haven't tried this using the latest BLAST+ myself, but it doesn't surprise me too much. Also agree re: some kind of bug tracking with NCBI; I believe they have an internal one, but it would be nice to have a public interface to it.
>
> chris
>
> On May 3, 2011, at 4:24 AM, Peter Cock wrote:
>
>> Hello all,
>>
>> I've CC'd the BioPerl, BioRuby, BioJava and Biopython development mailing
>> lists to make sure you're aware of this, but can we continue any discussion
>> on the cross-project open-bio-l mailing list please?
>>
>> I noticed that recent versions of BLAST are not using a single <Iteration>
>> block for each query, which was the historical behaviour and assumed
>> by the Biopython BLAST XML parser. This may be a bug in BLAST.
>> See link below for an example.
>>
>> Has anyone else noticed this, and has it been reported to the NCBI yet?
>>
>> Thanks,
>>
>> Peter
>>
>> (Not for the first time, I wish there was a public bug tracker for BLAST,
>> or at least a private bug tracker so we could talk about issues with an
>> NCBI assigned reference number.)
>>
>> ---------- Forwarded message ----------
>> From: Peter Cock
>> Date: Wed, Apr 20, 2011 at 6:08 PM
>> Subject: Interesting BLAST 2.2.25+ XML behaviour
>> To: Biopython-Dev Mailing List
>>
>>
>> Hi all,
>>
>> Have a look at this XML file from a FASTA vs FASTA search
>> using blastp from BLAST 2.2.25+ (current release), which
>> is a test file I created for the BLAST+ wrappers in Galaxy:
>>
>> https://bitbucket.org/galaxy/galaxy-central/src/8eaf07a46623/test-data/blastp_four_human_vs_rhodopsin.xml
>>
>> I just put it through the Biopython BLAST XML parser, and
>> was surprised not to get four records back (since as you
>> might guess from the filename, there were four queries).
>>
>> It appears this version of BLAST+ is incrementing the
>> iteration counter for each match... or something like that.
>>
>> Has anyone else noticed this? I wonder if it is accidental...
>> >> Peter >> >> _______________________________________________ >> BioRuby Project - http://www.bioruby.org/ >> BioRuby mailing list >> BioRuby at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioruby > > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby > From p.j.a.cock at googlemail.com Wed May 4 06:36:57 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 4 May 2011 11:36:57 +0100 Subject: [Biopython] [BioRuby] Interesting BLAST 2.2.25+ XML behaviour In-Reply-To: <4DC12371.3040204@gmail.com> References: <398303E2-1195-4CC2-8B73-09C6C1117892@illinois.edu> <4DC12371.3040204@gmail.com> Message-ID: On Wed, May 4, 2011 at 10:59 AM, Michal wrote: > Hi Peter, > Do you have the script which read > > https://bitbucket.org/galaxy/galaxy-central/src/8eaf07a46623/test-data/blastp_four_human_vs_rhodopsin.xml > > > and what would be the correct output? > > Thank you in advance. > > Cheers, > Michal Hi Michal, I'm not quite sure what you're asking, but I'll try. 
First, the three data files:

$ wget https://bitbucket.org/galaxy/galaxy-central/src/8eaf07a46623/test-data/blastp_four_human_vs_rhodopsin.xml
$ wget https://bitbucket.org/galaxy/galaxy-central/src/8eaf07a46623/test-data/four_human_proteins.fasta
$ wget https://bitbucket.org/galaxy/galaxy-central/src/8eaf07a46623/rhodopsin_proteins.fasta

The query file has four sequences,

$ grep -c "^>" four_human_proteins.fasta
4

$ grep "^>" four_human_proteins.fasta
>sp|Q9BS26|ERP44_HUMAN Endoplasmic reticulum resident protein 44 OS=Homo sapiens GN=ERP44 PE=1 SV=1
>sp|Q9NSY1|BMP2K_HUMAN BMP-2-inducible protein kinase OS=Homo sapiens GN=BMP2K PE=1 SV=2
>sp|P06213|INSR_HUMAN Insulin receptor OS=Homo sapiens GN=INSR PE=1 SV=4
>sp|P08100|OPSD_HUMAN Rhodopsin OS=Homo sapiens GN=RHO PE=1 SV=1

Based on past experience, I would expect 4 iteration blocks in
the XML, but in this case I have 24:

$ grep "" -c blastp_four_human_vs_rhodopsin.xml
24

Notice we get 6 iterations for each query (4 times 6 is 24):

$ grep "" blastp_four_human_vs_rhodopsin.xml
sp|Q9BS26|ERP44_HUMAN
sp|Q9BS26|ERP44_HUMAN
sp|Q9BS26|ERP44_HUMAN
sp|Q9BS26|ERP44_HUMAN
sp|Q9BS26|ERP44_HUMAN
sp|Q9BS26|ERP44_HUMAN
sp|Q9NSY1|BMP2K_HUMAN
sp|Q9NSY1|BMP2K_HUMAN
sp|Q9NSY1|BMP2K_HUMAN
sp|Q9NSY1|BMP2K_HUMAN
sp|Q9NSY1|BMP2K_HUMAN
sp|Q9NSY1|BMP2K_HUMAN
sp|P06213|INSR_HUMAN
sp|P06213|INSR_HUMAN
sp|P06213|INSR_HUMAN
sp|P06213|INSR_HUMAN
sp|P06213|INSR_HUMAN
sp|P06213|INSR_HUMAN
sp|P08100|OPSD_HUMAN
sp|P08100|OPSD_HUMAN
sp|P08100|OPSD_HUMAN
sp|P08100|OPSD_HUMAN
sp|P08100|OPSD_HUMAN
sp|P08100|OPSD_HUMAN

Now, using the two FASTA files directly and re-running blastp,
what do I get?

$ ~/Downloads/ncbi-blast-2.2.25+/bin/blastp -query four_human_proteins.fasta -subject rhodopsin_proteins.fasta -outfmt 5 | grep "" -c
24

Or again with -parse_deflines, which changes how the hit ID/def
is presented:

$ ~/Downloads/ncbi-blast-2.2.25+/bin/blastp -query four_human_proteins.fasta -subject rhodopsin_proteins.fasta -outfmt 5 -parse_deflines | grep "" -c
24

How about older versions?

$ ~/Downloads/ncbi-blast-2.2.24+/bin/blastp -query four_human_proteins.fasta -subject rhodopsin_proteins.fasta -outfmt 5
BLAST engine error: XML formatting is only supported for a database search

I'll have to make a blast database first...

$ ~/Downloads/ncbi-blast-2.2.24+/bin/makeblastdb -in rhodopsin_proteins.fasta -dbtype prot

Building a new DB, current time: 05/04/2011 11:22:57
New DB name: rhodopsin_proteins.fasta
New DB title: rhodopsin_proteins.fasta
Sequence type: Protein
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1073741824B
Adding sequences from FASTA; added 6 sequences in 0.105655 seconds.

$ ~/Downloads/ncbi-blast-2.2.25+/bin/blastp -query four_human_proteins.fasta -db rhodopsin_proteins.fasta -outfmt 5 | grep "" -c
4

Look - just four identifiers as I expect! This also works if the
database is built with the -parse_seqids switch.

The same happens with older versions of BLAST+, one block per
query, so four iteration blocks for this example. I tried all of
2.2.21+, 2.2.22+, 2.2.23+ and 2.2.24+ (running makeblastdb to
give a fresh database, then blastp).

That seems to demonstrate that the bug is specific to the XML output
from FASTA vs FASTA (not FASTA vs DB), which is a new feature
in NCBI BLAST 2.2.25+.

I will raise this with the NCBI, and report back. However, even if
the NCBI fix it in the next release, we (Bio*) may want to update
our parsers to cope with this quirk, or at least put a warning in our
BLAST XML parser documentation, as there will be lots of
installations of NCBI BLAST 2.2.25+ in the wild.
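
As a rough illustration of what "coping with this quirk" could look
like (a sketch only, not the Biopython parser): the function below,
under the made-up name hits_per_query, merges iteration blocks that
share a query ID so that each query ends up with one entry again. It
assumes the element names Iteration, Iteration_query-ID and Hit_id of
the BLAST XML output, uses Python 3 with the standard library only,
and the sample document and its identifiers are invented for the demo.

```python
import io
import xml.etree.ElementTree as ET
from collections import OrderedDict


def hits_per_query(handle):
    """Merge <Iteration> blocks that share a query ID.

    Returns an ordered mapping of query ID -> list of hit IDs, so a file
    with one <Iteration> per query/subject pair still gives one entry
    per query. (Hypothetical helper, for illustration only.)
    """
    merged = OrderedDict()
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag != "Iteration":
            continue
        query_id = elem.findtext("Iteration_query-ID")
        hit_ids = [hit.findtext("Hit_id") for hit in elem.iter("Hit")]
        merged.setdefault(query_id, []).extend(hit_ids)
        elem.clear()  # this block has been processed, release its children
    return merged


# Invented miniature document: two blocks for query_A, one for query_B.
SAMPLE = b"""<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_query-ID>query_A</Iteration_query-ID>
      <Iteration_hits><Hit><Hit_id>subject_1</Hit_id></Hit></Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>2</Iteration_iter-num>
      <Iteration_query-ID>query_A</Iteration_query-ID>
      <Iteration_hits><Hit><Hit_id>subject_2</Hit_id></Hit></Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>3</Iteration_iter-num>
      <Iteration_query-ID>query_B</Iteration_query-ID>
      <Iteration_hits></Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""

merged = hits_per_query(io.BytesIO(SAMPLE))
for query_id, hit_ids in merged.items():
    print(query_id, hit_ids)
```

Keying on the query ID rather than on the iteration number means the
same code gives one entry per query for both the FASTA vs FASTA and
the FASTA vs DB output.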
Peter

From Paul.Czodrowski at merck.de Wed May 4 07:30:16 2011
From: Paul.Czodrowski at merck.de (Paul.Czodrowski at merck.de)
Date: Wed, 4 May 2011 13:30:16 +0200
Subject: [Biopython] Antwort: Re: Antwort: Re: Antwort: Re: installation as non-administrator
In-Reply-To:
Message-ID:

Dear Peter,

> > python setup.py install --prefix=$HOME works fine now.
> >
> > Should the segmentation fault still be considered?
>
> Yes please. I assume it still breaks? Can you try changing to the
> Tests subdirectory from the Biopython source, and doing:
>
> python test_Entrez.py

I cannot find the src directory.
Here is my Bio/ directory:
"
Affy Align AlignIO Alphabet Application Blast CAPS Clustalw Cluster
Compass cpairwise2.so Crystal Data DocSQL.py DocSQL.pyc Emboss Entrez
ExPASy File.py File.pyc FSSP GA GenBank Geo Graphics HMM HotRand.py
HotRand.pyc Index.py Index.pyc __init__.py __init__.pyc InterPro
KDTree KEGG kNN.py kNN.pyc LogisticRegression.py
LogisticRegression.pyc MarkovModel.py MarkovModel.pyc MaxEntropy.py
MaxEntropy.pyc Medline Motif NaiveBayes.py NaiveBayes.pyc
NeuralNetwork Nexus NMR pairwise2.py pairwise2.pyc Parsers
ParserSupport.py ParserSupport.pyc Pathway PDB Phylo PopGen _py3k.py
_py3k.pyc Restriction SCOP Search.py Search.pyc SeqFeature.py
SeqFeature.pyc SeqIO Seq.py Seq.pyc SeqRecord.py SeqRecord.pyc
Sequencing SeqUtils Statistics SubsMat SVDSuperimposer SwissProt
triefind.py triefind.pyc trie.so UniGene Wise
"

BTW, python setup.py install --prefix=$HOME did not break.

Thanks & Cheers,
Paul

From anaryin at gmail.com Wed May 4 07:41:07 2011
From: anaryin at gmail.com (João Rodrigues)
Date: Wed, 4 May 2011 13:41:07 +0200
Subject: [Biopython] Antwort: Re: Antwort: Re: Antwort: Re: installation as non-administrator
In-Reply-To:
References:
Message-ID:

At the same level as Bio/ you have another directory called Tests/.

If I list my biopython directory:

joaor at home: ls biopython-git/
*Bio* BioSQL CONTRIB DEPRECATED Doc LICENSE
MANIFEST.in NEWS README Scripts *Tests* build
do2to3.py setup.py

The file Peter was talking about should be there.

Cheers,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao
From Paul.Czodrowski at merck.de Wed May 4 08:40:06 2011
From: Paul.Czodrowski at merck.de (Paul.Czodrowski at merck.de)
Date: Wed, 4 May 2011 14:40:06 +0200
Subject: [Biopython] Antwort: Re: Antwort: Re: Antwort: Re: Antwort: Re: installation as non-administrator
In-Reply-To:
Message-ID:

Dear Joao & Peter,

this is what I got:

"
Test error handling when presented with Fasta non-XML data ... ok
Test error handling when presented with GenBank non-XML data ... ok
Test parsing XML returned by EFetch, Nucleotide database (first test) ... ERROR
Test parsing XML returned by EFetch, Protein database ... ERROR
Test parsing XML returned by EFetch, OMIM database ... ERROR
Test parsing XML returned by EFetch, PubMed database (first test) ... Segmentation fault (core dumped)
"

Cheers,
Paul
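
The run above dies without any Python traceback. One possible way to
get a little more out of such a crash, on a current Python 3 rather
than the 2.6.5 used in this thread, is to run the failing script in a
child interpreter with the standard library's faulthandler switched
on (the -X faulthandler option). The sketch below assumes a POSIX
system; the helper names run_with_faulthandler and describe_exit are
made up, and the crash is simulated by the child sending itself
SIGSEGV.

```python
import signal
import subprocess
import sys


def run_with_faulthandler(code):
    """Run `code` in a child interpreter with faulthandler enabled.

    Returns (returncode, stderr_text). On POSIX a return code of -N
    means the child was killed by signal N.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stderr


def describe_exit(returncode):
    """Turn a subprocess return code into a short readable message."""
    if returncode < 0:
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


# Simulated crash: the child process sends SIGSEGV to itself.
crash = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
returncode, stderr_text = run_with_faulthandler(crash)
print(describe_exit(returncode))
print(stderr_text)
```

For a real case the child command would be the test script itself
(for example test_Entrez.py from the Tests directory) instead of the
-c snippet; any traceback faulthandler manages to write ends up in
stderr_text.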
From p.j.a.cock at googlemail.com Wed May 4 09:17:21 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 4 May 2011 14:17:21 +0100
Subject: [Biopython] Antwort: Re: Antwort: Re: Antwort: Re: Antwort: Re: installation as non-administrator
In-Reply-To:
References:
Message-ID:

On Wed, May 4, 2011 at 1:40 PM, wrote:
>
> Dear Joao & Peter,
>
> this is what I got:
>
> "
> Test error handling when presented with Fasta non-XML data ... ok
> Test error handling when presented with GenBank non-XML data ... ok
> Test parsing XML returned by EFetch, Nucleotide database (first test) ... ERROR
> Test parsing XML returned by EFetch, Protein database ... ERROR
> Test parsing XML returned by EFetch, OMIM database ... ERROR
> Test parsing XML returned by EFetch, PubMed database (first test) ... Segmentation fault (core dumped)
> "
>
> Cheers,
> Paul

Hmm, something amiss with the XML parsing I think, we're
using the Python standard library xml.parsers.expat here.

You said you were using OpenSuse 11.3, and the start of our test
suite reported the following:

Python version: 2.6.5 (r265:79063, Oct 28 2010, 20:56:56)
[GCC 4.5.0 20100604 [gcc-4_5-branch revision 160292]]
Operating system: posix linux2

What version of expat do you have? Try:

$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from xml.parsers import expat
>>> print expat.__version__
$Revision: 17640 $

Do you fancy trying gdb to get a stack trace for us?

I've had a quick Google, and the following issue *might* be
related: http://bugs.python.org/issue4877

Peter

From Paul.Czodrowski at merck.de Wed May 4 09:36:42 2011
From: Paul.Czodrowski at merck.de (Paul.Czodrowski at merck.de)
Date: Wed, 4 May 2011 15:36:42 +0200
Subject: [Biopython] Antwort: Re: Antwort: Re: Antwort: Re: Antwort: Re: Antwort: Re: installation as non-administrator
In-Reply-To:
Message-ID:

[Contact details redacted.]
Peter Cock wrote on 04.05.2011 15:17:21:

> What version of expat do you have? Try:
>
> $ python
> Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56)
> [GCC 4.4.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from xml.parsers import expat
> >>> print expat.__version__
> $Revision: 17640 $

$Revision: 1.1 $

> Do you fancy trying gdb to get a stack trace for us?

How shall I understand your question? Shall I use the gnu debugger
in order to get some debuggable output?

What is the worst case scenario related to biopython, i.e. could it
ultimately lead to any errors/instabilities?

Cheers,
Paul

> I've had a quick Google, and the following issue *might* be
> related: http://bugs.python.org/issue4877
>
> Peter

From p.j.a.cock at googlemail.com Wed May 4 10:13:47 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 4 May 2011 15:13:47 +0100
Subject: [Biopython] Antwort: Re: Antwort: Re: Antwort: Re: Antwort: Re: Antwort: Re: installation as non-administrator
In-Reply-To:
References:
Message-ID:

On Wed, May 4, 2011 at 2:36 PM, wrote:
>>
>> Do you fancy trying gdb to get a stack trace for us?
>
> How shall I understand your question? Shall I use the gnu debugger
> in order to get some debuggable output?

Yes please.

With hindsight, "Could you try using the gnu debugger (gdb) to get
a stack trace?" would have been clearer. Are you familiar with gdb?

Was it the "Do you fancy *activity*?" phrasing that was unclear?
Basically meaning "Would you like to do *activity*?".

> What is the worst case scenario related to biopython, i.e. could it
> ultimately lead to any errors/instabilities?

It looks like if you tried to use Biopython's Bio.Entrez module to
parse XML files from the NCBI it would crash. If you are not going
to use that module, you should be fine.

Peter

From Paul.Czodrowski at merck.de Wed May 4 10:25:23 2011
From: Paul.Czodrowski at merck.de (Paul.Czodrowski at merck.de)
Date: Wed, 4 May 2011 16:25:23 +0200
Subject: [Biopython] Antwort: Re: Antwort: Re: Antwort: Re: Antwort: Re: Antwort: Re: Antwort: Re: installation as non-administrator
In-Reply-To:
Message-ID:

[Contact details redacted.]

Peter Cock wrote on 04.05.2011 16:13:47:

> On Wed, May 4, 2011 at 2:36 PM, wrote:
> >>
> >> Do you fancy trying gdb to get a stack trace for us?
> >
> > How shall I understand your question? Shall I use the gnu debugger
> > in order to get some debuggable output?
>
> Yes please.
>
> With hindsight, "Could you try using the gnu debugger (gdb) to get
> a stack trace?" would have been clearer. Are you familiar with gdb?
>
> Was it the "Do you fancy *activity*?" phrasing that was unclear?
> Basically meaning "Would you like to do *activity*?".

Yes, it was just the expression you used. I have to admit that English is
not my mother tongue.

> > What is the worst case scenario related to biopython, i.e. could it
> > ultimately lead to any errors/instabilities?
>
> It looks like if you tried to use Biopython's Bio.Entrez module to
> parse XML files from the NCBI it would crash. If you are not going
> to use that module, you should be fine.

Good news, thanks :)

And thanks for all the other help, also to JOAO!!

Cheers,
Paul

> Peter
From Paul.Czodrowski at merck.de Tue May 10 03:50:23 2011
From: Paul.Czodrowski at merck.de (Paul.Czodrowski at merck.de)
Date: Tue, 10 May 2011 09:50:23 +0200
Subject: [Biopython] PDB parsing
Message-ID:

Dear folks,

how do I add a B-factor as well as an occupancy column to a PDB file?

I guess Bio.PDB is the appropriate module.
But I already fail with regards to a simple PDB load...

Cheers,
Paul

From anaryin at gmail.com Tue May 10 04:30:04 2011
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 10 May 2011 10:30:04 +0200
Subject: [Biopython] PDB parsing
In-Reply-To:
References:
Message-ID:

Hey Paul,

When you parse a PDB file with PDBParser it automatically retrieves both
B-factor and occupancy. If it fails to do so for any reason, it defaults
those values to 0.

After parsing, you can set those values explicitly by modifying the
corresponding attribute of the Atom object. So, for example, to change the
B-factor of all your atoms to 10.0, you just have to do:

for atom in structure.get_atoms():
    atom.bfactor = 10.0

Hope this answered your question.

Cheers,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao
From Paul.Czodrowski at merck.de Tue May 10 05:19:54 2011
From: Paul.Czodrowski at merck.de (Paul.Czodrowski at merck.de)
Date: Tue, 10 May 2011 11:19:54 +0200
Subject: [Biopython] Antwort: Re: PDB parsing
In-Reply-To:
Message-ID:

Dear Joao,

this one does not work:
"
structure_id = "1234"
PDBFILE = open(filename,'r').read()
p = PDBParser(PERMISSIVE=1)
p._parse(PDBFILE)
pp = p.get_structure(structure_id, PDBFILE)

for atom in pp.get_atoms():
    atom.bfactor = 10.0
    print atom.bfactor
"

"p.get_structure(structure_id, PDBFILE)" seems to get the structural data,
but setting the bfactor does not give any output.

Cheers & Thanks,
Paul

From anaryin at gmail.com Tue May 10 05:27:37 2011
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 10 May 2011 11:27:37 +0200
Subject: [Biopython] Antwort: Re: PDB parsing
In-Reply-To:
References:
Message-ID:

Hey Paul,

First of all, you should not call _parse on your own. That is called already
when you call get_structure(). Generally, if a method has an underscore in
front of its name, it is meant for internal use and shouldn't be called
unless you really know what you want to do with it.

What version of Biopython are you using?
I'd do this:

from Bio.PDB import PDBParser  # import needed for the snippet to run

structure_id = "1234"
PDBFILE = open(filename, 'r')  # pass the open handle, not its contents
p = PDBParser(PERMISSIVE=1)
pp = p.get_structure(structure_id, PDBFILE)  # parsing happens here

for atom in pp.get_atoms():
    atom.bfactor = 10.0
    print(atom.bfactor)

It works pretty well here, with version 1.57.

Cheers,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao
From Paul.Czodrowski at merck.de Tue May 10 05:32:33 2011
From: Paul.Czodrowski at merck.de (Paul.Czodrowski at merck.de)
Date: Tue, 10 May 2011 11:32:33 +0200
Subject: [Biopython] Antwort: Re: Antwort: Re: PDB parsing
In-Reply-To:
Message-ID:

Dear João,

cool, thank you very much so far!

How do I output the newly generated PDB file?

Cheers & thanks,
Paul
From anaryin at gmail.com Tue May 10 05:38:23 2011
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 10 May 2011 11:38:23 +0200
Subject: [Biopython] Antwort: Re: Antwort: Re: PDB parsing
In-Reply-To:
References:
Message-ID:

Use PDBIO.

    from Bio.PDB import PDBIO
    IO = PDBIO()
    IO.set_structure(your_structure)
    IO.save(output_filename)

You can also control which parts of the structure to output with Select.

Check the documentation, it will make you progress much faster :)

Cheers,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao

On Tue, May 10, 2011 at 11:32 AM, wrote:
> Dear João,
>
> cool, thank you very much so far!
>
> How do I output the newly generated PDBfile?
>
> Cheers & thanks,
> Paul

From Paul.Czodrowski at merck.de Tue May 10 07:05:50 2011
From: Paul.Czodrowski at merck.de (Paul.Czodrowski at merck.de)
Date: Tue, 10 May 2011 13:05:50 +0200
Subject: [Biopython] Antwort: Re: Antwort: Re: Antwort: Re: PDB parsing
In-Reply-To:
Message-ID:

Dear Joao,

thanks for your help and the documentation link!

So far, I was aware of this documentation
http://biopython.org/DIST/docs/tutorial/Tutorial.html
wherein PDB parsing is only briefly covered.

And, yes, progress is faster now!

Cheers,
Paul

> Use PDBIO.
>
>     from Bio.PDB import PDBIO
>     IO = PDBIO()
>     IO.set_structure(your_structure)
>     IO.save(output_filename)
>
> You can also control which parts of the structure to output with Select.
>
> Check the documentation org/DIST/docs/cookbook/biopdb_faq.pdf>,
> it will make you progress much faster :)
>
> Cheers,
>
> João [...] Rodrigues
> http://nmr.chem.uu.nl/~joao
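
Pulling the pieces of this thread together, the whole round trip (parse, set the per-atom values, write the file back) might look roughly like the sketch below. This is not code posted by anyone in the thread: it assumes Python 3 with Biopython installed, and the function name `set_bfactor_and_save`, its parameters and its default values are made up for illustration. Only `PDBParser`, `get_structure`, `get_atoms`, `PDBIO.set_structure` and `PDBIO.save` come from the messages above, and setting `occupancy` like `bfactor` follows the remark that both are plain Atom attributes.

```python
def set_bfactor_and_save(in_path, out_path, bfactor=10.0, occupancy=1.0):
    """Parse a PDB file, overwrite B-factor and occupancy on every atom, save it.

    Hypothetical helper, for illustration only.
    """
    # Imported inside the function so that defining it does not require Biopython.
    from Bio.PDB import PDBIO, PDBParser

    parser = PDBParser(PERMISSIVE=1)
    # get_structure() does the parsing itself: give it a path or an open handle,
    # not the text returned by .read(), and do not call _parse() directly.
    structure = parser.get_structure("1234", in_path)

    for atom in structure.get_atoms():
        atom.bfactor = bfactor
        atom.occupancy = occupancy

    writer = PDBIO()
    writer.set_structure(structure)
    writer.save(out_path)
    return structure
```

It could be called as `set_bfactor_and_save("input.pdb", "output.pdb")`, where both file names are placeholders. Passing the path straight to `get_structure()` is what avoids the problem with the `.read()` variant quoted earlier in the thread.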
From sainitin7 at gmail.com Thu May 12 04:39:28 2011
From: sainitin7 at gmail.com (sai nitin)
Date: Thu, 12 May 2011 10:39:28 +0200
Subject: [Biopython] Problem in accessing pcassay database
Message-ID:

Hi all,

I am new to Biopython. I want to access the pcassay database
programmatically; the exact issue is described below.

I have a list of Bioassay AIDs and I want to retrieve all their names.
I tried esummary to do this but it gives an error. I also tried efetch
but didn't succeed.

Can anybody tell me a possible solution?

Thanks

--
Sainitin D

From p.j.a.cock at googlemail.com Thu May 12 05:15:37 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 12 May 2011 10:15:37 +0100
Subject: [Biopython] Problem in accessing pcassay database
In-Reply-To:
References:
Message-ID:

On Thu, May 12, 2011 at 9:39 AM, sai nitin wrote:
> Hi all,
>
> I am new to Biopython. I want to access the pcassay database
> programmatically; the exact issue is described below.
>
> I have a list of Bioassay AIDs and I want to retrieve all their names.
> I tried esummary to do this but it gives an error. I also tried efetch
> but didn't succeed.
>
> Can anybody tell me a possible solution?
>
> Thanks

Hi,

Can you do this by hand? Which website would you use? If NCBI Entrez,
then it should be possible using Biopython's Bio.Entrez module.

Could you give an example, say two Bioassay AIDs, and the expected
results (e.g. URLs to NCBI webpage).

Peter

From p.j.a.cock at googlemail.com Thu May 12 15:04:45 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 12 May 2011 20:04:45 +0100
Subject: [Biopython] Problem in accessing pcassay database
In-Reply-To:
References:
Message-ID:

Please CC the mailing list on any reply.

On Thu, May 12, 2011 at 6:59 PM, sai nitin wrote:
> Hi Peter,
> Thanks for the reply. Yes, I tried the Bio.Entrez module (Biopython).
> OK, let me explain the issue more clearly. Say I have an AID as follows:
> 1. AID: 504582
> I want to retrieve the Description section details from this URL
> (http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=504582&loc=ea_ras)
> I have 20-30 AIDs like this and I want to do this for all of them.
> Any suggestions would be a great help.
> Thanks,
> Sainitin

If you look on the page you linked to, notice AID 504582 is itself
a link to Entrez,
http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&db=pcassay&term=504582

So, I would expect an Entrez search for 504582 in the pcassay
database to work. Trying this by hand on the NCBI Entrez website
works fine, so from Biopython you could do the same search with
Entrez.esearch(db="pcassay", term="504582")

Peter

From mictadlo at gmail.com Sun May 15 01:35:07 2011
From: mictadlo at gmail.com (Michal)
Date: Sun, 15 May 2011 15:35:07 +1000
Subject: [Biopython] multiprocessing problem with pysam
In-Reply-To: <20110412013119.GF2053@kunkel>
References: <4DA1137E.1090803@gmail.com> <20110410111510.GA2634@kunkel> <4DA2EC9D.7040004@gmail.com> <20110412013119.GF2053@kunkel>
Message-ID: <4DCF660B.30309@gmail.com>

Hello,

Thank you Brad.
I have written the following new code:

    import re
    import os
    import pysam
    from pprint import pprint
    from multiprocessing import Pool

    class Test():
        def __init__(self, bam_filename, cultivars):
            self.__bam_fh = pysam.Samfile(bam_filename, "rb")
            self.__cultivars = cultivars

        def run(self, ref_name):
            print os.getpid(), ref_name, self.__cultivars
            return (os.getpid(), ref_name)

    if __name__ == '__main__':
        cultivars = 'Ja,Ea,As'.replace(' ', '').split(',')
        bam_filename = "/media/usb/tests/test.bam"
        bamfile = pysam.Samfile(bam_filename, "rb")
        ref_names = bamfile.references
        ref_lengths = bamfile.lengths
        bamfile.close()

        # for ref_name in ref_names:
        #     Test(bam_filename, cultivars).run(ref_names)

        pool = Pool()
        results = dict(pool.imap_unordered(
            Test(bam_filename, cultivars).run, ref_names))
        pool.close()
        pool.join()
        pprint(results)

and got the following error:

    Exception in thread Thread-2:
    Traceback (most recent call last):
      File "/home/mictadlo/apps/python/lib/python2.7/threading.py", line 530, in __bootstrap_inner
        self.run()
      File "/home/mictadlo/apps/python/lib/python2.7/threading.py", line 483, in run
        self.__target(*self.__args, **self.__kwargs)
      File "/home/mictadlo/apps/python/lib/python2.7/multiprocessing/pool.py", line 285, in _handle_tasks
        put(task)
    PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed

I have searched and found two possible solutions for this problem:

* http://www.doughellmann.com/PyMOTW/multiprocessing/communication.html
* http://www.rueckstiess.net/research/snippets/show/ca1d7d90

However, is there a better way to solve it, or are the above solutions
not good?

Thank you in advance.
Michal

From chapmanb at 50mail.com Sun May 15 11:53:46 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Sun, 15 May 2011 11:53:46 -0400
Subject: [Biopython] multiprocessing problem with pysam
In-Reply-To: <4DCF660B.30309@gmail.com>
References: <4DA1137E.1090803@gmail.com> <20110410111510.GA2634@kunkel> <4DA2EC9D.7040004@gmail.com> <20110412013119.GF2053@kunkel> <4DCF660B.30309@gmail.com>
Message-ID: <20110515155346.GD2530@kunkel>

Michal;

[multiprocessing]
> class Test():
>     def __init__(self, bam_filename, cultivars):
>         self.__bam_fh = pysam.Samfile(bam_filename, "rb")
>         self.__cultivars = cultivars
>
>     def run(self, ref_name):
>         print os.getpid(), ref_name, self.__cultivars
>         return (os.getpid(), ref_name)
[...]
>     pool = Pool()
>     results = dict(pool.imap_unordered(
>         Test(bam_filename, cultivars).run, ref_names))
[...]
> and got the following error:
>
> Exception in thread Thread-2:
[...]
> PicklingError: Can't pickle <type 'instancemethod'>: attribute
> lookup __builtin__.instancemethod failed

multiprocessing is sensitive to passing or calling complex class
objects. My suggestion is to use functions without associated state
attributes and pass in your information as standard python objects
(strings, lists, dicts).
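
As a rough illustration of that suggestion, here is a standard-library-only sketch in Python 3. It is not the code from either message: `describe_reference`, `run_all`, the file name and the reference names are all hypothetical, and the worker only reports what it was given instead of reading a BAM file. The point is that the callable handed to the pool is a plain module-level function and that each task is a tuple of ordinary picklable values.

```python
import os
from multiprocessing import Pool


def describe_reference(task):
    """Worker: a module-level function, so the pool can pickle it by name."""
    bam_filename, cultivars, ref_name = task  # plain strings and lists only
    # A real worker would open the BAM file here, inside the child process,
    # rather than carrying an open file handle around on an object.
    return ref_name, (os.getpid(), bam_filename, len(cultivars))


def run_all(bam_filename, cultivars, ref_names, processes=2):
    """Fan the reference names out over a pool and collect results by name."""
    tasks = [(bam_filename, cultivars, ref_name) for ref_name in ref_names]
    with Pool(processes) as pool:
        return dict(pool.imap_unordered(describe_reference, tasks))


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(run_all("test.bam", ["Ja", "Ea", "As"], ["chr1", "chr2", "chr3"]))
```

Keying the result on the reference name rather than the process id is a deliberate choice in this sketch: several tasks can run in the same worker process, so process ids are not unique per task.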
I use a little decorator to make writing the functions passed easier:

    import functools

    def map_wrap(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            # The pool hands wrapper a single tuple; apply() (a Python 2
            # builtin) unpacks it into f's positional arguments.
            return apply(f, *args, **kwargs)
        return wrapper

Then I would write your function as:

    # Relies on the imports from your script (os, pysam, Pool).
    @map_wrap
    def run_test(bam_filename, cultivars, ref_name):
        bam_fh = pysam.Samfile(bam_filename, "rb")
        print os.getpid(), ref_name, cultivars
        return (os.getpid(), ref_name)

and call it with:

    cultivars = 'Ja,Ea,As'.replace(' ', '').split(',')
    bam_filename = "/media/usb/tests/test.bam"
    bamfile = pysam.Samfile(bam_filename, "rb")
    ref_names = bamfile.references
    bamfile.close()

    pool = Pool()
    results = dict(pool.imap(run_test,
        ((bam_filename, cultivars, ref) for ref in ref_names)))
    pool.close()

Hope this helps,
Brad

From aradwen at gmail.com Wed May 18 11:28:25 2011
From: aradwen at gmail.com (Radhouane Aniba)
Date: Wed, 18 May 2011 11:28:25 -0400
Subject: [Biopython] Snippets Sharing
Message-ID:

Hi guys,

I apologize if this mail sounds like an ad; please consider it just an
announcement.

I just wanted you to be aware of the changes that occurred at
biocoders.net. We restructured it to be an online collaboration tool for
bioinformatics: you can create groups for your projects, interact with
other users, upload snippets and software packages that you find useful,
discuss the latest topics in bioinformatics, find the newest jobs (we
partner with the simplyhired job board) and much more.

I am keeping this mail short so that you don't feel spammed; that is not
my goal. Just come and explore the new biocoders.net formula.
Cheers,
Radhouane

From p.j.a.cock at googlemail.com Wed May 18 16:42:02 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 18 May 2011 21:42:02 +0100
Subject: [Biopython] gff3 problem
In-Reply-To: <20110408121041.GM20963@sobchak>
References: <4D9B0A6D.3040608@gmail.com> <20110405132247.GA20523@sobchak> <4D9DB3F4.30107@gmail.com> <20110408121041.GM20963@sobchak>
Message-ID:

On Fri, Apr 8, 2011 at 1:10 PM, Brad Chapman wrote:
> Leighton and Peter;
>
>> > Just to further complicate matters, the symbol convention for GFF3 differs
>> > from Biopython in terms of the categories it defines:
>> > + is positive strand
>> > - is negative strand
>> > . is not stranded (i.e. strand not relevant)
>> > ? is strand relevant, but not known
>> > http://www.sequenceontology.org/gff3.shtml
>
> Yes, although this strikes me a bit like fuzzy features in terms of
> usefulness.
>
>> > The latter two are distinct, but not distinguished by convention in
>> > Biopython:
>> > The obvious (to me) mapping of the four allowed Biopython symbols to the
>> > GFF3 convention is:
>> > +1 -> +
>> > -1 -> -
>> > None -> .
>> > 0 -> ?
>> > because 'None' is semantically close to 'has no strand information of
>> > consequence', and 0 is the mean of +1 and -1 ;)
>
> That's fine by me. Right now both '?' and '.' are converted to None
> so I lose the subtle distinction GFF is introducing:
>
> strand_map = {'+' : 1, '-' : -1, '?' : None, None: None}
>
> If everyone agrees on that coding it's no problem to swap it over.
> Brad

So was the consensus that we should reword the Bio.SeqFeature
docstring to say the four valid values for strand are (with GFF3
equivalents in brackets):

+1 = Forward (+ in GFF3)
-1 = Reverse (- in GFF3)
0 = Not stranded (. in GFF3)
None = Unknown (? in GFF3)

And should features on a protein sequence then have strand 0?
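
The four-way mapping in the summary just above can be written down as a pair of lookup tables. This is only a sketch of the proposal as worded there (0 for not stranded, None for unknown), not existing Biopython or GFF parser behaviour; note that the earlier quoted suggestion had the last two values the other way round. The names `STRAND_TO_GFF3`, `GFF3_TO_STRAND`, `strand_to_gff3` and `gff3_to_strand` are invented for the example.

```python
# Strand value -> GFF3 strand column, following the proposed docstring wording.
STRAND_TO_GFF3 = {1: "+", -1: "-", 0: ".", None: "?"}
# Inverse table, built from the first so the two cannot drift apart.
GFF3_TO_STRAND = {symbol: strand for strand, symbol in STRAND_TO_GFF3.items()}


def strand_to_gff3(strand):
    """Return the GFF3 symbol for a strand value of +1, -1, 0 or None."""
    try:
        return STRAND_TO_GFF3[strand]
    except KeyError:
        raise ValueError("unexpected strand value: %r" % (strand,))


def gff3_to_strand(symbol):
    """Return the strand value for a GFF3 symbol of '+', '-', '.' or '?'."""
    try:
        return GFF3_TO_STRAND[symbol]
    except KeyError:
        raise ValueError("unexpected GFF3 strand symbol: %r" % (symbol,))


if __name__ == "__main__":
    for value in (1, -1, 0, None):
        print(repr(value), "->", strand_to_gff3(value))
```

With two separate symbols for "not stranded" and "unknown", a round trip through both tables returns the original value, which is the distinction the current single `None` coding loses.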
Peter

From hxcan at stupidbeauty.com Thu May 19 01:00:37 2011
From: hxcan at stupidbeauty.com (=?GB2312?B?ssy78Mqk?=)
Date: Thu, 19 May 2011 13:00:37 +0800
Subject: [Biopython] missing dtd file
Message-ID: <4DD4A3F5.8020406@stupidbeauty.com>

An HTML attachment was scrubbed...
URL:

From p.j.a.cock at googlemail.com Thu May 19 03:57:17 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 19 May 2011 08:57:17 +0100
Subject: [Biopython] missing dtd file
In-Reply-To: <4DD4A3F5.8020406@stupidbeauty.com>
References: <4DD4A3F5.8020406@stupidbeauty.com>
Message-ID:

2011/5/19 ?????? :
> Hello
>
> Entrez module gives this warning:
>
> /usr/lib/python2.6/site-packages/Bio/Entrez/Parser.py:495: UserWarning:
> Unable to load DTD file eLink_101123.dtd.
>
> Bio.Entrez uses NCBI's DTD files to parse XML files ...
>
> For this purpose, please download eLink_101123.dtd from
>
> http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eLink_101123.dtd
>
> ...

Thank you for alerting us, that file will be included in our next release.
Could you update your copy of Biopython successfully?
Peter

From esa.aalto at oulu.fi Thu May 19 09:02:17 2011
From: esa.aalto at oulu.fi (Esa Aalto)
Date: Thu, 19 May 2011 16:02:17 +0300
Subject: [Biopython] An error with Concatenate nexus
Message-ID: <3C36433088B0FF4B834B351A67C98111E6F721@KEKO.univ.yo.oulu.fi>

Dear group,

I'm trying to concatenate 20 nexus files with the instructions given
here:

http://www.biopython.org/wiki/Concatenate_nexus

but it doesn't work:

    Traceback (most recent call last):
      File "C:\Python27\concate_nexus.py", line 36, in <module>
        nexi = [(handle.name, Nexus.Nexus(handle)) for handle in handles]
      File "C:\Python27\lib\site-packages\Bio\Nexus\Nexus.py", line 555, in __init__
        self.read(input)
      File "C:\Python27\lib\site-packages\Bio\Nexus\Nexus.py", line 618, in read
        self._parse_nexus_block(title, contents)
      File "C:\Python27\lib\site-packages\Bio\Nexus\Nexus.py", line 659, in _parse_nexus_block
        getattr(self,'_'+line.command)(line.options)
      File "C:\Python27\lib\site-packages\Bio\Nexus\Nexus.py", line 1021, in _codonposset
        raise NexusError('Formatting Error in codonposset: %s ' % options)
    NexusError: Formatting Error in codonposset: * UNTITLED = 1: 1-577\3, 2: 2-578\3, 3: 3-579\3

The end of the first of my nex files looks like this:

    BEGIN SETS;
        TaxSet A_thaliana = 1;
        TaxSet A_lyrata = 2;
        TaxSet Boh = 3-32;
        TaxSet Ice = 33-60;
        TaxSet Ith = 61-92;
        TaxSet Kar = 93-124;
        TaxSet Lom = 125-156;
        TaxSet NC = 157-196;
        TaxSet Pl = 197-236;
        TaxSet Sp = 237-274;
        TaxSet Stu = 275-294;
        TaxSet South = 3-32 197-236;
        TaxSet North = 125-156 237-274;
        TaxSet lyrata = 2-294;
    END;

    BEGIN CODONS;
        CODONPOSSET * UNTITLED = 1: 1-577\3, 2: 2-578\3, 3: 3-579\3;
        CODESET * UNTITLED = Universal: all;
    END;

    BEGIN CODONUSAGE;
    END;

    BEGIN DnaSP;
        Genome= Diploid;
        ChromosomalLocation= Autosome;
        VariationType= DNA_Seq_Pol;
        Species= ---;
        ChromosomeName= ---;
        GenomicPosition= 1;
        GenomicAssembly= ---;
        DnaSPversion= Ver. 5.10.00;
    END;

Could someone tell what's wrong here? Is it my nexus files or something
in the code?

Thanks for your help!
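
Since the traceback above is raised while the CODONPOSSET statement is being parsed, one possible workaround is to strip that statement from the file text before handing it to the parser. The sketch below uses only the Python standard library and assumes the codon-position partition is not needed after concatenation; `strip_codonposset` is a made-up helper name, and whether the remaining blocks then parse cleanly in a given Biopython version has not been checked here.

```python
import re

# One CODONPOSSET statement: the keyword, then everything up to the next ';'.
# The negated class also matches newlines, so a statement wrapped over
# several lines is removed as a whole.
_CODONPOSSET = re.compile(r"CODONPOSSET\b[^;]*;", re.IGNORECASE)


def strip_codonposset(nexus_text):
    """Return the NEXUS text with every CODONPOSSET statement removed."""
    return _CODONPOSSET.sub("", nexus_text)


if __name__ == "__main__":
    example = (
        "BEGIN CODONS;\n"
        "    CODONPOSSET * UNTITLED = 1: 1-577\\3, 2: 2-578\\3, 3: 3-579\\3;\n"
        "    CODESET * UNTITLED = Universal: all;\n"
        "END;\n"
    )
    print(strip_codonposset(example))
```

The cleaned text could then be written to a new file, or wrapped in `io.StringIO`, before being parsed, leaving the original files untouched.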
Esa Aalto

From cy at cymon.org Thu May 19 10:30:36 2011
From: cy at cymon.org (Cymon Cox)
Date: Thu, 19 May 2011 15:30:36 +0100
Subject: [Biopython] An error with Concatenate nexus
In-Reply-To: <3C36433088B0FF4B834B351A67C98111E6F721@KEKO.univ.yo.oulu.fi>
References: <3C36433088B0FF4B834B351A67C98111E6F721@KEKO.univ.yo.oulu.fi>
Message-ID:

Hi Esa,

At first glance this looks like a bug. But given that Nexus.combine() is
going to discard your codonposset character partition anyway, you could try
deleting it from the Nexus file before combining.

Regards, Cymon

From fkelesh at gmail.com Fri May 20 05:33:03 2011
From: fkelesh at gmail.com (Fatih Keles)
Date: Fri, 20 May 2011 12:33:03 +0300
Subject: [Biopython] installing biopython on mac os x 10.6
Message-ID:

Hi,

I was trying to install Biopython on mac os x 10.6 using X11. However,
it gives this error:

"""
running install
running build
running build_py
running build_ext
building 'Bio.cpairwise2' extension
gcc-4.0 -fno-strict-aliasing -fno-common -dynamic -arch ppc -arch i386 -g -O2 -DNDEBUG -g -O3 -IBio -I/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c Bio/cpairwise2module.c -o build/temp.macosx-10.3-fat-2.7/Bio/cpairwise2module.o
unable to execute gcc-4.0: No such file or directory
error: command 'gcc-4.0' failed with exit status 1
"""

I couldn't find the problem. I would be happy if you help me.

Thanks,

keles

From p.j.a.cock at googlemail.com Fri May 20 05:40:16 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 20 May 2011 10:40:16 +0100
Subject: [Biopython] installing biopython on mac os x 10.6
In-Reply-To:
References:
Message-ID:

On Fri, May 20, 2011 at 10:33 AM, Fatih Keles wrote:
> Hi,
>
> I was trying to install Biopython on mac os x 10.6 using X11.
However, > It gives this error : > """ > > running install > running build > running build_py > running build_ext > building 'Bio.cpairwise2' extension > gcc-4.0 -fno-strict-aliasing -fno-common -dynamic -arch ppc -arch i386 > -g -O2 -DNDEBUG -g -O3 -IBio > -I/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 > -c Bio/cpairwise2module.c -o > build/temp.macosx-10.3-fat-2.7/Bio/cpairwise2module.o > unable to execute gcc-4.0: No such file or directory > error: command 'gcc-4.0' failed with exit status 1 > """ > > I couldn't find the problem. I would be happy if you help me. > > Thanks, > > keles Have you installed Apple X Code, the development suite that comes with Apple's version of gcc (C compiler)? What we say on the download page of the wiki is: >> For Mac OS X, we recommend installing from source (see below). >> You will need to have installed Apple's XCode tools including the >> optional 10.4 SDK (check the option for 10.4 support when >> installing Xcode tools). Peter From chapmanb at 50mail.com Fri May 20 07:15:35 2011 From: chapmanb at 50mail.com (Brad Chapman) Date: Fri, 20 May 2011 07:15:35 -0400 Subject: [Biopython] gff3 problem In-Reply-To: References: <4D9B0A6D.3040608@gmail.com> <20110405132247.GA20523@sobchak> <4D9DB3F4.30107@gmail.com> <20110408121041.GM20963@sobchak> Message-ID: <20110520111535.GC21651@sobchak> Peter; [SeqFeature support for not-stranded elements] > So was the consensus that we should reword the Bio.SeqFeature > docstring so say the four valid values for strand are (with GFF3 > equivalents in brackets): > > +1 = Forward (+ in GFF3) > -1 = Reverse (- in GFF3) > 0 = Not stranded (. in GFF3) > None = Unknown (? in GFF3) > > And should features on a protein sequence should then have strand 0? That sounds great. I can make the corresponding change to the GFF library. Let me know if there are any other roadblocks to integrating that. 
Thanks much, Brad From p.j.a.cock at googlemail.com Fri May 20 07:27:04 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 20 May 2011 12:27:04 +0100 Subject: [Biopython] gff3 problem In-Reply-To: <20110520111535.GC21651@sobchak> References: <4D9B0A6D.3040608@gmail.com> <20110405132247.GA20523@sobchak> <4D9DB3F4.30107@gmail.com> <20110408121041.GM20963@sobchak> <20110520111535.GC21651@sobchak> Message-ID: On Fri, May 20, 2011 at 12:15 PM, Brad Chapman wrote: > Peter; > > [SeqFeature support for not-stranded elements] >> So was the consensus that we should reword the Bio.SeqFeature >> docstring so say the four valid values for strand are (with GFF3 >> equivalents in brackets): >> >> +1 = Forward (+ in GFF3) >> -1 = Reverse (- in GFF3) >> 0 = Not stranded (. in GFF3) >> None = Unknown (? in GFF3) >> >> And should features on a protein sequence then have strand 0? > > That sounds great. I can make the corresponding change to the GFF > library. Let me know if there are any other roadblocks to > integrating that. Thanks much, > Brad I've remembered a corner case, mixed strand features. e.g the Arabidopsis thaliana chloroplast complete genome, AP000423 in EMBL, NC_000932 in GenBank (one of our unit test files). e.g. gene with join(complement(69611..69724),139856..140650) Clearly the child features have well defined strands (+1 and -1). The parent feature (the join) is mixed strand. Currently our GenBank parser uses None for this. So maybe: +1 = Forward (+ in GFF3) -1 = Reverse (- in GFF3) 0 = Not stranded (. in GFF3) None = Mixed or unknown (? 
in GFF3) Peter From cjfields at illinois.edu Fri May 20 09:24:30 2011 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 20 May 2011 08:24:30 -0500 Subject: [Biopython] gff3 problem In-Reply-To: References: <4D9B0A6D.3040608@gmail.com> <20110405132247.GA20523@sobchak> <4D9DB3F4.30107@gmail.com> <20110408121041.GM20963@sobchak> <20110520111535.GC21651@sobchak> Message-ID: On May 20, 2011, at 6:27 AM, Peter Cock wrote: > On Fri, May 20, 2011 at 12:15 PM, Brad Chapman wrote: >> Peter; >> >> [SeqFeature support for not-stranded elements] >>> So was the consensus that we should reword the Bio.SeqFeature >>> docstring so say the four valid values for strand are (with GFF3 >>> equivalents in brackets): >>> >>> +1 = Forward (+ in GFF3) >>> -1 = Reverse (- in GFF3) >>> 0 = Not stranded (. in GFF3) >>> None = Unknown (? in GFF3) >>> >>> And should features on a protein sequence then have strand 0? >> >> That sounds great. I can make the corresponding change to the GFF >> library. Let me know if there are any other roadblocks to >> integrating that. Thanks much, >> Brad > > I've remembered a corner case, mixed strand features. e.g the > Arabidopsis thaliana chloroplast complete genome, AP000423 > in EMBL, NC_000932 in GenBank (one of our unit test files). > e.g. gene with join(complement(69611..69724),139856..140650) > > Clearly the child features have well defined strands (+1 and -1). > The parent feature (the join) is mixed strand. Currently our > GenBank parser uses None for this. So maybe: > > +1 = Forward (+ in GFF3) > -1 = Reverse (- in GFF3) > 0 = Not stranded (. in GFF3) > None = Mixed or unknown (? in GFF3) > > Peter That's essentially what bioperl does for 'split' locations (actually, I think it is just undef, which would translate to '?' for GFF3). chris From laserson at mit.edu Fri May 20 17:14:32 2011 From: laserson at mit.edu (Uri Laserson) Date: Fri, 20 May 2011 17:14:32 -0400 Subject: [Biopython] Serialize SeqRecord to JSON? 
Message-ID: Does anyone know of a solution for this? Thanks! Uri ................................................................................... Uri Laserson Graduate Student, Biomedical Engineering Harvard-MIT Division of Health Sciences and Technology M +1 917 742 8019 laserson at mit.edu From mjldehoon at yahoo.com Fri May 20 23:59:24 2011 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 20 May 2011 20:59:24 -0700 (PDT) Subject: [Biopython] installing biopython on mac os x 10.6 In-Reply-To: Message-ID: <782468.28393.qm@web161211.mail.bf1.yahoo.com> Probably you don't have a C compiler installed on your computer. The easiest way to get one is to install Apple's Xcode package. --Michiel. --- On Fri, 5/20/11, Fatih Keles wrote: > From: Fatih Keles > Subject: [Biopython] installing biopython on mac os x 10.6 > To: biopython at lists.open-bio.org > Date: Friday, May 20, 2011, 5:33 AM > Hi, > > I was trying to install Biopython on mac os x 10.6 using > X11. However, > It gives this error : > """ > > running install > running build > running build_py > running build_ext > building 'Bio.cpairwise2' extension > gcc-4.0 -fno-strict-aliasing -fno-common -dynamic -arch ppc > -arch i386 > -g -O2 -DNDEBUG -g -O3 -IBio > -I/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 > -c Bio/cpairwise2module.c -o > build/temp.macosx-10.3-fat-2.7/Bio/cpairwise2module.o > unable to execute gcc-4.0: No such file or directory > error: command 'gcc-4.0' failed with exit status 1 > """ > > I couldn't find the problem. I would be happy if you help > me. > > Thanks, > > keles > _______________________________________________ > Biopython mailing list? -? 
Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From sainitin7 at gmail.com Mon May 23 04:32:07 2011 From: sainitin7 at gmail.com (sai nitin) Date: Mon, 23 May 2011 10:32:07 +0200 Subject: [Biopython] Problem to retreive compound names using CID from PubChem Message-ID: Hi all, Myself sainitin i have list of CIDs from Pubchem Database i want retereive corresponding compundnames to automate this process im using Biopython Entrez module (Entrez.esummary) when i give one CID and try to retreive name of the compound error is occuring Code h = Entrez.esummary(db = "pccompound",id = "449489") r = Entrez.read(h) r[0]["SourceName"] Error Traceback (most recent call last): File "", line 1, in KeyError: 'SourceName' Can anybody help me to solve this Thanks -- Sainitin D From fkauff at biologie.uni-kl.de Mon May 23 06:19:30 2011 From: fkauff at biologie.uni-kl.de (Frank Kauff) Date: Mon, 23 May 2011 12:19:30 +0200 Subject: [Biopython] An error with Concatenate nexus In-Reply-To: References: <3C36433088B0FF4B834B351A67C98111E6F721@KEKO.univ.yo.oulu.fi> Message-ID: <4DDA34B2.9010907@biologie.uni-kl.de> Hi Esa, are you using an up-to-date Nexus parser? The codonposset below can be read without problems when I copy-paste it into one of my nexus files. Or, if you like, send me a copy of your complete nexus file for a check. Cheers, Frank On 05/19/2011 04:30 PM, Cymon Cox wrote: > Hi Esa, > > At first glance this looks like a bug. > > But given that Nexus.combine() is going to discard your codonposset > character partition anyway, you could try deleting it from the Nexus file > before combining. 
> > Regards, Cymon > > On 19 May 2011 14:02, Esa Aalto wrote: > >> Dear group, >> >> I'm trying to concatenate 20 nexus files with the instructions given >> here: >> >> http://www.biopython.org/wiki/Concatenate_nexus >> >> but it doesn't work: >> >> Traceback (most recent call last): >> File "C:\Python27\concate_nexus.py", line 36, in >> nexi = [(handle.name, Nexus.Nexus(handle)) for handle in handles] >> File "C:\Python27\lib\site-packages\Bio\Nexus\Nexus.py", line 555, in >> __init__ >> self.read(input) >> File "C:\Python27\lib\site-packages\Bio\Nexus\Nexus.py", line 618, in >> read >> self._parse_nexus_block(title, contents) >> File "C:\Python27\lib\site-packages\Bio\Nexus\Nexus.py", line 659, in >> _parse_nexus_block >> getattr(self,'_'+line.command)(line.options) >> File "C:\Python27\lib\site-packages\Bio\Nexus\Nexus.py", line 1021, in >> _codonposset >> raise NexusError('Formatting Error in codonposset: %s ' % options) >> NexusError: Formatting Error in codonposset: * UNTITLED = 1: 1-577\3, 2: >> 2-578\3, 3: 3-579\3 >> >> The end of the first of my nex files looks like this: >> >> BEGIN SETS; >> TaxSet A_thaliana = 1; >> TaxSet A_lyrata = 2; >> TaxSet Boh = 3-32; >> TaxSet Ice = 33-60; >> TaxSet Ith = 61-92; >> TaxSet Kar = 93-124; >> TaxSet Lom = 125-156; >> TaxSet NC = 157-196; >> TaxSet Pl = 197-236; >> TaxSet Sp = 237-274; >> TaxSet Stu = 275-294; >> TaxSet South = 3-32 197-236; >> TaxSet North = 125-156 237-274; >> TaxSet lyrata = 2-294; >> END; >> >> BEGIN CODONS; >> CODONPOSSET * UNTITLED = >> 1: 1-577\3, >> 2: 2-578\3, >> 3: 3-579\3; >> CODESET * UNTITLED = Universal: all; >> END; >> >> BEGIN CODONUSAGE; >> END; >> >> BEGIN DnaSP; >> Genome= Diploid; >> ChromosomalLocation= Autosome; >> VariationType= DNA_Seq_Pol; >> Species= ---; >> ChromosomeName= ---; >> GenomicPosition= 1; >> GenomicAssembly= ---; >> DnaSPversion= Ver. 5.10.00; >> END; >> >> Could someone tell what's wrong here? Is it my nexus files or something >> in the code? 
>> >> Thanks for your help! >> >> Esa Aalto >> >> _______________________________________________ >> Biopython mailing list - Biopython at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython >> > > > -- > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From chapmanb at 50mail.com Mon May 23 06:42:56 2011 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 23 May 2011 06:42:56 -0400 Subject: [Biopython] Problem to retreive compound names using CID from PubChem In-Reply-To: References: Message-ID: <20110523104256.GA2365@kunkel> Sainitin; > Code > h = Entrez.esummary(db = "pccompound",id = "449489") > r = Entrez.read(h) > r[0]["SourceName"] > > Error > Traceback (most recent call last): > File "", line 1, in > KeyError: 'SourceName' > > Can anybody help me to solve this The 'r' object you've parsed from Entrez contains a list of dictionaries. The information that is in each dictionary will be dependent on the database you are retrieving from. In this case there is no SourceName information, so python returns a KeyError to indicate this. You can examine the items in the dictionary with: for key, val in r[0].iteritems(): print key, val [...] InChI InChI=1S/C9H12IN2O8P/c10-4-2-12(9(15)11-8(4)14)7-1-5(13)6(20-7)3-19-21(16,17)18/h2,5-7,13H,1,3H2,(H,11,14,15)(H2,16,17,18)/t5-,6+,7+/m0/s1 TautomerCount 3 SourceIDList [] BondChiralCount 0 MeSHTermList ["5-iodo-2'-deoxyuridine 5'-monophosphate", '5-iodo-dUMP', 'IdUMP', 'iododeoxyuridylate', 'iododeoxyuridylate, 125I-labeled'] [...] 
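[Editorial sketch, not part of the original reply.] The same inspection can be written for Python 3, where `iteritems()` no longer exists. The `summary` dictionary below is a hand-made stand-in holding a few of the fields printed above; it is an assumption used for illustration only, not a live Entrez result.

```python
# Stand-in for r[0]: a few fields copied from the output shown above.
# This is NOT a real Entrez.read() result, only a plain dict for illustration.
summary = {
    "TautomerCount": 3,
    "SourceIDList": [],
    "BondChiralCount": 0,
    "MeSHTermList": [
        "5-iodo-2'-deoxyuridine 5'-monophosphate",
        "5-iodo-dUMP",
        "IdUMP",
        "iododeoxyuridylate",
        "iododeoxyuridylate, 125I-labeled",
    ],
}


def list_fields(record):
    """Return the field names of a summary record, sorted for stable display."""
    return sorted(record.keys())


def first_name(record):
    """Return the first MeSH term if one is present, otherwise None."""
    terms = record.get("MeSHTermList") or []
    return terms[0] if terms else None


# Python 3 spelling of the loop in the reply: items() instead of iteritems().
for key, value in summary.items():
    print(key, value)

# dict.get() avoids the KeyError when a field such as SourceName is absent.
print(summary.get("SourceName"))
```

Looking a field up with `get()` returns `None` instead of raising `KeyError`, which makes it easier to probe which keys a given database actually returns.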
There are also a number of good online resources for learning Python which will help give experience in debugging these kinds of errors: http://learnpythonthehardway.org/index http://diveintopython.org/ Hope this helps, Brad From p.j.a.cock at googlemail.com Mon May 23 07:01:51 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 23 May 2011 12:01:51 +0100 Subject: [Biopython] Serialize SeqRecord to JSON? In-Reply-To: References: Message-ID: On Fri, May 20, 2011 at 10:14 PM, Uri Laserson wrote: > Does anyone know of a solution for this? > > Thanks! > Uri I thought JSON was more suited to holding simple data structures, rather than serialising arbitrary complex objects. Which bits of data do you need? The basics like the id/name/description and sequence could be presented like a tuple and encoded in JSON. Annotations begin to get complicated - but a dictionary of basic types should be fine. I suspect the biggest hurdle would be trying to encode any features. Peter From sdavis2 at mail.nih.gov Mon May 23 14:08:47 2011 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Mon, 23 May 2011 14:08:47 -0400 Subject: [Biopython] [OT] Bioconductor-2011 conference. Message-ID: All, Sorry for the slightly off-topic post, but I know there are some overlaps between Bioconductor and Biopython user groups. The Bioconductor-2011 conference will be held July 28-29, 2011 (optional: July 27 - Developer Day) at the Fred Hutchinson Cancer Research Center in Seattle, WA. This conference highlights current developments within and beyond Bioconductor, an international open source and open development software project for the analysis and comprehension of high-throughput genomic data. The conference provides a forum in which to discuss the use and design of software for analyzing data arising in biology with a focus on Bioconductor and genomic data.
If interested, see the website: https://secure.bioconductor.org/BioC2011/ Thanks, Sean From laserson at mit.edu Mon May 23 15:42:35 2011 From: laserson at mit.edu (Uri Laserson) Date: Mon, 23 May 2011 15:42:35 -0400 Subject: [Biopython] reading Alphabet from file Message-ID: Hi all, I am trying to implement a method that will convert a SeqRecord to a JSON serializable object. One piece of data that must be stored for a Seq object is the alphabet type. When I read this from file, what is the best practice to reload the same alphabet type? Thanks! Uri ................................................................................... Uri Laserson Graduate Student, Biomedical Engineering Harvard-MIT Division of Health Sciences and Technology M +1 917 742 8019 laserson at mit.edu From p.j.a.cock at googlemail.com Mon May 23 18:09:02 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 23 May 2011 23:09:02 +0100 Subject: [Biopython] reading Alphabet from file In-Reply-To: References: Message-ID: On Monday, May 23, 2011, Uri Laserson wrote: > Hi all, > > I am trying to implement a method that will convert a SeqRecord to a JSON > serializable object. One piece of data that must be stored for a Seq object > is the alphabet type. When I read this from file, what is the best practice > to reload the same alphabet type? > > Thanks! > Uri Hmm, that's tricky because the Biopython alphabet hierarchy is so complicated. Or richly detailed depending on your point of view ;-) In your position I would apply the KISS principle and reduce it to Protein, DNA, RNA or unknown - and use the generic_protein etc classes on reconstruction. Unless you need more detail than that?
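[Editorial sketch, not part of the original reply.] The KISS suggestion above can be illustrated with the standard `json` module alone. The function names, the field names, and the four molecule-type strings are assumptions made for this example; no Biopython class is used, and mapping the stored string back to an alphabet object is left to the caller.

```python
import json

# Hypothetical labels standing in for the alphabet, reduced as suggested above.
MOLECULE_TYPES = ("protein", "DNA", "RNA", "unknown")


def record_to_json(identifier, name, description, sequence,
                   molecule_type="unknown", annotations=None):
    """Encode the basic parts of a sequence record as a JSON string.

    Anything that is not one of the four simple labels is stored as 'unknown'.
    Annotations are expected to be a dictionary of basic types.
    """
    if molecule_type not in MOLECULE_TYPES:
        molecule_type = "unknown"
    payload = {
        "id": identifier,
        "name": name,
        "description": description,
        "sequence": sequence,
        "molecule_type": molecule_type,
        "annotations": annotations or {},
    }
    return json.dumps(payload, sort_keys=True)


def record_from_json(text):
    """Decode a JSON string written by record_to_json back into a dictionary."""
    payload = json.loads(text)
    if payload.get("molecule_type") not in MOLECULE_TYPES:
        payload["molecule_type"] = "unknown"
    return payload


encoded = record_to_json("X1", "demo", "toy record", "ACGT", "DNA", {"source": "example"})
decoded = record_from_json(encoded)
print(decoded["molecule_type"], decoded["sequence"])
```

Features are deliberately left out of this sketch, in line with the earlier remark in this thread that they are likely to be the hardest part to encode.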
Peter From p.j.a.cock at googlemail.com Tue May 24 07:26:25 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 24 May 2011 12:26:25 +0100 Subject: [Biopython] gff3 problem In-Reply-To: References: <4D9B0A6D.3040608@gmail.com> <20110405132247.GA20523@sobchak> <4D9DB3F4.30107@gmail.com> <20110408121041.GM20963@sobchak> <20110520111535.GC21651@sobchak> Message-ID: On Fri, May 20, 2011 at 12:27 PM, Peter Cock wrote: > On Fri, May 20, 2011 at 12:15 PM, Brad Chapman wrote: >> Peter; >> >> [SeqFeature support for not-stranded elements] >>> So was the consensus that we should reword the Bio.SeqFeature >>> docstring to say the four valid values for strand are (with GFF3 >>> equivalents in brackets): >>> >>> +1 = Forward (+ in GFF3) >>> -1 = Reverse (- in GFF3) >>> 0 = Not stranded (. in GFF3) >>> None = Unknown (? in GFF3) >>> >>> And should features on a protein sequence then have strand 0? >> >> That sounds great. I can make the corresponding change to the >> GFF library. Let me know if there are any other roadblocks to >> integrating that. Thanks much, >> Brad Going over this afresh now, in my email of 20 May, I had mixed up Leighton's original suggestion. The two special cases (0 and None) are a bit of a pain: http://lists.open-bio.org/pipermail/biopython/2011-April/007194.html Back in April, Leighton wrote: > The obvious (to me) mapping of the four allowed Biopython symbols to the > GFF3 convention is: > +1 -> + > -1 -> - > None -> . > 0 -> ? > because 'None' is semantically close to 'has no strand information of > consequence', and 0 is the mean of +1 and -1 ;) > Cheers, > L. i.e. +1 = Forward (+ in GFF3) -1 = Reverse (- in GFF3) 0 = Stranded but unknown (? in GFF3) None = Not stranded (. in GFF3) SeqFeature docstring updated: https://github.com/biopython/biopython/commit/ea64c74758dccfc7e6c0940e31a214293ecc59d3 This way protein features should have strand None (which is what the current GenBank/EMBL parser does anyway).
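[Editorial sketch, not part of the original message.] The mapping quoted above can be written down as a small lookup table. The names `STRAND_TO_GFF3`, `GFF3_TO_STRAND`, `strand_to_gff3`, and `gff3_to_strand` are hypothetical and chosen only for this illustration; they are not taken from Biopython or from the GFF library.

```python
# Strand values as agreed in the thread, with their GFF3 column-7 symbols.
STRAND_TO_GFF3 = {
    1: "+",     # forward
    -1: "-",    # reverse
    0: "?",     # stranded, but the strand is unknown
    None: ".",  # not stranded (e.g. protein features)
}

# Reverse lookup, built from the table above so the two cannot drift apart.
GFF3_TO_STRAND = {symbol: strand for strand, symbol in STRAND_TO_GFF3.items()}


def strand_to_gff3(strand):
    """Return the GFF3 symbol for a strand value; reject anything else."""
    try:
        return STRAND_TO_GFF3[strand]
    except KeyError:
        raise ValueError("strand must be +1, -1, 0 or None, got %r" % (strand,))


def gff3_to_strand(symbol):
    """Return the strand value for a GFF3 strand symbol; reject anything else."""
    try:
        return GFF3_TO_STRAND[symbol]
    except KeyError:
        raise ValueError("GFF3 strand must be one of + - ? ., got %r" % (symbol,))


print(strand_to_gff3(None), strand_to_gff3(0))
```

Keeping both directions in one table makes the two special cases (0 and None) explicit, which is where the earlier message in this thread had them swapped.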
Note that the SeqFeature default is strand=None which is still OK. Mixed strand isn't needed in the GFF3 model, but we already use None for this. Perhaps it should be 0 rather than None under this model? Peter From hxcan at stupidbeauty.com Sun May 29 03:18:22 2011 From: hxcan at stupidbeauty.com (=?GB2312?B?ssy78Mqk?=) Date: Sun, 29 May 2011 15:18:22 +0800 Subject: [Biopython] Another warning of "missing dtd file" Message-ID: <4DE1F33E.8020700@stupidbeauty.com> /usr/lib/python2.6/site-packages/Bio/Entrez/Parser.py:495: UserWarning: Unable to load DTD file bookdoc_110101.dtd. Bio.Entrez uses NCBI's DTD files to parse XML files returned by NCBI Entrez. Though most of NCBI's DTD files are included in the Biopython distribution, sometimes you may find that a particular DTD file is missing. While we can access the DTD file through the internet, the parser is much faster if the required DTD files are available locally. For this purpose, please download bookdoc_110101.dtd from http://www.ncbi.nlm.nih.gov/entrez/query/DTD/bookdoc_110101.dtd and save it either in directory /usr/lib/python2.6/site-packages/Bio/Entrez/DTDs or in directory /Data/.biopython/Bio/Entrez/DTDs in order for Bio.Entrez to find it. Alternatively, you can save bookdoc_110101.dtd in the directory Bio/Entrez/DTDs in the Biopython distribution, and reinstall Biopython. Please also inform the Biopython developers about this missing DTD, by reporting a bug on http://bugzilla.open-bio.org/ or sign up to our mailing list and emailing us, so that we can include it with the next release of Biopython. Proceeding to access the DTD file through the internet... warnings.warn(message) From p.j.a.cock at googlemail.com Sun May 29 06:00:58 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 29 May 2011 11:00:58 +0100 Subject: [Biopython] Another warning of "missing dtd file" In-Reply-To: <4DE1F33E.8020700@stupidbeauty.com> References: <4DE1F33E.8020700@stupidbeauty.com> Message-ID: 2011/5/29 ?????? 
: > /usr/lib/python2.6/site-packages/Bio/Entrez/Parser.py:495: UserWarning: > Unable to load DTD file bookdoc_110101.dtd. > ,,, > For this purpose, please download bookdoc_110101.dtd from > > http://www.ncbi.nlm.nih.gov/entrez/query/DTD/bookdoc_110101.dtd > > ... > Please also inform the Biopython developers about this missing DTD, by > reporting a bug on http://bugzilla.open-bio.org/ or sign up to our mailing > list and emailing us, so that we can include it with the next release of > Biopython. Thank you, that's been added. I don't see anything else missing from this list, but I know it is a partial listing: http://www.ncbi.nlm.nih.gov/corehtml/query/DTD/index.shtml Peter From sainitin7 at gmail.com Tue May 31 07:34:54 2011 From: sainitin7 at gmail.com (sai nitin) Date: Tue, 31 May 2011 13:34:54 +0200 Subject: [Biopython] Query regarding Bioassay database Message-ID: Hello, Myself sainitin i have one query regarding Eutilities use for pubchem and bioassay database as follows Question: I have list of pubchem IDs i have to get corresponding bioassay IDS which are unspecified for example it should print as following PubchemID:Bioassay IDs (unspecified) Please can any one give some suggestions how to retreive unspecified Bioassay IDS for given Pubchem IDS using Biopython Thanks in Advance -- Sainitin D From p.j.a.cock at googlemail.com Tue May 31 08:30:15 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 31 May 2011 13:30:15 +0100 Subject: [Biopython] Query regarding Bioassay database In-Reply-To: References: Message-ID: On Tue, May 31, 2011 at 12:34 PM, sai nitin wrote: > Hello, > > Myself sainitin ?i have one query regarding Eutilities use for pubchem and > bioassay database as follows > > Question: I have list of pubchem IDs i have to get corresponding bioassay > IDS which are unspecified > for example it should print as following > > PubchemID:Bioassay IDs (unspecified) > > Please can any one give some suggestions how to retreive unspecified > 
Bioassay IDS for given Pubchem IDS using Biopython > > Thanks in Advance Try Entrez Link (ELink), possibly with the pcassay_pccompound link. See the links in the Biopython documentation for Bio.Entrez.ELink, especially: http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/entrezlinks.html If you could give a more complete example it would help. In particular, an example of a positive match between pubchem and bioassay. Peter From Paul.Czodrowski at merck.de Tue May 3 10:56:10 2011 From: Paul.Czodrowski at merck.de (Paul.Czodrowski at merck.de) Date: Tue, 3 May 2011 12:56:10 +0200 Subject: [Biopython] installation as non-administrator Message-ID: Dear folks, I'm struggling around with the biopython installation. As non-administrator, the manual states the following: http://biopython.org/DIST/docs/install/Installation.html#htoc30 However, the setup.py (version 1.57) does not contain any entry " include_dirs=["Bio/Cluster", "your_dir/include/python"] ", but rather only "Bio" entries. (See attached file: setup.py) Or do I oversee anything? Regards, Paul This message and any attachment are confidential and may be privileged or otherwise protected from disclosure. If you are not the intended recipient, you must not copy this message or attachment or disclose the contents to any other person. If you have received this transmission in error, please notify the sender immediately and delete the message and any attachment from your system. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not accept liability for any omissions or errors in this message which may arise as a result of E-Mail-transmission or for damages resulting from any unauthorized changes of the content of this message and any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not guarantee that this message is free of viruses and does not accept liability for any damages caused by any virus transmitted therewith. 
Click http://disclaimer.merck.de to access the German, French, Spanish and Portuguese versions of this disclaimer. -------------- next part -------------- A non-text attachment was scrubbed... Name: setup.py Type: application/octet-stream Size: 11597 bytes Desc: not available URL: From p.j.a.cock at googlemail.com Tue May 3 11:31:31 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 3 May 2011 12:31:31 +0100 Subject: [Biopython] installation as non-administrator In-Reply-To: References: Message-ID: On Tue, May 3, 2011 at 11:56 AM, wrote: > > Dear folks, > > I'm struggling around with the biopython installation. > As non-administrator, the manual states the following: > http://biopython.org/DIST/docs/install/Installation.html#htoc30 > > However, the setup.py (version 1.57) does not contain any entry " > include_dirs=["Bio/Cluster", "your_dir/include/python"] > ", but rather only "Bio" entries. > > (See attached file: setup.py) You didn't really need to attach a whole file, you could have linked to our repository or quoted the bit of interest. > Or do I oversee anything? What OS are you using? Some flavour of Linux? What version of NumPy do you have, and how was it installed? What command did you use to attempt the install, and what error message did you get. Have you tried the --prefix argument? e.g. python setup.py build python setup.py test python setup.py install --prefix=$HOME Peter From anaryin at gmail.com Tue May 3 11:32:05 2011 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Tue, 3 May 2011 13:32:05 +0200 Subject: [Biopython] installation as non-administrator In-Reply-To: References: Message-ID: Hey Paul, I usually keep a copy of biopython in my home directory either by supplying the keyword --home=/my/home/directory or just by making "python setup.py build" and then adding the temp/libxxx/ directory to my PYTHONPATH. Hope it helps, Jo?o [...] 
Rodrigues http://nmr.chem.uu.nl/~joao On Tue, May 3, 2011 at 12:56 PM, wrote: > > Dear folks, > > I'm struggling around with the biopython installation. > As non-administrator, the manual states the following: > http://biopython.org/DIST/docs/install/Installation.html#htoc30 > > However, the setup.py (version 1.57) does not contain any entry " > include_dirs=["Bio/Cluster", "your_dir/include/python"] > ", but rather only "Bio" entries. > > (See attached file: setup.py) > > Or do I oversee anything? > > > Regards, > Paul > > > > This message and any attachment are confidential and may be privileged or > otherwise protected from disclosure. If you are not the intended recipient, > you must not copy this message or attachment or disclose the contents to > any other person. If you have received this transmission in error, please > notify the sender immediately and delete the message and any attachment > from your system. Merck KGaA, Darmstadt, Germany and any of its > subsidiaries do not accept liability for any omissions or errors in this > message which may arise as a result of E-Mail-transmission or for damages > resulting from any unauthorized changes of the content of this message and > any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its > subsidiaries do not guarantee that this message is free of viruses and does > not accept liability for any damages caused by any virus transmitted > therewith. > > Click http://disclaimer.merck.de to access the German, French, Spanish and > Portuguese versions of this disclaimer. > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > > From anaryin at gmail.com Tue May 3 11:32:47 2011 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Tue, 3 May 2011 13:32:47 +0200 Subject: [Biopython] installation as non-administrator In-Reply-To: References: Message-ID: Sorry, --prefix, not --home. 
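[Editorial sketch, not part of the original messages.] After an install with `--prefix`, Python still has to be told where the package landed, for example through PYTHONPATH. The helper below assumes the usual Unix layout `<prefix>/lib/pythonX.Y/site-packages`; the function names are hypothetical, and other platforms or 64-bit distributions may use a different directory (for instance `lib64`).

```python
import os
import sys


def prefix_site_packages(prefix, version=None):
    """Return the site-packages directory expected under an install prefix.

    Assumes the common Unix layout; `version` is a (major, minor) pair and
    defaults to the running interpreter.
    """
    if version is None:
        version = sys.version_info[:2]
    major, minor = version
    return os.path.join(prefix, "lib", "python%d.%d" % (major, minor), "site-packages")


def with_prefix_on_path(prefix, path=None, version=None):
    """Return a copy of a module search path with the prefix directory first."""
    path = list(sys.path if path is None else path)
    target = prefix_site_packages(prefix, version)
    if target not in path:
        path.insert(0, target)
    return path


# Directory to add to PYTHONPATH for an install made with --prefix=$HOME.
print(prefix_site_packages(os.path.expanduser("~")))
```

The printed directory is what would go into PYTHONPATH (or be inserted into `sys.path`) so that `import Bio` can find the locally installed copy.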
From mmokrejs at fold.natur.cuni.cz Tue May 3 12:22:38 2011 From: mmokrejs at fold.natur.cuni.cz (Martin Mokrejs) Date: Tue, 03 May 2011 14:22:38 +0200 Subject: [Biopython] How to optimize ACE file alignment (from newbler) Message-ID: <4DBFF38E.7050406@fold.natur.cuni.cz> Hi, I would like to ask you how can I optimize the ACE alignment with files produced by newbler. I see only the high-quality region is aligned while the rest is not. I typically ask newbler to place into the ace files untrimmed reads so the low-quality sequence is present, you can see it could have been included in the alignment and contribute the consensus quite well. I found a new feature of consed-20 being able to re-align the reads but that seemed to be too slow for me and had to kill re-processing of one contig. Is there a way to direct some program that I want to re-align just some columns since some position? That should first align to the consensus already defined and afterwards continue with de novo alignment as long as it is possible. Alternatively, how do you edit ACE alignments (I mean manually adjust gaps, move columns back and forth, re-order rows) and do you re-calculate the consensus? This is some sort of a follow-up to "Newbler ACE file to SAM?" posted to biopython-developers list at http://web.archiveorange.com/archive/v/5dAwXxUKZDTmQdM80MqQ ;) Martin From p.j.a.cock at googlemail.com Tue May 3 13:46:25 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 3 May 2011 14:46:25 +0100 Subject: [Biopython] How to optimize ACE file alignment (from newbler) In-Reply-To: <4DBFF38E.7050406@fold.natur.cuni.cz> References: <4DBFF38E.7050406@fold.natur.cuni.cz> Message-ID: On Tue, May 3, 2011 at 1:22 PM, Martin Mokrejs wrote: > Hi, > ?I would like to ask you how can I optimize the ACE alignment with files > produced by newbler. I see only the high-quality region is aligned while > the rest is not. 
I typically ask newbler to place into the ace files untrimmed > reads so the low-quality sequence is present, you can see it could have been > included in the alignment and contribute the consensus quite well. > ?I found a new feature of consed-20 being able to re-align the reads > but that seemed to be too slow for me and had to kill re-processing of one > contig. > ?Is there a way to direct some program that I want to re-align just some > columns since some position? That should first align to the consensus already > defined and afterwards continue with de novo alignment as long as it is possible. > ?Alternatively, how do you edit ACE alignments (I mean manually adjust gaps, > move columns back and forth, re-order rows) and do you re-calculate the > consensus? > ?This is some sort of a follow-up to "Newbler ACE file to SAM?" > posted to biopython-developers list at http://web.archiveorange.com/archive/v/5dAwXxUKZDTmQdM80MqQ > ;) > Martin Hi Martin, Biopython only has an ACE parser, with no support for writing ACE files. So, even if you did manipulate the parsed ACE file in Biopython, you'd have to write your own output code (or use a simpler file format). Regarding assembly editors, have you looked at Gap4 or Gap5? This might be a good question to ask on the http://seqanswers.com forum. Peter From Paul.Czodrowski at merck.de Tue May 3 14:38:25 2011 From: Paul.Czodrowski at merck.de (Paul.Czodrowski at merck.de) Date: Tue, 3 May 2011 16:38:25 +0200 Subject: [Biopython] Antwort: Re: installation as non-administrator In-Reply-To: Message-ID: Dear Peter, > > > > Dear folks, > > > > I'm struggling around with the biopython installation. > > As non-administrator, the manual states the following: > > http://biopython.org/DIST/docs/install/Installation.html#htoc30 > > > > However, the setup.py (version 1.57) does not contain any entry " > > include_dirs=["Bio/Cluster", "your_dir/include/python"] > > ", but rather only "Bio" entries. 
> > > > (See attached file: setup.py) > > You didn't really need to attach a whole file, you could have > linked to our repository or quoted the bit of interest. I'm sorry for this! > > > Or do I oversee anything? > > What OS are you using? Some flavour of Linux? OpenSuse 11.3 > > What version of NumPy do you have, and how was it installed? NumPy version 1.3.0, installed locally by the built-in python routines. > > What command did you use to attempt the install, and what > error message did you get. python setup.py --build ==> ERROR MESSAGE " running build running build_py running build_ext building 'Bio.Cluster.cluster' extension gcc -pthread -fno-strict-aliasing -DNDEBUG -fomit-frame-pointer -fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -fPIC -I/usr/lib/python2.6/site-packages/numpy/core/include -I/usr/include/python2.6 -c Bio/Cluster/clustermodule.c -o build/temp.linux-i686-2.6/Bio/Cluster/clustermodule.o Bio/Cluster/clustermodule.c:2:31: fatal error: numpy/arrayobject.h: No such file or directory compilation terminated. error: command 'gcc' failed with exit status 1 " > > Have you tried the --prefix argument? > > e.g. > > python setup.py build > python setup.py test > python setup.py install --prefix=$HOME > > Peter python setup.py --test ==> ERROR MESSAGE " python setup.py test running test Python version: 2.6.5 (r265:79063, Oct 28 2010, 20:56:56) [GCC 4.5.0 20100604 [gcc-4_5-branch revision 160292]] Operating system: posix linux2 test_Ace ... ok test_AlignIO ... ok test_AlignIO_convert ... ok test_BioSQL ... /xyz: UserWarning: order location operators are not fully supported % feature.location_operator) ok test_BioSQL_SeqIO ... ERROR test_CAPS ... ok test_Clustalw ... ok test_Clustalw_tool ... skipping. Install clustalw or clustalw2 if you want to use Bio.Clustalw. test_Cluster ... skipping. If you want to use Bio.Cluster, install NumPy first and then reinstall Biopython test_CodonTable ... 
ok test_CodonUsage ... ok test_Compass ... ok test_Crystal ... ok test_Dialign_tool ... skipping. Install DIALIGN2-2 if you want to use the Bio.Align.Applications wrapper. test_DocSQL ... ok test_Emboss ... skipping. Install EMBOSS if you want to use Bio.Emboss. test_EmbossPhylipNew ... skipping. Install the Emboss package 'PhylipNew' if you want to use the Bio.Emboss.Applications wrappers for phylogenetic tools. test_EmbossPrimer ... ok test_Entrez ... Segmentation fault (core dumped) " python setup.py install --prefix=$HOME ==> the same ERROR MESSAGE as from "python setup.py build" Cheers & thanks in advance, Paul
From Paul.Czodrowski at merck.de Tue May 3 14:47:00 2011 From: Paul.Czodrowski at merck.de (Paul.Czodrowski at merck.de) Date: Tue, 3 May 2011 16:47:00 +0200 Subject: [Biopython] Antwort: Re: installation as non-administrator In-Reply-To: Message-ID: Dear Peter, maybe as an additional question/issue: numpy is not located in "/usr/lib/python2.6/site-packages/numpy/core/include " but in another, rather global, python-lib-directory. As stated in my previous email, python setup.py build gives "gcc -pthread -fno-strict-aliasing -DNDEBUG -fomit-frame-pointer -fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -fPIC -I/usr/lib/python2.6/site-packages/numpy/core/include -I/usr/include/python2.6 -c Bio/Cluster/clustermodule.c -o build/temp.linux-i686-2.6/Bio/Cluster/clustermodule.o Bio/Cluster/clustermodule.c:2:31: fatal error: numpy/arrayobject.h: No such file or directory" and I would like to adapt the "-I/usr/lib/python2.6/site-packages/numpy/core/includ" to match the directory where it is actually located. Cheers & thanks, Paul > > > > Dear folks, > > > > I'm struggling around with the biopython installation. > > As non-administrator, the manual states the following: > > http://biopython.org/DIST/docs/install/Installation.html#htoc30 > > > > However, the setup.py (version 1.57) does not contain any entry " > > include_dirs=["Bio/Cluster", "your_dir/include/python"] > > ", but rather only "Bio" entries. > > > > (See attached file: setup.py) > > You didn't really need to attach a whole file, you could have > linked to our repository or quoted the bit of interest. > > > Or do I oversee anything? > > What OS are you using? Some flavour of Linux? > > What version of NumPy do you have, and how was it installed? > > What command did you use to attempt the install, and what > error message did you get. > > Have you tried the --prefix argument? > > e.g. 
> > python setup.py build > python setup.py test > python setup.py install --prefix=$HOME > > Peter From p.j.a.cock at googlemail.com Tue May 3 15:10:30 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 3 May 2011 16:10:30 +0100 Subject: [Biopython] Antwort: Re: installation as non-administrator In-Reply-To: References: Message-ID: On Tue, May 3, 2011 at 3:38 PM, wrote: > Dear Peter, > > >> > >> > Dear folks, >> > >> > I'm struggling around with the biopython installation. >> > As non-administrator, the manual states the following: >> > http://biopython.org/DIST/docs/install/Installation.html#htoc30 >> > >> > However, the setup.py (version 1.57) does not contain any entry " >> > include_dirs=["Bio/Cluster", "your_dir/include/python"] >> > ", but rather only "Bio" entries. >> > >> > (See attached file: setup.py) >> >> You didn't really need to attach a whole file, you could have >> linked to our repository or quoted the bit of interest. > > I'm sorry for this! 
Don't worry too much, it's a fairly small file, otherwise I wouldn't have let it through the moderation queue. >> > Or do I oversee anything? >> >> What OS are you using? Some flavour of Linux? > > OpenSuse 11.3 Should be fine. >> >> What version of NumPy do you have, and how was it installed? > > NumPy version 1.3.0, installed locally by the built-in python routines. > Any reason for installing such an old version? I'm just curious. Does NumPy work properly? At the very least, if you run python does "import numpy" work or give an error? What happens if you try and do this: $ python >>> import numpy >>> numpy.get_include() '/usr/local/lib/python2.6/site-packages/numpy/core/include' (That's the output on one of our Linux machines) If that doesn't work, perhaps your PYTHONPATH needs setting. How/where did you install NumPy? e.g. python setup.py install --prefix=$HOME >> What command did you use to attempt the install, and what >> error message did you get. > python setup.py --build > ==> ERROR MESSAGE > " > running build > running build_py > running build_ext > building 'Bio.Cluster.cluster' extension > gcc -pthread -fno-strict-aliasing -DNDEBUG -fomit-frame-pointer > -fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector > -funwind-tables -fasynchronous-unwind-tables -g -fPIC > -I/usr/lib/python2.6/site-packages/numpy/core/include > -I/usr/include/python2.6 -c Bio/Cluster/clustermodule.c -o > build/temp.linux-i686-2.6/Bio/Cluster/clustermodule.o > Bio/Cluster/clustermodule.c:2:31: fatal error: numpy/arrayobject.h: No such > file or directory > compilation terminated. > error: command 'gcc' failed with exit status 1 > " OK, it isn't finding the numpy header files. I'd guess from your next email the file is /usr/lib/python2.6/site-packages/numpy/core/include/numpy/arrayobject.h The hack suggested in the installation document is to edit our setup.py file to point to the path explicitly. 
There is probably a more elegant way; right now my guess is that NumPy is not on the python path (see above). --- From the test results, > python setup.py test > running test > Python version: 2.6.5 (r265:79063, Oct 28 2010, 20:56:56) > [GCC 4.5.0 20100604 [gcc-4_5-branch revision 160292]] > Operating system: posix linux2 > test_Ace ... ok > ... > test_Entrez ... Segmentation fault (core dumped) Oh, nasty! That should *not* happen, and is probably a separate issue to the NumPy header install issue. Peter From mmokrejs at fold.natur.cuni.cz Tue May 3 23:20:13 2011 From: mmokrejs at fold.natur.cuni.cz (Martin Mokrejs) Date: Wed, 04 May 2011 01:20:13 +0200 Subject: [Biopython] How to optimize ACE file alignment (from newbler) In-Reply-To: References: <4DBFF38E.7050406@fold.natur.cuni.cz> Message-ID: <4DC08DAD.9000100@fold.natur.cuni.cz> Hi Peter, no I haven't played with gap5 yet, so far only with consed and tablet. Thanks for noting biopython has no write support for ACE. Martin Peter Cock wrote: > On Tue, May 3, 2011 at 1:22 PM, Martin Mokrejs > wrote: >> Hi, >> I would like to ask you how can I optimize the ACE alignment with files >> produced by newbler. I see only the high-quality region is aligned while >> the rest is not. I typically ask newbler to place into the ace files untrimmed >> reads so the low-quality sequence is present, you can see it could have been >> included in the alignment and contribute the consensus quite well. >> I found a new feature of consed-20 being able to re-align the reads >> but that seemed to be too slow for me and had to kill re-processing of one >> contig. >> Is there a way to direct some program that I want to re-align just some >> columns since some position? That should first align to the consensus already >> defined and afterwards continue with de novo alignment as long as it is possible. 
>> Alternatively, how do you edit ACE alignments (I mean manually adjust gaps, >> move columns back and forth, re-order rows) and do you re-calculate the >> consensus? >> This is some sort of a follow-up to "Newbler ACE file to SAM?" >> posted to biopython-developers list at http://web.archiveorange.com/archive/v/5dAwXxUKZDTmQdM80MqQ >> ;) >> Martin > > Hi Martin, > > Biopython only has an ACE parser, with no support for writing ACE files. > So, even if you did manipulate the parsed ACE file in Biopython, you'd > have to write your own output code (or use a simpler file format). > > Regarding assembly editors, have you looked at Gap4 or Gap5? > > This might be a good question to ask on the http://seqanswers.com > forum. > > Peter > > From Paul.Czodrowski at merck.de Wed May 4 08:47:14 2011 From: Paul.Czodrowski at merck.de (Paul.Czodrowski at merck.de) Date: Wed, 4 May 2011 10:47:14 +0200 Subject: [Biopython] Antwort: Re: Antwort: Re: installation as non-administrator In-Reply-To: Message-ID: Dear Peter, > > Dear Peter, > > > > > >> > > >> > Dear folks, > >> > > >> > I'm struggling around with the biopython installation. > >> > As non-administrator, the manual states the following: > >> > http://biopython.org/DIST/docs/install/Installation.html#htoc30 > >> > > >> > However, the setup.py (version 1.57) does not contain any entry " > >> > include_dirs=["Bio/Cluster", "your_dir/include/python"] > >> > ", but rather only "Bio" entries. > >> > > >> > (See attached file: setup.py) > >> > >> You didn't really need to attach a whole file, you could have > >> linked to our repository or quoted the bit of interest. > > > > I'm sorry for this! > > Don't worry too much, its a fairly small file otherwise I wouldn't > have let it though the moderation queue. > > >> > Or do I oversee anything? > >> > >> What OS are you using? Some flavour of Linux? > > > > OpenSuse 11.3 > > Should be fine. > > >> > >> What version of NumPy do you have, and how was it installed? 
> > > > NumPy version 1.3.0, installed locally by the built-in python routines. > > > > Any reason for installing such an old version? I'm just curious. No logical reason... :) > > Does NumPy work properly? At the very least, if you run python > does "import numpy" work or give an error? What happens if you > try and do this: > > $ python > >>> import numpy > >>> numpy.get_include() > '/usr/local/lib/python2.6/site-packages/numpy/core/include' > > (That's the output on one of our Linux machines) We have the same output: >>> >>> >>> numpy.get_include() '/usr/lib/python2.6/site-packages/numpy/core/include' > > If that doesn't work, perhaps your PYTHONPATH needs setting. > How/where did you install NumPy? e.g. python setup.py --prefix=$HOME The /usr/lib python is installed via the yast OpenSuse. But it seems to me that this installation did not work properly, since there are only 2 files in the directory " /usr/lib/python2.6/site-packages/numpy/core/include/numpy/": - ufunc_api.txt - multiarray_api.txt However, we have another installation of NumPy which is located here: "/SW/python/lib/python2.6/site-packages/lib/python2.6/site-packages/numpy" And yes, there is a mix-up of the directories... :) > >> What command did you use to attempt the install, and what > >> error message did you get. > > python setup.py --build > > ==> ERROR MESSAGE > > " > > running build > > running build_py > > running build_ext > > building 'Bio.Cluster.cluster' extension > > gcc -pthread -fno-strict-aliasing -DNDEBUG -fomit-frame-pointer > > -fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector > > -funwind-tables -fasynchronous-unwind-tables -g -fPIC > > -I/usr/lib/python2.6/site-packages/numpy/core/include > > -I/usr/include/python2.6 -c Bio/Cluster/clustermodule.c -o > > build/temp.linux-i686-2.6/Bio/Cluster/clustermodule.o > > Bio/Cluster/clustermodule.c:2:31: fatal error: numpy/arrayobject.h: No such > > file or directory > > compilation terminated. 
> > error: command 'gcc' failed with exit status 1 > > " > > OK, it isn't finding the numpy header files. I'd guess from your > next email the > file is /usr/lib/python2.6/site- > packages/numpy/core/include/numpy/arrayobject.h You are wrong about this. The header file is located here: "/SW/python/lib/python2.6/site-packages/lib/python2.6/site-packages/numpy/core/include/numpy/" By appropriately setting the PYTHONPATH, it works properly. > > The hack suggested in the installation document is to edit our setup.py > file to point to the path explicitly. There is probably a more elegant way, > right now my guess is that NumPy is not on the python path (see above). > > --- > > >From the test results, > > > python setup.py test > > running test > > Python version: 2.6.5 (r265:79063, Oct 28 2010, 20:56:56) > > [GCC 4.5.0 20100604 [gcc-4_5-branch revision 160292]] > > Operating system: posix linux2 > > test_Ace ... ok > > ... > > test_Entrez ... Segmentation fault (core dumped) > > Oh, nasty! That should *not* happen, and is probably a separate > issue to the NumPy header install issue. python setup.py install --prefix=$HOME works fine now. Should the segmentation fault still be considered? Cheers & thanks, Paul > > Peter
From p.j.a.cock at googlemail.com Wed May 4 09:06:11 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 4 May 2011 10:06:11 +0100 Subject: [Biopython] Antwort: Re: Antwort: Re: installation as non-administrator In-Reply-To: References: Message-ID: On Wed, May 4, 2011 at 9:47 AM, wrote: > Dear Peter, > >> >> Does NumPy work properly? At the very least, if you run python >> does "import numpy" work or give an error? What happens if you >> try and do this: >> >> $ python >> >>> import numpy >> >>> numpy.get_include() >> '/usr/local/lib/python2.6/site-packages/numpy/core/include' >> >> (That's the output on one of our Linux machines) > > We have the same output: >>>> >>>> >>>> numpy.get_include() > '/usr/lib/python2.6/site-packages/numpy/core/include' > > >> >> If that doesn't work, perhaps your PYTHONPATH needs setting. >> How/where did you install NumPy? e.g. python setup.py --prefix=$HOME > > The /usr/lib python is installed via the yast OpenSuse. > But it seems to me that this installation did not work properly, > since, there are only 2 files in the directory > " /usr/lib/python2.6/site-packages/numpy/core/include/numpy/": > - ufunc_api.txt > - multiarray_api.txt > > However, we have another installation of NumPy which is located here: > "/SW/python/lib/python2.6/site-packages/lib/python2.6/site-packages/numpy" > > And yes, there is a mix-up of the directories... :) I think that explains why the Biopython install didn't work originally: it found the broken NumPy under /usr/lib rather than your good one installed under /SW/. You might want to try and remove the broken NumPy, as it may cause you problems installing other python libraries. 
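A quick way to confirm this kind of mix-up is to ask Python which NumPy it actually imports and whether the header the build needs is present there. The following is only a minimal sketch and not part of Biopython: the helper name has_numpy_header is made up for illustration, and it assumes nothing beyond numpy.get_include(), which was already used earlier in this thread.

```python
import os


def has_numpy_header(include_dir):
    """Return True if numpy/arrayobject.h exists under include_dir.

    Hypothetical helper for illustration; it only checks the file system.
    """
    return os.path.isfile(os.path.join(include_dir, "numpy", "arrayobject.h"))


if __name__ == "__main__":
    try:
        import numpy
    except ImportError:
        print("NumPy is not importable; check PYTHONPATH")
    else:
        include_dir = numpy.get_include()
        # Shows which installation the interpreter picked up.
        print("NumPy loaded from: %s" % numpy.__file__)
        print("Include directory: %s" % include_dir)
        print("arrayobject.h present: %s" % has_numpy_header(include_dir))
```

If the last line reports False, the interpreter is picking up an installation without headers, which would match the compiler error quoted above. Putting the complete installation first on PYTHONPATH is one way around it.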
> > By appropiately setting the PYTHONPATH, it works properly. > OK, good. >> >From the test results, >> >> > python setup.py test >> > running test >> > Python version: 2.6.5 (r265:79063, Oct 28 2010, 20:56:56) >> > [GCC 4.5.0 20100604 [gcc-4_5-branch revision 160292]] >> > Operating system: posix linux2 >> > test_Ace ... ok >> > ... >> > test_Entrez ... Segmentation fault (core dumped) >> >> Oh, nasty! That should *not* happen, and is probably a separate >> issue to the NumPy header install issue. > > python setup.py install --prefix=$HOME works fine now. > > Should the segmentation fault still be considered? Yes please. I assume it still breaks? Can you try changing to the Tests subdirectory from the Biopython source, and doing: python test_Entrez.py That should run just the Entrez tests, and hopefully give a bit more information about what/when the segmentation fault occurs. I suspect a problem in one of the Python C libraries that Biopython is using (since as far as I can recall, all the Bio.Entrez code is pure python). Peter From mictadlo at gmail.com Wed May 4 09:59:13 2011 From: mictadlo at gmail.com (Michal) Date: Wed, 04 May 2011 19:59:13 +1000 Subject: [Biopython] [BioRuby] Interesting BLAST 2.2.25+ XML behaviour In-Reply-To: <398303E2-1195-4CC2-8B73-09C6C1117892@illinois.edu> References: <398303E2-1195-4CC2-8B73-09C6C1117892@illinois.edu> Message-ID: <4DC12371.3040204@gmail.com> Hi Peter, Do you have the script which read https://bitbucket.org/galaxy/galaxy-central/src/8eaf07a46623/test-data/blastp_four_human_vs_rhodopsin.xml and what would be the correct output? Thank you in advance. Cheers, Michal On 05/03/2011 11:31 PM, Chris Fields wrote: > Haven't tried this using the latest BLAST+ myself, but it doesn't surprise me too much. Also agree re: some kind of bug tracking with NCBI; I believe they have an internal one, but it would be nice to have a public interface to it. 
> > chris > > On May 3, 2011, at 4:24 AM, Peter Cock wrote: > >> Hello all, >> >> I've CC'd the BioPerl, BioRuby, BioJava and Biopython development mailing >> lists to make sure you're aware of this, but can we continue any discussion >> on the cross-project open-bio-l mailing list please? >> >> I noticed that recent versions of BLAST are not using a single >> block for each query, which was the historical behaviour and assumed >> by the Biopython BLAST XML parser. This may be a bug in BLAST. >> See link below for an example. >> >> Has anyone else noticed this, and has it been reported to the NCBI yet? >> >> Thanks, >> >> Peter >> >> (Not for the first time, I wish there was a public bug tracker for BLAST, >> or at least a private bug tracker so we could talk about issues with an >> NCBI assigned reference number.) >> >> ---------- Forwarded message ---------- >> From: Peter Cock >> Date: Wed, Apr 20, 2011 at 6:08 PM >> Subject: Interesting BLAST 2.2.25+ XML behaviour >> To: Biopython-Dev Mailing List >> >> >> Hi all, >> >> Have a look at this XML file from a FASTA vs FASTA search >> using blastp from BLAST 2.2.25+ (current release), which >> is a test file I created for the BLAST+ wrappers in Galaxy: >> >> https://bitbucket.org/galaxy/galaxy-central/src/8eaf07a46623/test-data/blastp_four_human_vs_rhodopsin.xml >> >> I just put it though the Biopython BLAST XML parser, and >> was surprised not to get four records back (since as you >> might guess from the filename, there were four queries). >> >> It appears this version of BLAST+ is incrementing the >> iteration counter for each match... or something like that. >> >> Has anyone else noticed this? I wonder if it is accidental... 
>> >> Peter >> >> _______________________________________________ >> BioRuby Project - http://www.bioruby.org/ >> BioRuby mailing list >> BioRuby at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioruby > > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby > From p.j.a.cock at googlemail.com Wed May 4 10:36:57 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 4 May 2011 11:36:57 +0100 Subject: [Biopython] [BioRuby] Interesting BLAST 2.2.25+ XML behaviour In-Reply-To: <4DC12371.3040204@gmail.com> References: <398303E2-1195-4CC2-8B73-09C6C1117892@illinois.edu> <4DC12371.3040204@gmail.com> Message-ID: On Wed, May 4, 2011 at 10:59 AM, Michal wrote: > Hi Peter, > Do you have the script which read > > https://bitbucket.org/galaxy/galaxy-central/src/8eaf07a46623/test-data/blastp_four_human_vs_rhodopsin.xml > > > and what would be the correct output? > > Thank you in advance. > > Cheers, > Michal Hi Michal, I'm not quite sure what you're asking, but I'll try. 
First, the three data files: $ wget https://bitbucket.org/galaxy/galaxy-central/src/8eaf07a46623/test-data/blastp_four_human_vs_rhodopsin.xml $ wget https://bitbucket.org/galaxy/galaxy-central/src/8eaf07a46623/test-data/four_human_proteins.fasta $ wget https://bitbucket.org/galaxy/galaxy-central/src/8eaf07a46623/rhodopsin_proteins.fasta The query file has four sequences, $ grep -c "^>" four_human_proteins.fasta 4 $ grep "^>" four_human_proteins.fasta >sp|Q9BS26|ERP44_HUMAN Endoplasmic reticulum resident protein 44 OS=Homo sapiens GN=ERP44 PE=1 SV=1 >sp|Q9NSY1|BMP2K_HUMAN BMP-2-inducible protein kinase OS=Homo sapiens GN=BMP2K PE=1 SV=2 >sp|P06213|INSR_HUMAN Insulin receptor OS=Homo sapiens GN=INSR PE=1 SV=4 >sp|P08100|OPSD_HUMAN Rhodopsin OS=Homo sapiens GN=RHO PE=1 SV=1 Based on past experience, I would expect 4 iteration blocks in the XML, but in this case I have 24: $ grep "" -c blastp_four_human_vs_rhodopsin.xml 24 Notice we get 6 iterations for each query (4 times 6 is 24): $ grep "" blastp_four_human_vs_rhodopsin.xml sp|Q9BS26|ERP44_HUMAN sp|Q9BS26|ERP44_HUMAN sp|Q9BS26|ERP44_HUMAN sp|Q9BS26|ERP44_HUMAN sp|Q9BS26|ERP44_HUMAN sp|Q9BS26|ERP44_HUMAN sp|Q9NSY1|BMP2K_HUMAN sp|Q9NSY1|BMP2K_HUMAN sp|Q9NSY1|BMP2K_HUMAN sp|Q9NSY1|BMP2K_HUMAN sp|Q9NSY1|BMP2K_HUMAN sp|Q9NSY1|BMP2K_HUMAN sp|P06213|INSR_HUMAN sp|P06213|INSR_HUMAN sp|P06213|INSR_HUMAN sp|P06213|INSR_HUMAN sp|P06213|INSR_HUMAN sp|P06213|INSR_HUMAN sp|P08100|OPSD_HUMAN sp|P08100|OPSD_HUMAN sp|P08100|OPSD_HUMAN sp|P08100|OPSD_HUMAN sp|P08100|OPSD_HUMAN sp|P08100|OPSD_HUMAN Now, using the two FASTA files directly and re-running blastp, what do I get? 
$ ~/Downloads/ncbi-blast-2.2.25+/bin/blastp -query four_human_proteins.fasta -subject rhodopsin_proteins.fasta -outfmt 5 | grep "" -c 24 Or again with -parse_deflines, which changes how the hit ID/def is presented: $ ~/Downloads/ncbi-blast-2.2.25+/bin/blastp -query four_human_proteins.fasta -subject rhodopsin_proteins.fasta -outfmt 5 -parse_deflines | grep "" -c 24 How about older versions? $ ~/Downloads/ncbi-blast-2.2.24+/bin/blastp -query four_human_proteins.fasta -subject rhodopsin_proteins.fasta -outfmt 5 BLAST engine error: XML formatting is only supported for a database search I'll have to make a blast database first... $ ~/Downloads/ncbi-blast-2.2.24+/bin/makeblastdb -in rhodopsin_proteins.fasta -dbtype prot Building a new DB, current time: 05/04/2011 11:22:57 New DB name: rhodopsin_proteins.fasta New DB title: rhodopsin_proteins.fasta Sequence type: Protein Keep Linkouts: T Keep MBits: T Maximum file size: 1073741824B Adding sequences from FASTA; added 6 sequences in 0.105655 seconds. $ ~/Downloads/ncbi-blast-2.2.25+/bin/blastp -query four_human_proteins.fasta -db rhodopsin_proteins.fasta -outfmt 5 | grep "" -c 4 Look - just four identifiers as I expect! This also works if the database is built with the -parse_seqids switch. The same happens with older versions of BLAST+, one block per query, so four iteration blocks for this example. I tried all of 2.2.21+, 2.2.22+, 2.2.23+ and 2.2.24+ (running makeblastdb to give a fresh database, then blastp). That seems to demonstrate that bug is specific to the XML output from FASTA vs FASTA (not FASTA vs DB), which is a new feature in NCBI BLAST 2.2.25+ I will raise this with the NCBI, and report back. However, even if the NCBI fix it in the next release, we (Bio*) may want to update our parsers to cope with this quirk, or at least put a warning in our BLAST XML parser documentation, as there will be lots of installations of NCBI BLAST 2.2.25+ in the wild. 
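To check a given file for this quirk without relying on grep, the iteration blocks can be tallied per query. Below is a minimal sketch using only the Python standard library. It is not the Biopython BLAST XML parser, and the function name iterations_per_query is hypothetical; Iteration and Iteration_query-def are element names from the NCBI BLAST XML format.

```python
import xml.etree.ElementTree as ET


def iterations_per_query(xml_source):
    """Count consecutive <Iteration> blocks sharing a query definition.

    xml_source is a filename or a binary file-like object.
    Returns a list of (query_def, count) tuples in file order.
    """
    runs = []  # each entry is [query_def, count]
    for _event, elem in ET.iterparse(xml_source):  # default: "end" events
        if elem.tag == "Iteration":
            query = elem.findtext("Iteration_query-def")
            if runs and runs[-1][0] == query:
                runs[-1][1] += 1  # same query as the previous block
            else:
                runs.append([query, 1])
            elem.clear()  # free memory for large result files
    return [tuple(run) for run in runs]


if __name__ == "__main__":
    import sys
    for filename in sys.argv[1:]:
        for query, count in iterations_per_query(filename):
            print("%s\t%d" % (query, count))
```

A count above one for a query would point to the split-iteration behaviour described above, assuming the file does not come from a genuinely iterative search such as PSI-BLAST. A parser wanting the historical one-block-per-query view could then merge such consecutive blocks.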
Peter From Paul.Czodrowski at merck.de Wed May 4 11:30:16 2011 From: Paul.Czodrowski at merck.de (Paul.Czodrowski at merck.de) Date: Wed, 4 May 2011 13:30:16 +0200 Subject: [Biopython] Antwort: Re: Antwort: Re: Antwort: Re: installation as non-administrator In-Reply-To: Message-ID: Dear Peter, > >> >From the test results, > >> > >> > python setup.py test > >> > running test > >> > Python version: 2.6.5 (r265:79063, Oct 28 2010, 20:56:56) > >> > [GCC 4.5.0 20100604 [gcc-4_5-branch revision 160292]] > >> > Operating system: posix linux2 > >> > test_Ace ... ok > >> > ... > >> > test_Entrez ... Segmentation fault (core dumped) > >> > >> Oh, nasty! That should *not* happen, and is probably a separate > >> issue to the NumPy header install issue. > > > > python setup.py install --prefix=$HOME works fine now. > > > > Should the segmentation fault still be considered? > > Yes please. I assume it still breaks? Can you try changing to the > Tests subdirectory from the Biopython source, and doing: > > python test_Entrez.py I cannot find the src directory. Here is my Bio/ directory: " Affy Align AlignIO Alphabet Application Blast CAPS Clustalw Cluster Compass cpairwise2.so Crystal Data DocSQL.py DocSQL.pyc Emboss Entrez ExPASy File.py File.pyc FSSP GA GenBank Geo Graphics HMM HotRand.py HotRand.pyc Index.py Index.pyc __init__.py __init__.pyc InterPro KDTree KEGG kNN.py kNN.pyc LogisticRegression.py LogisticRegression.pyc MarkovModel.py MarkovModel.pyc MaxEntropy.py MaxEntropy.pyc Medline Motif NaiveBayes.py NaiveBayes.pyc NeuralNetwork Nexus NMR pairwise2.py pairwise2.pyc Parsers ParserSupport.py ParserSupport.pyc Pathway PDB Phylo PopGen _py3k.py _py3k.pyc Restriction SCOP Search.py Search.pyc SeqFeature.py SeqFeature.pyc SeqIO Seq.py Seq.pyc SeqRecord.py SeqRecord.pyc Sequencing SeqUtils Statistics SubsMat SVDSuperimposer SwissProt triefind.py triefind.pyc trie.so UniGene Wise " BTW, python setup.py install --prefix=$HOME did not break. Thanks & Cheers, Paul 
> > That should run just the Entrez tests, and hopefully give a bit > more information about what/when the segmentation fault > occurs. I suspect a problem in one of the Python C libraries > that Biopython is using (since as far as I can recall, all the > Bio.Entrez code is pure python). > > Peter From anaryin at gmail.com Wed May 4 11:41:07 2011 From: anaryin at gmail.com (João Rodrigues) Date: Wed, 4 May 2011 13:41:07 +0200 Subject: [Biopython] Antwort: Re: Antwort: Re: Antwort: Re: installation as non-administrator In-Reply-To: References: Message-ID: At the same level as Bio/ you have another directory called Tests/. If I list my biopython directory: joaor at home: ls biopython-git/ *Bio* BioSQL CONTRIB DEPRECATED Doc LICENSE MANIFEST.in NEWS README Scripts *Tests* build do2to3.py setup.py The file Peter was talking about should be there. Cheers, João [...] 
Rodrigues http://nmr.chem.uu.nl/~joao On Wed, May 4, 2011 at 1:30 PM, wrote: > Dear Peter, > > > > > >> >From the test results, > > >> > > >> > python setup.py test > > >> > running test > > >> > Python version: 2.6.5 (r265:79063, Oct 28 2010, 20:56:56) > > >> > [GCC 4.5.0 20100604 [gcc-4_5-branch revision 160292]] > > >> > Operating system: posix linux2 > > >> > test_Ace ... ok > > >> > ... > > >> > test_Entrez ... Segmentation fault (core dumped) > > >> > > >> Oh, nasty! That should *not* happen, and is probably a separate > > >> issue to the NumPy header install issue. > > > > > > python setup.py install --prefix=$HOME works fine now. > > > > > > Should the segmentation fault still be considered? > > > > Yes please. I assume it still breaks? Can you try changing to the > > Tests subdirectory from the Biopython source, and doing: > > > > python test_Entrez.py > > I cannot find the src directory. > Here is my Bio/ directory: > " > Affy > Align > AlignIO > Alphabet > Application > Blast > CAPS > Clustalw > Cluster > Compass > cpairwise2.so > Crystal > Data > DocSQL.py > DocSQL.pyc > Emboss > Entrez > ExPASy > File.py > File.pyc > FSSP > GA > GenBank > Geo > Graphics > HMM > HotRand.py > HotRand.pyc > Index.py > Index.pyc > __init__.py > __init__.pyc > InterPro > KDTree > KEGG > kNN.py > kNN.pyc > LogisticRegression.py > LogisticRegression.pyc > MarkovModel.py > MarkovModel.pyc > MaxEntropy.py > MaxEntropy.pyc > Medline > Motif > NaiveBayes.py > NaiveBayes.pyc > NeuralNetwork > Nexus > NMR > pairwise2.py > pairwise2.pyc > Parsers > ParserSupport.py > ParserSupport.pyc > Pathway > PDB > Phylo > PopGen > _py3k.py > _py3k.pyc > Restriction > SCOP > Search.py > Search.pyc > SeqFeature.py > SeqFeature.pyc > SeqIO > Seq.py > Seq.pyc > SeqRecord.py > SeqRecord.pyc > Sequencing > SeqUtils > Statistics > SubsMat > SVDSuperimposer > SwissProt > triefind.py > triefind.pyc > trie.so > UniGene > Wise > " > > BTW, python setup.py install --prefix=$HOME did not break. 
> > Thanks & Ceers, > Pau? > > > > > That should run just the Entrez tests, and hopefully give a bit > > more information about what/when the segmentation fault > > occurs. I suspect a problem in one of the Python C libraries > > that Biopython is using (since as far as I can recall, all the > > Bio.Entrez code is pure python). > > > > Peter > > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From Paul.Czodrowski at merck.de Wed May 4 12:40:06 2011 From: Paul.Czodrowski at merck.de (Paul.Czodrowski at merck.de) Date: Wed, 4 May 2011 14:40:06 +0200 Subject: [Biopython] Antwort: Re: Antwort: Re: Antwort: Re: Antwort: Re: installation as non-administrator In-Reply-To: Message-ID: Dear Joao & Peter, this is what I got: " Test error handling when presented with Fasta non-XML data ... 
ok
Test error handling when presented with GenBank non-XML data ... ok
Test parsing XML returned by EFetch, Nucleotide database (first test) ... ERROR
Test parsing XML returned by EFetch, Protein database ... ERROR
Test parsing XML returned by EFetch, OMIM database ... ERROR
Test parsing XML returned by EFetch, PubMed database (first test) ... Segmentation fault (core dumped)
"

Cheers,
Paul

> On the same level of Bio/ you have another directory called Tests/.
>
> If I list my biopython directory:
>
> joaor at home: ls biopython-git/
> *Bio* BioSQL CONTRIB DEPRECATED Doc LICENSE
> MANIFEST.in NEWS README Scripts *Tests* build
> do2to3.py setup.py
>
> The file Peter was talking about should be there.
>
> Cheers,
>
> João [...] Rodrigues
> http://nmr.chem.uu.nl/~joao
>
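The failing tests above all parse XML, and a later message in this thread points out that this parsing goes through the standard library's xml.parsers.expat. A minimal sketch for exercising that parser outside of Biopython follows. It assumes that a crash or error here would point at the Python/expat installation rather than at Biopython; the helper name expat_smoke_test and the sample XML are made up for illustration.

```python
from xml.parsers import expat


def expat_smoke_test(xml_text):
    """Parse xml_text with expat and return element names in document order."""
    names = []
    parser = expat.ParserCreate()
    # Record every opening tag the parser reports.
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return names


# Version string of the expat library Python was built against.
print(expat.EXPAT_VERSION)
print(expat_smoke_test("<root><item>1</item><item>2</item></root>"))
```

If this small script already crashes, running it under gdb should give a much shorter stack trace than the full test suite.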
From p.j.a.cock at googlemail.com Wed May 4 13:17:21 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 4 May 2011 14:17:21 +0100
Subject: [Biopython] Antwort: Re: Antwort: Re: Antwort: Re: Antwort: Re: installation as non-administrator
In-Reply-To:
References:
Message-ID:

On Wed, May 4, 2011 at 1:40 PM, wrote:
>
> Dear Joao & Peter,
>
> this is what I got:
>
> "
> Test error handling when presented with Fasta non-XML data ... ok
> Test error handling when presented with GenBank non-XML data ... ok
> Test parsing XML returned by EFetch, Nucleotide database (first test) ...
> ERROR
> Test parsing XML returned by EFetch, Protein database ... ERROR
> Test parsing XML returned by EFetch, OMIM database ... ERROR
> Test parsing XML returned by EFetch, PubMed database (first test) ...
> Segmentation fault (core dumped)
> "
>
>
> Cheers,
> Paul

Hmm, something amiss with the XML parsing I think, we're
using the Python standard library xml.parsers.expat here.

You said you were using OpenSuse 11.3, and the start of our test
suite reported the following:

Python version: 2.6.5 (r265:79063, Oct 28 2010, 20:56:56)
[GCC 4.5.0 20100604 [gcc-4_5-branch revision 160292]]
Operating system: posix linux2

What version of expat do you have? Try:

$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from xml.parsers import expat
>>> print expat.__version__
$Revision: 17640 $

Do you fancy trying gdb to get a stack trace for us?

I've had a quick Google, and the following issue *might* be
related: http://bugs.python.org/issue4877

Peter

From Paul.Czodrowski at merck.de Wed May 4 13:36:42 2011
From: Paul.Czodrowski at merck.de (Paul.Czodrowski at merck.de)
Date: Wed, 4 May 2011 15:36:42 +0200
Subject: [Biopython] Antwort: Re: Antwort: Re: Antwort: Re: Antwort: Re: Antwort: Re: installation as non-administrator
In-Reply-To:
Message-ID:

Dr.
Paul Czodrowski            Merck KGaA
NCE Technologies           Room A22/231
Computational Chemistry    Phone: +49-6151-72 3218
Frankfurter Strasse 250    Fax: +49-6151-72 91 3218
64293 Darmstadt, Germany   Email: paul.czodrowski at merck.de

Mandatory information can be found at http://mandatories.merck.de

biopython-bounces at lists.open-bio.org wrote on 04.05.2011 15:17:21:

> On Wed, May 4, 2011 at 1:40 PM, wrote:
> >
> > Dear Joao & Peter,
> >
> > this is what I got:
> >
> > "
> > Test error handling when presented with Fasta non-XML data ... ok
> > Test error handling when presented with GenBank non-XML data ... ok
> > Test parsing XML returned by EFetch, Nucleotide database (first test) ...
> > ERROR
> > Test parsing XML returned by EFetch, Protein database ... ERROR
> > Test parsing XML returned by EFetch, OMIM database ... ERROR
> > Test parsing XML returned by EFetch, PubMed database (first test) ...
> > Segmentation fault (core dumped)
> > "
> >
> >
> > Cheers,
> > Paul
>
> Hmm, something amiss with the XML parsing I think, we're
> using the Python standard library xml.parsers.expat here.
>
> You said you were using OpenSuse 11.3, and the start of our test
> suite reported the following:
>
> Python version: 2.6.5 (r265:79063, Oct 28 2010, 20:56:56)
> [GCC 4.5.0 20100604 [gcc-4_5-branch revision 160292]]
> Operating system: posix linux2
>
> What version of expat do you have? Try:
>
> $ python
> Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56)
> [GCC 4.4.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from xml.parsers import expat
> >>> print expat.__version__
> $Revision: 17640 $

$Revision: 1.1 $

>
> Do you fancy trying gdb to get a stack trace for us?

How shall I understand your question? Shall I use the gnu debugger
in order to get some debuggable output?

What is the worst case scenario related to biopython, i.e. could it
ultimately lead to any errors/instabilities?
Cheers, Paul > > I've had a quick Google, and the following issue *might* be > related: http://bugs.python.org/issue4877 > > Peter > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython This message and any attachment are confidential and may be privileged or otherwise protected from disclosure. If you are not the intended recipient, you must not copy this message or attachment or disclose the contents to any other person. If you have received this transmission in error, please notify the sender immediately and delete the message and any attachment from your system. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not accept liability for any omissions or errors in this message which may arise as a result of E-Mail-transmission or for damages resulting from any unauthorized changes of the content of this message and any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not guarantee that this message is free of viruses and does not accept liability for any damages caused by any virus transmitted therewith. Click http://disclaimer.merck.de to access the German, French, Spanish and Portuguese versions of this disclaimer. From p.j.a.cock at googlemail.com Wed May 4 14:13:47 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 4 May 2011 15:13:47 +0100 Subject: [Biopython] Antwort: Re: Antwort: Re: Antwort: Re: Antwort: Re: Antwort: Re: installation as non-administrator In-Reply-To: References: Message-ID: On Wed, May 4, 2011 at 2:36 PM, wrote: >> >> Do you fancy trying gdb to get a stack trace for us? > > How shall I understand your question? Shall I use the gnu debugger > in order to get some debuggable output? Yes please. With hindsight, "Could you try using the gnu debugger (gdb) to get a stack trace?" would have been clearer. Are you familiar with gdb? Was it the "Do you fancy *activity*?" 
phrasing that was unclear? Basically meaning "Would you like to do *activity*?". > What is the worst case scenario related to biopython, i.e. could it > ultimately lead to any errors/instabilities? It looks like if you tried to use Biopython's Bio.Entrez module to parse XML files from the NCBI it would crash. If you are not going to use that module, you should be fine. Peter From Paul.Czodrowski at merck.de Wed May 4 14:25:23 2011 From: Paul.Czodrowski at merck.de (Paul.Czodrowski at merck.de) Date: Wed, 4 May 2011 16:25:23 +0200 Subject: [Biopython] Antwort: Re: Antwort: Re: Antwort: Re: Antwort: Re: Antwort: Re: Antwort: Re: installation as non-administrator In-Reply-To: Message-ID: Dr. Paul Czodrowski Merck KGaA NCE Technologies Room A22/231 Computational Chemistry Phone: +49-6151-72 3218 Frankfurter Strasse 250 Fax: +49-6151-72 91 3218 64293 Darmstadt, Germany Email: paul.czodrowski at merck.de Mandatory information can be found at http://mandatories.merck.de Peter Cock wrote on 04.05.2011 16:13:47: > On Wed, May 4, 2011 at 2:36 PM, wrote: > >> > >> Do you fancy trying gdb to get a stack trace for us? > > > > How shall I understand your question? Shall I use the gnu debugger > > in order to get some debuggable output? > > Yes please. > > With hindsight, "Could you try using the gnu debugger (gdb) to get > a stack trace?" would have been clearer. Are you familiar with gdb? > > Was it the "Do you fancy *activity*?" phrasing that was unclear? > Basically meaning "Would you like to do *activity*?". Yes, it was just the expression you used. I have to admit that English is not my mother tongue. > > > What is the worst case scenario related to biopython, i.e. could it > > ultimately lead to any errors/instabilities? > > It looks like if you tried to use Biopython's Bio.Entrez module to > parse XML files from the NCBI it would crash. If you are not going > to use that module, you should be fine. Good news, thanks :) And thanks for all the other help, also to JOAO!! 
Cheers,
Paul

>
> Peter

From Paul.Czodrowski at merck.de Tue May 10 07:50:23 2011
From: Paul.Czodrowski at merck.de (Paul.Czodrowski at merck.de)
Date: Tue, 10 May 2011 09:50:23 +0200
Subject: [Biopython] PDB parsing
Message-ID:

Dear folks,

how do I add a B-factor as well as an occupancy column to a PDB file?

I guess Bio.PDB is the appropriate module.
But I already fail with regards to a simple PDB load...

Cheers,
Paul

This message and any attachment are confidential and may be privileged or otherwise protected from disclosure. If you are not the intended recipient, you must not copy this message or attachment or disclose the contents to any other person. If you have received this transmission in error, please notify the sender immediately and delete the message and any attachment from your system.

From anaryin at gmail.com Tue May 10 08:30:04 2011
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 10 May 2011 10:30:04 +0200
Subject: [Biopython] PDB parsing
In-Reply-To:
References:
Message-ID:

Hey Paul,

When you parse a PDB file with PDBParser it automatically retrieves both
B-factor and occupancy. If it fails to do so for any reason, it defaults
those values to 0.

After parsing, you can set those values explicitly by modifying the
corresponding attribute of the Atom object. So, for example, to change the
B-factor of all your atoms to 10.0, you just have to do:

    for atom in structure.get_atoms():
        atom.bfactor = 10.0

Hope this answered your question.

Cheers,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao

On Tue, May 10, 2011 at 9:50 AM, wrote:

>
> Dear folks,
>
> how do I add a B-factor as well as an occupancy column to a PDB file?
>
> I guess Bio.PDB is the appropriate module.
> But I already fail with regards to a simple PDB load...
>
>
> Cheers,
> Paul
>
> This message and any attachment are confidential and may be privileged or
> otherwise protected from disclosure. If you are not the intended recipient,
> you must not copy this message or attachment or disclose the contents to
> any other person.
If you have received this transmission in error, please > notify the sender immediately and delete the message and any attachment > from your system. Merck KGaA, Darmstadt, Germany and any of its > subsidiaries do not accept liability for any omissions or errors in this > message which may arise as a result of E-Mail-transmission or for damages > resulting from any unauthorized changes of the content of this message and > any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its > subsidiaries do not guarantee that this message is free of viruses and does > not accept liability for any damages caused by any virus transmitted > therewith. > > Click http://disclaimer.merck.de to access the German, French, Spanish and > Portuguese versions of this disclaimer. > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From Paul.Czodrowski at merck.de Tue May 10 09:19:54 2011 From: Paul.Czodrowski at merck.de (Paul.Czodrowski at merck.de) Date: Tue, 10 May 2011 11:19:54 +0200 Subject: [Biopython] Antwort: Re: PDB parsing In-Reply-To: Message-ID: Dear Joao, this one does not work: " structure_id = "1234" PDBFILE = open(filename,'r').read() p = PDBParser(PERMISSIVE=1) p._parse(PDBFILE) pp = p.get_structure(structure_id, PDBFILE) for atom in pp.get_atoms(): atom.bfactor = 10.0 print atom.bfactor " "p.get_structure(structure_id, PDBFILE)" seems to get the structural data, but setting the bfactor does not give any output. Cheers & Thanks, Paul > Hey Paul, > > When you parse a PDB file with PDBParser it automatically retrieves both > B-factor and occupancy. If it fails to do so for any reason, it defaults > those values to 0. > > After parsing, you can set those values explicitly by modifying the > corresponding attribute of the Atom object. 
So, for example, to change the
> B-factor of all your atoms to 10.0, you just have to do:
>
> for atom in structure.get_atoms():
> >     atom.bfactor = 10.0
> >
>
> Hope this answered your question.
>
> Cheers,
>
> João [...] Rodrigues
> http://nmr.chem.uu.nl/~joao
>
>
>
> On Tue, May 10, 2011 at 9:50 AM, wrote:
>
> >
> > Dear folks,
> >
> > how do I add a B-factor as well as an occupancy column to a PDB file?
> >
> > I guess Bio.PDB is the appropriate module.
> > But I already fail with regards to a simple PDB load...
> >
> >
> > Cheers,
> > Paul

From anaryin at gmail.com Tue May 10 09:27:37 2011
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 10 May 2011 11:27:37 +0200
Subject: [Biopython] Antwort: Re: PDB parsing
In-Reply-To:
References:
Message-ID:

Hey Paul,

First of all, you should not call _parse on your own. That is called
already when you call get_structure().
Generally, if a method has an underscore at the start of its name, it
shouldn't really be called unless you really know what you want to do
with it.

What version of Biopython are you using?

I'd do this:

    structure_id = "1234"
    PDBFILE = open(filename,'r')
    p = PDBParser(PERMISSIVE=1)
    pp = p.get_structure(structure_id, PDBFILE)

    for atom in pp.get_atoms():
        atom.bfactor = 10.0
        print atom.bfactor

It works pretty well here, with version 1.57.

Cheers,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao

On Tue, May 10, 2011 at 11:19 AM, wrote:

> Dear Joao,
>
> this one does not work:
> "
>
> structure_id = "1234"
> PDBFILE = open(filename,'r').read()
> p = PDBParser(PERMISSIVE=1)
> p._parse(PDBFILE)
> pp = p.get_structure(structure_id, PDBFILE)
>
>
> for atom in pp.get_atoms():
>     atom.bfactor = 10.0
>     print atom.bfactor
> "
>
>
> "p.get_structure(structure_id, PDBFILE)" seems to get the structural data,
> but setting the bfactor does not give any output.
>
>
> Cheers & Thanks,
> Paul
>
>
> > Hey Paul,
> >
> > When you parse a PDB file with PDBParser it automatically retrieves both
> > B-factor and occupancy. If it fails to do so for any reason, it defaults
> > those values to 0.
> >
> > After parsing, you can set those values explicitly by modifying the
> > corresponding attribute of the Atom object. So, for example, to change the
> > B-factor of all your atoms to 10.0, you just have to do:
> >
> > for atom in structure.get_atoms():
> > >     atom.bfactor = 10.0
> > >
> >
> > Hope this answered your question.
> >
> > Cheers,
> >
> > João [...] Rodrigues
> > http://nmr.chem.uu.nl/~joao
> >
> >
> >
> > On Tue, May 10, 2011 at 9:50 AM, wrote:
> >
> > >
> > > Dear folks,
> > >
> > > how do I add a B-factor as well as an occupancy column to a PDB file?
> > >
> > > I guess Bio.PDB is the appropriate module.
> > > But I already fail with regards to a simple PDB load...
> > >
> > >
> > > Cheers,
> > > Paul
>
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

From Paul.Czodrowski at merck.de Tue May 10 09:32:33 2011
From: Paul.Czodrowski at merck.de (Paul.Czodrowski at merck.de)
Date: Tue, 10 May 2011 11:32:33 +0200
Subject: [Biopython] Antwort: Re: Antwort: Re: PDB parsing
In-Reply-To:
Message-ID:

Dear João,

cool, thank you very much so far!

How do I output the newly generated PDB file?

Cheers & thanks,
Paul

> Hey Paul,
>
> First of all, you should not call _parse on your own. That is called
> already when you call get_structure(). Generally, if a method has an
> underscore at the start of its name, it shouldn't really be called
> unless you really know what you want to do with it.
>
> What version of Biopython are you using?
>
> I'd do this:
> structure_id = "1234"
> PDBFILE = open(filename,'r')
> p = PDBParser(PERMISSIVE=1)
> pp = p.get_structure(structure_id, PDBFILE)
>
> for atom in pp.get_atoms():
>     atom.bfactor = 10.0
>     print atom.bfactor
>
> It works pretty well here, with version 1.57.
>
> Cheers,
>
> João [...] Rodrigues
> http://nmr.chem.uu.nl/~joao
>
>
> On Tue, May 10, 2011 at 11:19 AM, wrote:
> Dear Joao,
>
> this one does not work:
> "
>
> structure_id = "1234"
> PDBFILE = open(filename,'r').read()
> p = PDBParser(PERMISSIVE=1)
> p._parse(PDBFILE)
> pp = p.get_structure(structure_id, PDBFILE)
>
>
> for atom in pp.get_atoms():
>     atom.bfactor = 10.0
>     print atom.bfactor
> "
>
>
> "p.get_structure(structure_id, PDBFILE)" seems to get the structural data,
> but setting the bfactor does not give any output.
>
>
> Cheers & Thanks,
> Paul
>
>
> > Hey Paul,
> >
> > When you parse a PDB file with PDBParser it automatically retrieves both
> > B-factor and occupancy. If it fails to do so for any reason, it defaults
> > those values to 0.
> >
> > After parsing, you can set those values explicitly by modifying the
> > corresponding attribute of the Atom object. So, for example, to change the
> > B-factor of all your atoms to 10.0, you just have to do:
> >
> > for atom in structure.get_atoms():
> > >     atom.bfactor = 10.0
> > >
> >
> > Hope this answered your question.
> >
> > Cheers,
> >
> > João [...] Rodrigues
> > http://nmr.chem.uu.nl/~joao
> >
> >
> >
> > On Tue, May 10, 2011 at 9:50 AM, wrote:
> >
> > >
> > > Dear folks,
> > >
> > > how do I add a B-factor as well as an occupancy column to a PDB file?
> > >
> > > I guess Bio.PDB is the appropriate module.
> > > But I already fail with regards to a simple PDB load...
> > >
> > >
> > > Cheers,
> > > Paul
>
> This message and any attachment are confidential and may be privileged or
> otherwise protected from disclosure.

From anaryin at gmail.com Tue May 10 09:38:23 2011
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 10 May 2011 11:38:23 +0200
Subject: [Biopython] Antwort: Re: Antwort: Re: PDB parsing
In-Reply-To:
References:
Message-ID:

Use PDBIO.

    from Bio.PDB import PDBIO
    IO = PDBIO()
    IO.set_structure(your_structure)
    IO.save(output_filename)

You can also control which parts of the structure to output with Select.

Check the documentation, it will help you progress much faster :)

Cheers,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao

On Tue, May 10, 2011 at 11:32 AM, wrote:

> Dear João,
>
>
> cool, thank you very much so far!
>
> How do I output the newly generated PDBfile?
>
> Cheers & thanks,
> Paul
>
>
> > Hey Paul,
> >
> > First of all, you should not call _parse on your own. That is called
> > already when you call get_structure(). Generally, if a method has an
> > underscore behind its name it means it shouldn't really be called
> > unless you really know what you want to do with it.
> >
> > What version of Biopython are you using?
> >
> > I'd do this:
> >
> > structure_id = "1234"
> > PDBFILE = open(filename,'r')
> > p = PDBParser(PERMISSIVE=1)
> > pp = p.get_structure(structure_id, PDBFILE)
> >
> > for atom in pp.get_atoms():
> >     atom.bfactor = 10.0
> >     print atom.bfactor
> >
> > It works pretty well here, with version 1.57.
> >
> > Cheers,
> >
> > João [...]
Rodrigues > > http://nmr.chem.uu.nl/~joao > > > > > > > On Tue, May 10, 2011 at 11:19 AM, wrote: > > Dear Joao, > > > > this one does not work: > > " > > > > structure_id = "1234" > > PDBFILE = open(filename,'r').read() > > p = PDBParser(PERMISSIVE=1) > > p._parse(PDBFILE) > > pp = p.get_structure(structure_id, PDBFILE) > > > > > > for atom in pp.get_atoms(): > > atom.bfactor = 10.0 > > print atom.bfactor > > " > > > > > > "p.get_structure(structure_id, PDBFILE)" seems to get the structural > data, > > but setting the bfactor does not give any output. > > > > > > > > > > Cheers & Thanks, > > Paul > > > > > > > Hey Paul, > > > > > > When you parse a PDB file with PDBParser it automatically retrieves > both > > > B-factor and occupancy. If it fails to do so for any reason, it > defaults > > > those values to 0. > > > > > > After parsing, you can set those values explicitly by modifying the > > > corresponding attribute of the Atom object. So, for example, to change > > the > > > B-factor of all your atoms to 10.0, you just have to do: > > > > > > for atom in structure.get_atoms(): > > > > atom.bfactor = 10.0 > > > > > > > > > > Hope this answered your question. > > > > > > Cheers, > > > > > > Jo?o [...] Rodrigues > > > http://nmr.chem.uu.nl/~joao > > > > > > > > > > > > On Tue, May 10, 2011 at 9:50 AM, wrote: > > > > > > > > > > > Dear folks, > > > > > > > > how do I add a B-factor as well as an occupancy column to a PDB file? > > > > > > > > I guess Bio.PDB is the appropriate module. > > > > But I already fail with regards to a simple PDB load... > > > > > > > > > > > > Cheers, > > > > Paul > > > > This message and any attachment are confidential and may be privileged or > > otherwise protected from disclosure. If you are not the intended > recipient, > > you must not copy this message or attachment or disclose the contents to > > any other person. 

From Paul.Czodrowski at merck.de Tue May 10 11:05:50 2011
From: Paul.Czodrowski at merck.de (Paul.Czodrowski at merck.de)
Date: Tue, 10 May 2011 13:05:50 +0200
Subject: [Biopython] Antwort: Re: Antwort: Re: Antwort: Re: PDB parsing
In-Reply-To:
Message-ID:

Dear Joao,

thanks for your help and the documentation link!

So far, I was only aware of this documentation
http://biopython.org/DIST/docs/tutorial/Tutorial.html
in which PDB parsing is only briefly covered.

And, yes, progress is faster now!

Cheers,
Paul

> Use PDBIO.
>
> from Bio.PDB import PDBIO
> IO = PDBIO()
> IO.set_structure(your_structure)
> IO.save(output_filename)
>
> You can also control which parts of the structure to output with Select.
>
> Check the documentation org/DIST/docs/cookbook/biopdb_faq.pdf>,
> it will make you progress much faster :)
>
> Cheers,
>
> João [...] Rodrigues
> http://nmr.chem.uu.nl/~joao
>
>
>
> On Tue, May 10, 2011 at 11:32 AM, wrote:
>
> > Dear João,
> >
> >
> > cool, thank you very much so far!
> >
> > How do I output the newly generated PDBfile?
> >
> > Cheers & thanks,
> > Paul
> >
> >
> > > Hey Paul,
> > >
> > > First of all, you should not call _parse on your own. That is called
> > > already when you call get_structure(). Generally, if a method has an
> > > underscore behind its name it means it shouldn't really be called
> > > unless you really know what you want to do with it.
> > >
> > > What version of Biopython are you using?
> > > > > > I'd do this: > > > > > structure_id = "1234" > > > PDBFILE = open(filename,'r') > > > p = PDBParser(PERMISSIVE=1) > > > pp = p.get_structure(structure_id, PDBFILE) > > > > > > for atom in pp.get_atoms(): > > > atom.bfactor = 10.0 > > > print atom.bfactor > > > > > > It works pretty well here, with version 1.57. > > > > > > Cheers, > > > > > > Jo?o [...] Rodrigues > > > http://nmr.chem.uu.nl/~joao > > > > > > > > > > > On Tue, May 10, 2011 at 11:19 AM, wrote: > > > Dear Joao, > > > > > > this one does not work: > > > " > > > > > > structure_id = "1234" > > > PDBFILE = open(filename,'r').read() > > > p = PDBParser(PERMISSIVE=1) > > > p._parse(PDBFILE) > > > pp = p.get_structure(structure_id, PDBFILE) > > > > > > > > > for atom in pp.get_atoms(): > > > atom.bfactor = 10.0 > > > print atom.bfactor > > > " > > > > > > > > > "p.get_structure(structure_id, PDBFILE)" seems to get the structural > > data, > > > but setting the bfactor does not give any output. > > > > > > > > > > > > > > > Cheers & Thanks, > > > Paul > > > > > > > > > > Hey Paul, > > > > > > > > When you parse a PDB file with PDBParser it automatically retrieves > > both > > > > B-factor and occupancy. If it fails to do so for any reason, it > > defaults > > > > those values to 0. > > > > > > > > After parsing, you can set those values explicitly by modifying the > > > > corresponding attribute of the Atom object. So, for example, to change > > > the > > > > B-factor of all your atoms to 10.0, you just have to do: > > > > > > > > for atom in structure.get_atoms(): > > > > > atom.bfactor = 10.0 > > > > > > > > > > > > > Hope this answered your question. > > > > > > > > Cheers, > > > > > > > > Jo?o [...] Rodrigues > > > > http://nmr.chem.uu.nl/~joao > > > > > > > > > > > > > > > > On Tue, May 10, 2011 at 9:50 AM, wrote: > > > > > > > > > > > > > > Dear folks, > > > > > > > > > > how do I add a B-factor as well as an occupancy column to a PDB file? 
> > > > >
> > > > > I guess Bio.PDB is the appropriate module.
> > > > > But I already fail with regards to a simple PDB load...
> > > > >
> > > > > Cheers,
> > > > > Paul
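
For readers who only need to fill the occupancy and B-factor columns asked about at the start of this thread, the same result can be sketched without Bio.PDB by editing the fixed-width text records directly. This is a different technique from the PDBParser/PDBIO route recommended above, not a replacement for it. It assumes a standard PDB file in which occupancy occupies columns 55-60 and the B-factor columns 61-66 of ATOM and HETATM records; the function names are hypothetical.

```python
def set_bfactor_and_occupancy(line, bfactor, occupancy=1.0):
    """Return a PDB record with occupancy and B-factor columns overwritten.

    Only ATOM and HETATM records are changed; every other line is returned
    untouched. Values are written as %6.2f, so they must fit in six characters.
    """
    if not line.startswith(("ATOM  ", "HETATM")):
        return line
    newline = "\n" if line.endswith("\n") else ""
    # Pad short records so that columns 55-66 exist before slicing.
    body = line.rstrip("\r\n").ljust(66)
    # Columns 55-60 (occupancy) and 61-66 (B-factor) are string indices 54..65.
    return body[:54] + "%6.2f%6.2f" % (occupancy, bfactor) + body[66:] + newline


def rewrite_pdb(in_path, out_path, bfactor, occupancy=1.0):
    """Copy in_path to out_path, setting the same values on every atom."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(set_bfactor_and_occupancy(line, bfactor, occupancy))
```

Calling rewrite_pdb("in.pdb", "out.pdb", 10.0) would mirror the "set every B-factor to 10.0" loop from the thread; the file names here are placeholders.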
From sainitin7 at gmail.com Thu May 12 08:39:28 2011
From: sainitin7 at gmail.com (sai nitin)
Date: Thu, 12 May 2011 10:39:28 +0200
Subject: [Biopython] Problem in accessing pcassay database
Message-ID:

Hi all,

I am new to Biopython. I want to access the pcassay database programmatically;
the exact issue is described below.

--- I have a list of Bioassay AIDs and I want to retrieve all their names. I
tried esummary to do this but it gives an error. I also tried efetch but
didn't succeed.

Can anybody suggest a possible solution?

Thanks
--
Sainitin D

From p.j.a.cock at googlemail.com Thu May 12 09:15:37 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 12 May 2011 10:15:37 +0100
Subject: [Biopython] Problem in accessing pcassay database
In-Reply-To:
References:
Message-ID:

On Thu, May 12, 2011 at 9:39 AM, sai nitin wrote:
> Hi all,
>
> I am new to Biopython. I want to access the pcassay database programmatically;
> the exact issue is described below.
>
> --- I have a list of Bioassay AIDs and I want to retrieve all their names. I
> tried esummary to do this but it gives an error. I also tried efetch but
> didn't succeed.
>
> Can anybody suggest a possible solution?
>
> Thanks

Hi,

Can you do this by hand? Which website would you use?
If NCBI Entrez, then it should be possible using
Biopython's Bio.Entrez module.

Could you give an example, say two Bioassay AIDs,
and the expected results (e.g. URLs to NCBI webpage).

Peter

From p.j.a.cock at googlemail.com Thu May 12 19:04:45 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 12 May 2011 20:04:45 +0100
Subject: [Biopython] Problem in accessing pcassay database
In-Reply-To:
References:
Message-ID:

Please CC the mailing list on any reply.

On Thu, May 12, 2011 at 6:59 PM, sai nitin wrote:
> Hi Peter,
> Thanks for the reply. Yes, I tried with the Bio.Entrez module (Biopython).
> OK, let me explain the issue more clearly. Say I have an AID as follows:
> 1. AID: 504582. I want to retrieve the Description section details from this URL
> (http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=504582&loc=ea_ras)
> Like this I have 20-30 AIDs and I want to do this for all of them.
> Any suggestions would be a great help.
> Thanks,
> Sainitin

If you look on the page you linked to, notice AID 504582 is
itself a link to Entrez,

http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=search&db=pcassay&term=504582

So, I would expect an Entrez search for 504582 in the pcassay
database to work. Trying this by hand on the NCBI Entrez website
works fine; from Biopython you could then do the same search with
Entrez.esearch(db="pcassay", term="504582")

Peter

From mictadlo at gmail.com Sun May 15 05:35:07 2011
From: mictadlo at gmail.com (Michal)
Date: Sun, 15 May 2011 15:35:07 +1000
Subject: [Biopython] multiprocessing problem with pysam
In-Reply-To: <20110412013119.GF2053@kunkel>
References: <4DA1137E.1090803@gmail.com> <20110410111510.GA2634@kunkel> <4DA2EC9D.7040004@gmail.com> <20110412013119.GF2053@kunkel>
Message-ID: <4DCF660B.30309@gmail.com>

Hello,

Thank you Brad.
I have written the following new code:

import re
import os
import pysam
from pprint import pprint
from multiprocessing import Pool

class Test():
    def __init__(self, bam_filename, cultivars):
        self.__bam_fh = pysam.Samfile(bam_filename, "rb")
        self.__cultivars = cultivars

    def run(self, ref_name):
        print os.getpid(), ref_name, self.__cultivars
        return (os.getpid(), ref_name)

if __name__ == '__main__':
    cultivars = 'Ja,Ea,As'.replace(' ', '').split(',')
    bam_filename = "/media/usb/tests/test.bam"

    bamfile = pysam.Samfile(bam_filename, "rb")
    ref_names = bamfile.references
    ref_lengths = bamfile.lengths
    bamfile.close()

    # for ref_name in ref_names:
    #     Test(bam_filename, cultivars).run(ref_names)

    pool = Pool()
    results = dict(pool.imap_unordered(
        Test(bam_filename, cultivars).run, ref_names))
    pool.close()
    pool.join()

    pprint(results)

and got the following error:

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/home/mictadlo/apps/python/lib/python2.7/threading.py", line 530, in __bootstrap_inner
    self.run()
  File "/home/mictadlo/apps/python/lib/python2.7/threading.py", line 483, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/mictadlo/apps/python/lib/python2.7/multiprocessing/pool.py", line 285, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed

I have searched and found two possible solutions for this problem:
* http://www.doughellmann.com/PyMOTW/multiprocessing/communication.html
* http://www.rueckstiess.net/research/snippets/show/ca1d7d90

However, is there a better way to solve it, or are the above solutions not good?

Thank you in advance.
Michal From chapmanb at 50mail.com Sun May 15 15:53:46 2011 From: chapmanb at 50mail.com (Brad Chapman) Date: Sun, 15 May 2011 11:53:46 -0400 Subject: [Biopython] multiprocessing problem with pysam In-Reply-To: <4DCF660B.30309@gmail.com> References: <4DA1137E.1090803@gmail.com> <20110410111510.GA2634@kunkel> <4DA2EC9D.7040004@gmail.com> <20110412013119.GF2053@kunkel> <4DCF660B.30309@gmail.com> Message-ID: <20110515155346.GD2530@kunkel> Michal; [multiprocessing] > class Test(): > def __init__(self, bam_filename, cultivars): > self.__bam_fh = pysam.Samfile(bam_filename, "rb") > self.__cultivars = cultivars > > def run(self, ref_name): > print os.getpid(), ref_name, self.__cultivars > return (os.getpid(), ref_name) [...] > pool = Pool() > results = dict(pool.imap_unordered( > Test(bam_filename, cultivars).run, ref_names)) [...] > and got the follwing error: > > Exception in thread Thread-2: [...] > PicklingError: Can't pickle : attribute > lookup __builtin__.instancemethod failed multiprocessing is sensitive to passing or calling complex class objects. My suggestion is to use functions without associated state attributes and pass in your information as standard python objects (strings, lists, dicts). 
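
The suggestion above (a stateless, module-level function that receives only plain Python objects) might look like the following in Python 3. This is a minimal sketch, not code from the thread: the names describe_reference and run_all are hypothetical, and the pysam call is left as a comment so the example stays self-contained.

```python
import os
from multiprocessing import Pool


def describe_reference(task):
    """Worker: takes one plain tuple and returns a plain tuple."""
    bam_filename, cultivars, ref_name = task
    # A real worker would open its own file handle here (for example with
    # pysam), because open handles generally cannot be sent between processes.
    return ref_name, os.getpid(), len(cultivars)


def run_all(bam_filename, cultivars, ref_names, processes=2):
    """Map the worker over all reference names in a pool of processes."""
    tasks = [(bam_filename, cultivars, ref_name) for ref_name in ref_names]
    pool = Pool(processes)
    try:
        return pool.map(describe_reference, tasks)
    finally:
        pool.close()
        pool.join()
```

Because describe_reference lives at module level it can be pickled by name, which a bound method such as Test(...).run could not be under Python 2. run_all should be called from under an `if __name__ == "__main__":` guard so that child processes can import the module safely.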
I use a little decorator to make writing the functions passed easier:

import functools

def map_wrap(f):
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        # imap passes one tuple per task; apply() unpacks it into
        # positional arguments for f
        return apply(f, *args, **kwargs)
    return wrapper

Then I would write your function as:

@map_wrap
def run_test(bam_filename, cultivars, ref_name):
    bam_fh = pysam.Samfile(bam_filename, "rb")
    print os.getpid(), ref_name, cultivars
    return (os.getpid(), ref_name)

and call it with:

cultivars = 'Ja,Ea,As'.replace(' ', '').split(',')
bam_filename = "/media/usb/tests/test.bam"
bamfile = pysam.Samfile(bam_filename, "rb")
ref_names = bamfile.references
bamfile.close()

pool = Pool()
results = dict(pool.imap(run_test,
               ((bam_filename, cultivars, ref) for ref in ref_names)))
pool.close()

Hope this helps,
Brad

From aradwen at gmail.com Wed May 18 15:28:25 2011
From: aradwen at gmail.com (Radhouane Aniba)
Date: Wed, 18 May 2011 11:28:25 -0400
Subject: [Biopython] Snippets Sharing
Message-ID:

Hi guys,

I apologize if this mail sounds like an ad; please consider it just an
announcement. I just wanted you to be aware of the changes that occurred to
biocoders.net.

We restructured it to be an online collaboration tool for bioinformatics: you
can create groups for your projects, interact with other users, upload
snippets and software packages that you find useful, discuss the latest
topics in bioinformatics, find the newest jobs (we partner with the
simplyhired job board) and much more.

I am not writing an extended mail so that you don't feel spammed; that is not
my goal.

Just come and explore the new biocoders.net formula.
Cheers,
Radhouane

From p.j.a.cock at googlemail.com Wed May 18 20:42:02 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 18 May 2011 21:42:02 +0100
Subject: [Biopython] gff3 problem
In-Reply-To: <20110408121041.GM20963@sobchak>
References: <4D9B0A6D.3040608@gmail.com> <20110405132247.GA20523@sobchak> <4D9DB3F4.30107@gmail.com> <20110408121041.GM20963@sobchak>
Message-ID:

On Fri, Apr 8, 2011 at 1:10 PM, Brad Chapman wrote:
> Leighton and Peter;
>
>> > Just to further complicate matters, the symbol convention for GFF3 differs
>> > from Biopython in terms of the categories it defines:
>> > + is positive strand
>> > - is negative strand
>> > . is not stranded (i.e. strand not relevant)
>> > ? is strand relevant, but not known
>> > http://www.sequenceontology.org/gff3.shtml
>
> Yes, although this strikes me a bit like fuzzy features in terms of
> usefulness.
>
>> > The latter two are distinct, but not distinguished by convention in
>> > Biopython:
>> > The obvious (to me) mapping of the four allowed Biopython symbols to the
>> > GFF3 convention is:
>> > +1 -> +
>> > -1 -> -
>> > None -> .
>> > 0 -> ?
>> > because 'None' is semantically close to 'has no strand information of
>> > consequence', and 0 is the mean of +1 and -1 ;)
>
> That's fine by me. Right now both '?' and '.' are converted to None
> so I lose the subtle distinction GFF is introducing:
>
> strand_map = {'+' : 1, '-' : -1, '?' : None, None: None}
>
> If everyone agrees on that coding it's no problem to swap it over.
> Brad

So was the consensus that we should reword the Bio.SeqFeature
docstring to say the four valid values for strand are (with GFF3
equivalents in brackets):

+1 = Forward (+ in GFF3)
-1 = Reverse (- in GFF3)
0 = Not stranded (. in GFF3)
None = Unknown (? in GFF3)

And should features on a protein sequence then have strand 0?
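
The four-value convention as restated just above can be written down as a pair of lookup tables. This is only an illustrative sketch of the proposal at this point in the thread, not the code in Biopython or in the GFF parser; the names STRAND_TO_GFF3, GFF3_TO_STRAND, strand_to_gff3 and gff3_to_strand are hypothetical.

```python
# Proposed Biopython strand value -> GFF3 strand symbol.
STRAND_TO_GFF3 = {+1: "+", -1: "-", 0: ".", None: "?"}

# Inverse lookup, built from the table above so the two cannot drift apart.
GFF3_TO_STRAND = {symbol: strand for strand, symbol in STRAND_TO_GFF3.items()}


def strand_to_gff3(strand):
    """Return the GFF3 symbol for a strand value, rejecting anything else."""
    try:
        return STRAND_TO_GFF3[strand]
    except KeyError:
        raise ValueError("unexpected strand value: %r" % (strand,))


def gff3_to_strand(symbol):
    """Return the strand value for a GFF3 symbol, rejecting anything else."""
    try:
        return GFF3_TO_STRAND[symbol]
    except KeyError:
        raise ValueError("unexpected GFF3 strand symbol: %r" % (symbol,))
```

Keeping '.' and '?' as separate entries is what preserves the distinction that is lost when both are collapsed to None.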
Peter From hxcan at stupidbeauty.com Thu May 19 05:00:37 2011 From: hxcan at stupidbeauty.com (=?GB2312?B?ssy78Mqk?=) Date: Thu, 19 May 2011 13:00:37 +0800 Subject: [Biopython] missing dtd file Message-ID: <4DD4A3F5.8020406@stupidbeauty.com> An HTML attachment was scrubbed... URL: From p.j.a.cock at googlemail.com Thu May 19 07:57:17 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 19 May 2011 08:57:17 +0100 Subject: [Biopython] missing dtd file In-Reply-To: <4DD4A3F5.8020406@stupidbeauty.com> References: <4DD4A3F5.8020406@stupidbeauty.com> Message-ID: 2011/5/19 ??? : > Hello > > > Entrez module gives this warning: > > /usr/lib/python2.6/site-packages/Bio/Entrez/Parser.py:495: UserWarning: > Unable to load DTD file eLink_101123.dtd. > > Bio.Entrez uses NCBI's DTD files to parse XML files ... > > For this purpose, please download eLink_101123.dtd from > > http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eLink_101123.dtd > > ... Thank you for alerting us, that file will be included in our next release. Could you update your copy of Biopython successfully? 
Peter

From esa.aalto at oulu.fi Thu May 19 13:02:17 2011
From: esa.aalto at oulu.fi (Esa Aalto)
Date: Thu, 19 May 2011 16:02:17 +0300
Subject: [Biopython] An error with Concatenate nexus
Message-ID: <3C36433088B0FF4B834B351A67C98111E6F721@KEKO.univ.yo.oulu.fi>

Dear group,

I'm trying to concatenate 20 nexus files with the instructions given here:

http://www.biopython.org/wiki/Concatenate_nexus

but it doesn't work:

Traceback (most recent call last):
  File "C:\Python27\concate_nexus.py", line 36, in <module>
    nexi = [(handle.name, Nexus.Nexus(handle)) for handle in handles]
  File "C:\Python27\lib\site-packages\Bio\Nexus\Nexus.py", line 555, in __init__
    self.read(input)
  File "C:\Python27\lib\site-packages\Bio\Nexus\Nexus.py", line 618, in read
    self._parse_nexus_block(title, contents)
  File "C:\Python27\lib\site-packages\Bio\Nexus\Nexus.py", line 659, in _parse_nexus_block
    getattr(self,'_'+line.command)(line.options)
  File "C:\Python27\lib\site-packages\Bio\Nexus\Nexus.py", line 1021, in _codonposset
    raise NexusError('Formatting Error in codonposset: %s ' % options)
NexusError: Formatting Error in codonposset: * UNTITLED = 1: 1-577\3, 2: 2-578\3, 3: 3-579\3

The end of the first of my nex files looks like this:

BEGIN SETS;
    TaxSet A_thaliana = 1;
    TaxSet A_lyrata = 2;
    TaxSet Boh = 3-32;
    TaxSet Ice = 33-60;
    TaxSet Ith = 61-92;
    TaxSet Kar = 93-124;
    TaxSet Lom = 125-156;
    TaxSet NC = 157-196;
    TaxSet Pl = 197-236;
    TaxSet Sp = 237-274;
    TaxSet Stu = 275-294;
    TaxSet South = 3-32 197-236;
    TaxSet North = 125-156 237-274;
    TaxSet lyrata = 2-294;
END;

BEGIN CODONS;
    CODONPOSSET * UNTITLED =
        1: 1-577\3,
        2: 2-578\3,
        3: 3-579\3;
    CODESET * UNTITLED = Universal: all;
END;

BEGIN CODONUSAGE;
END;

BEGIN DnaSP;
    Genome= Diploid;
    ChromosomalLocation= Autosome;
    VariationType= DNA_Seq_Pol;
    Species= ---;
    ChromosomeName= ---;
    GenomicPosition= 1;
    GenomicAssembly= ---;
    DnaSPversion= Ver. 5.10.00;
END;

Could someone tell what's wrong here? Is it my nexus files or something in the code?

Thanks for your help!
Esa Aalto From cy at cymon.org Thu May 19 14:30:36 2011 From: cy at cymon.org (Cymon Cox) Date: Thu, 19 May 2011 15:30:36 +0100 Subject: [Biopython] An error with Concatenate nexus In-Reply-To: <3C36433088B0FF4B834B351A67C98111E6F721@KEKO.univ.yo.oulu.fi> References: <3C36433088B0FF4B834B351A67C98111E6F721@KEKO.univ.yo.oulu.fi> Message-ID: Hi Esa, At first glance this looks like a bug. But given that Nexus.combine() is going to discard your codonposset character partition anyway, you could try deleting it from the Nexus file before combining. Regards, Cymon On 19 May 2011 14:02, Esa Aalto wrote: > Dear group, > > I'm trying to concatenate 20 nexus files with the instructions given > here: > > http://www.biopython.org/wiki/Concatenate_nexus > > but it doesn't work: > > Traceback (most recent call last): > File "C:\Python27\concate_nexus.py", line 36, in > nexi = [(handle.name, Nexus.Nexus(handle)) for handle in handles] > File "C:\Python27\lib\site-packages\Bio\Nexus\Nexus.py", line 555, in > __init__ > self.read(input) > File "C:\Python27\lib\site-packages\Bio\Nexus\Nexus.py", line 618, in > read > self._parse_nexus_block(title, contents) > File "C:\Python27\lib\site-packages\Bio\Nexus\Nexus.py", line 659, in > _parse_nexus_block > getattr(self,'_'+line.command)(line.options) > File "C:\Python27\lib\site-packages\Bio\Nexus\Nexus.py", line 1021, in > _codonposset > raise NexusError('Formatting Error in codonposset: %s ' % options) > NexusError: Formatting Error in codonposset: * UNTITLED = 1: 1-577\3, 2: > 2-578\3, 3: 3-579\3 > > The end of the first of my nex files looks like this: > > BEGIN SETS; > TaxSet A_thaliana = 1; > TaxSet A_lyrata = 2; > TaxSet Boh = 3-32; > TaxSet Ice = 33-60; > TaxSet Ith = 61-92; > TaxSet Kar = 93-124; > TaxSet Lom = 125-156; > TaxSet NC = 157-196; > TaxSet Pl = 197-236; > TaxSet Sp = 237-274; > TaxSet Stu = 275-294; > TaxSet South = 3-32 197-236; > TaxSet North = 125-156 237-274; > TaxSet lyrata = 2-294; > END; > > BEGIN CODONS; > 
CODONPOSSET * UNTITLED = > 1: 1-577\3, > 2: 2-578\3, > 3: 3-579\3; > CODESET * UNTITLED = Universal: all; > END; > > BEGIN CODONUSAGE; > END; > > BEGIN DnaSP; > Genome= Diploid; > ChromosomalLocation= Autosome; > VariationType= DNA_Seq_Pol; > Species= ---; > ChromosomeName= ---; > GenomicPosition= 1; > GenomicAssembly= ---; > DnaSPversion= Ver. 5.10.00; > END; > > Could someone tell what's wrong here? Is it my nexus files or something > in the code? > > Thanks for your help! > > Esa Aalto > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- From fkelesh at gmail.com Fri May 20 09:33:03 2011 From: fkelesh at gmail.com (Fatih Keles) Date: Fri, 20 May 2011 12:33:03 +0300 Subject: [Biopython] installing biopython on mac os x 10.6 Message-ID: Hi, I was trying to install Biopython on mac os x 10.6 using X11. However, It gives this error : """ running install running build running build_py running build_ext building 'Bio.cpairwise2' extension gcc-4.0 -fno-strict-aliasing -fno-common -dynamic -arch ppc -arch i386 -g -O2 -DNDEBUG -g -O3 -IBio -I/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c Bio/cpairwise2module.c -o build/temp.macosx-10.3-fat-2.7/Bio/cpairwise2module.o unable to execute gcc-4.0: No such file or directory error: command 'gcc-4.0' failed with exit status 1 """ I couldn't find the problem. I would be happy if you help me. Thanks, keles From p.j.a.cock at googlemail.com Fri May 20 09:40:16 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 20 May 2011 10:40:16 +0100 Subject: [Biopython] installing biopython on mac os x 10.6 In-Reply-To: References: Message-ID: On Fri, May 20, 2011 at 10:33 AM, Fatih Keles wrote: > Hi, > > I was trying to install Biopython on mac os x 10.6 using X11. 
However, > It gives this error : > """ > > running install > running build > running build_py > running build_ext > building 'Bio.cpairwise2' extension > gcc-4.0 -fno-strict-aliasing -fno-common -dynamic -arch ppc -arch i386 > -g -O2 -DNDEBUG -g -O3 -IBio > -I/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 > -c Bio/cpairwise2module.c -o > build/temp.macosx-10.3-fat-2.7/Bio/cpairwise2module.o > unable to execute gcc-4.0: No such file or directory > error: command 'gcc-4.0' failed with exit status 1 > """ > > I couldn't find the problem. I would be happy if you help me. > > Thanks, > > keles Have you installed Apple X Code, the development suite that comes with Apple's version of gcc (C compiler)? What we say on the download page of the wiki is: >> For Mac OS X, we recommend installing from source (see below). >> You will need to have installed Apple's XCode tools including the >> optional 10.4 SDK (check the option for 10.4 support when >> installing Xcode tools). Peter From chapmanb at 50mail.com Fri May 20 11:15:35 2011 From: chapmanb at 50mail.com (Brad Chapman) Date: Fri, 20 May 2011 07:15:35 -0400 Subject: [Biopython] gff3 problem In-Reply-To: References: <4D9B0A6D.3040608@gmail.com> <20110405132247.GA20523@sobchak> <4D9DB3F4.30107@gmail.com> <20110408121041.GM20963@sobchak> Message-ID: <20110520111535.GC21651@sobchak> Peter; [SeqFeature support for not-stranded elements] > So was the consensus that we should reword the Bio.SeqFeature > docstring so say the four valid values for strand are (with GFF3 > equivalents in brackets): > > +1 = Forward (+ in GFF3) > -1 = Reverse (- in GFF3) > 0 = Not stranded (. in GFF3) > None = Unknown (? in GFF3) > > And should features on a protein sequence should then have strand 0? That sounds great. I can make the corresponding change to the GFF library. Let me know if there are any other roadblocks to integrating that. 
Thanks much, Brad From p.j.a.cock at googlemail.com Fri May 20 11:27:04 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 20 May 2011 12:27:04 +0100 Subject: [Biopython] gff3 problem In-Reply-To: <20110520111535.GC21651@sobchak> References: <4D9B0A6D.3040608@gmail.com> <20110405132247.GA20523@sobchak> <4D9DB3F4.30107@gmail.com> <20110408121041.GM20963@sobchak> <20110520111535.GC21651@sobchak> Message-ID: On Fri, May 20, 2011 at 12:15 PM, Brad Chapman wrote: > Peter; > > [SeqFeature support for not-stranded elements] >> So was the consensus that we should reword the Bio.SeqFeature >> docstring so say the four valid values for strand are (with GFF3 >> equivalents in brackets): >> >> +1 = Forward (+ in GFF3) >> -1 = Reverse (- in GFF3) >> 0 = Not stranded (. in GFF3) >> None = Unknown (? in GFF3) >> >> And should features on a protein sequence then have strand 0? > > That sounds great. I can make the corresponding change to the GFF > library. Let me know if there are any other roadblocks to > integrating that. Thanks much, > Brad I've remembered a corner case, mixed strand features. e.g the Arabidopsis thaliana chloroplast complete genome, AP000423 in EMBL, NC_000932 in GenBank (one of our unit test files). e.g. gene with join(complement(69611..69724),139856..140650) Clearly the child features have well defined strands (+1 and -1). The parent feature (the join) is mixed strand. Currently our GenBank parser uses None for this. So maybe: +1 = Forward (+ in GFF3) -1 = Reverse (- in GFF3) 0 = Not stranded (. in GFF3) None = Mixed or unknown (? 
in GFF3) Peter From cjfields at illinois.edu Fri May 20 13:24:30 2011 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 20 May 2011 08:24:30 -0500 Subject: [Biopython] gff3 problem In-Reply-To: References: <4D9B0A6D.3040608@gmail.com> <20110405132247.GA20523@sobchak> <4D9DB3F4.30107@gmail.com> <20110408121041.GM20963@sobchak> <20110520111535.GC21651@sobchak> Message-ID: On May 20, 2011, at 6:27 AM, Peter Cock wrote: > On Fri, May 20, 2011 at 12:15 PM, Brad Chapman wrote: >> Peter; >> >> [SeqFeature support for not-stranded elements] >>> So was the consensus that we should reword the Bio.SeqFeature >>> docstring so say the four valid values for strand are (with GFF3 >>> equivalents in brackets): >>> >>> +1 = Forward (+ in GFF3) >>> -1 = Reverse (- in GFF3) >>> 0 = Not stranded (. in GFF3) >>> None = Unknown (? in GFF3) >>> >>> And should features on a protein sequence then have strand 0? >> >> That sounds great. I can make the corresponding change to the GFF >> library. Let me know if there are any other roadblocks to >> integrating that. Thanks much, >> Brad > > I've remembered a corner case, mixed strand features. e.g the > Arabidopsis thaliana chloroplast complete genome, AP000423 > in EMBL, NC_000932 in GenBank (one of our unit test files). > e.g. gene with join(complement(69611..69724),139856..140650) > > Clearly the child features have well defined strands (+1 and -1). > The parent feature (the join) is mixed strand. Currently our > GenBank parser uses None for this. So maybe: > > +1 = Forward (+ in GFF3) > -1 = Reverse (- in GFF3) > 0 = Not stranded (. in GFF3) > None = Mixed or unknown (? in GFF3) > > Peter That's essentially what bioperl does for 'split' locations (actually, I think it is just undef, which would translate to '?' for GFF3). chris From laserson at mit.edu Fri May 20 21:14:32 2011 From: laserson at mit.edu (Uri Laserson) Date: Fri, 20 May 2011 17:14:32 -0400 Subject: [Biopython] Serialize SeqRecord to JSON? 
Message-ID: Does anyone know of a solution for this? Thanks! Uri ................................................................................... Uri Laserson Graduate Student, Biomedical Engineering Harvard-MIT Division of Health Sciences and Technology M +1 917 742 8019 laserson at mit.edu From mjldehoon at yahoo.com Sat May 21 03:59:24 2011 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 20 May 2011 20:59:24 -0700 (PDT) Subject: [Biopython] installing biopython on mac os x 10.6 In-Reply-To: Message-ID: <782468.28393.qm@web161211.mail.bf1.yahoo.com> Probably you don't have a C compiler installed on your computer. The easiest way to get one is to install Apple's Xcode package. --Michiel. --- On Fri, 5/20/11, Fatih Keles wrote: > From: Fatih Keles > Subject: [Biopython] installing biopython on mac os x 10.6 > To: biopython at lists.open-bio.org > Date: Friday, May 20, 2011, 5:33 AM > Hi, > > I was trying to install Biopython on mac os x 10.6 using > X11. However, > It gives this error : > """ > > running install > running build > running build_py > running build_ext > building 'Bio.cpairwise2' extension > gcc-4.0 -fno-strict-aliasing -fno-common -dynamic -arch ppc > -arch i386 > -g -O2 -DNDEBUG -g -O3 -IBio > -I/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 > -c Bio/cpairwise2module.c -o > build/temp.macosx-10.3-fat-2.7/Bio/cpairwise2module.o > unable to execute gcc-4.0: No such file or directory > error: command 'gcc-4.0' failed with exit status 1 > """ > > I couldn't find the problem. I would be happy if you help > me. > > Thanks, > > keles > _______________________________________________ > Biopython mailing list? -? 
Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From sainitin7 at gmail.com Mon May 23 08:32:07 2011 From: sainitin7 at gmail.com (sai nitin) Date: Mon, 23 May 2011 10:32:07 +0200 Subject: [Biopython] Problem to retreive compound names using CID from PubChem Message-ID: Hi all, Myself sainitin i have list of CIDs from Pubchem Database i want retereive corresponding compundnames to automate this process im using Biopython Entrez module (Entrez.esummary) when i give one CID and try to retreive name of the compound error is occuring Code h = Entrez.esummary(db = "pccompound",id = "449489") r = Entrez.read(h) r[0]["SourceName"] Error Traceback (most recent call last): File "", line 1, in KeyError: 'SourceName' Can anybody help me to solve this Thanks -- Sainitin D From fkauff at biologie.uni-kl.de Mon May 23 10:19:30 2011 From: fkauff at biologie.uni-kl.de (Frank Kauff) Date: Mon, 23 May 2011 12:19:30 +0200 Subject: [Biopython] An error with Concatenate nexus In-Reply-To: References: <3C36433088B0FF4B834B351A67C98111E6F721@KEKO.univ.yo.oulu.fi> Message-ID: <4DDA34B2.9010907@biologie.uni-kl.de> Hi Esa, are you using an up-to-date Nexus parser? The codonposset below can be read without problems when I copy-paste it into one of my nexus files. Or, if you like, send me a copy of your complete nexus file for a check. Cheers, Frank On 05/19/2011 04:30 PM, Cymon Cox wrote: > Hi Esa, > > At first glance this looks like a bug. > > But given that Nexus.combine() is going to discard your codonposset > character partition anyway, you could try deleting it from the Nexus file > before combining. 
> > Regards, Cymon > > On 19 May 2011 14:02, Esa Aalto wrote: > >> Dear group, >> >> I'm trying to concatenate 20 nexus files with the instructions given >> here: >> >> http://www.biopython.org/wiki/Concatenate_nexus >> >> but it doesn't work: >> >> Traceback (most recent call last): >> File "C:\Python27\concate_nexus.py", line 36, in >> nexi = [(handle.name, Nexus.Nexus(handle)) for handle in handles] >> File "C:\Python27\lib\site-packages\Bio\Nexus\Nexus.py", line 555, in >> __init__ >> self.read(input) >> File "C:\Python27\lib\site-packages\Bio\Nexus\Nexus.py", line 618, in >> read >> self._parse_nexus_block(title, contents) >> File "C:\Python27\lib\site-packages\Bio\Nexus\Nexus.py", line 659, in >> _parse_nexus_block >> getattr(self,'_'+line.command)(line.options) >> File "C:\Python27\lib\site-packages\Bio\Nexus\Nexus.py", line 1021, in >> _codonposset >> raise NexusError('Formatting Error in codonposset: %s ' % options) >> NexusError: Formatting Error in codonposset: * UNTITLED = 1: 1-577\3, 2: >> 2-578\3, 3: 3-579\3 >> >> The end of the first of my nex files looks like this: >> >> BEGIN SETS; >> TaxSet A_thaliana = 1; >> TaxSet A_lyrata = 2; >> TaxSet Boh = 3-32; >> TaxSet Ice = 33-60; >> TaxSet Ith = 61-92; >> TaxSet Kar = 93-124; >> TaxSet Lom = 125-156; >> TaxSet NC = 157-196; >> TaxSet Pl = 197-236; >> TaxSet Sp = 237-274; >> TaxSet Stu = 275-294; >> TaxSet South = 3-32 197-236; >> TaxSet North = 125-156 237-274; >> TaxSet lyrata = 2-294; >> END; >> >> BEGIN CODONS; >> CODONPOSSET * UNTITLED = >> 1: 1-577\3, >> 2: 2-578\3, >> 3: 3-579\3; >> CODESET * UNTITLED = Universal: all; >> END; >> >> BEGIN CODONUSAGE; >> END; >> >> BEGIN DnaSP; >> Genome= Diploid; >> ChromosomalLocation= Autosome; >> VariationType= DNA_Seq_Pol; >> Species= ---; >> ChromosomeName= ---; >> GenomicPosition= 1; >> GenomicAssembly= ---; >> DnaSPversion= Ver. 5.10.00; >> END; >> >> Could someone tell what's wrong here? Is it my nexus files or something >> in the code? 
>> >> Thanks for your help! >> >> Esa Aalto >> >> _______________________________________________ >> Biopython mailing list - Biopython at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython >> > > > -- > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From chapmanb at 50mail.com Mon May 23 10:42:56 2011 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 23 May 2011 06:42:56 -0400 Subject: [Biopython] Problem to retreive compound names using CID from PubChem In-Reply-To: References: Message-ID: <20110523104256.GA2365@kunkel> Sainitin; > Code > h = Entrez.esummary(db = "pccompound",id = "449489") > r = Entrez.read(h) > r[0]["SourceName"] > > Error > Traceback (most recent call last): > File "", line 1, in > KeyError: 'SourceName' > > Can anybody help me to solve this The 'r' object you've parsed from Entrez contains a list of dictionaries. The information that is in each dictionary will be dependent on the database you are retrieving from. In this case there is no SourceName information, so python returns a KeyError to indicate this. You can examine the items in the dictionary with: for key, val in r[0].iteritems(): print key, val [...] InChI InChI=1S/C9H12IN2O8P/c10-4-2-12(9(15)11-8(4)14)7-1-5(13)6(20-7)3-19-21(16,17)18/h2,5-7,13H,1,3H2,(H,11,14,15)(H2,16,17,18)/t5-,6+,7+/m0/s1 TautomerCount 3 SourceIDList [] BondChiralCount 0 MeSHTermList ["5-iodo-2'-deoxyuridine 5'-monophosphate", '5-iodo-dUMP', 'IdUMP', 'iododeoxyuridylate', 'iododeoxyuridylate, 125I-labeled'] [...] 
There are also a number of good online resources for learning Python which will help give experience in debugging these kinds of errors: http://learnpythonthehardway.org/index http://diveintopython.org/ Hope this helps, Brad From p.j.a.cock at googlemail.com Mon May 23 11:01:51 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 23 May 2011 12:01:51 +0100 Subject: [Biopython] Serialize SeqRecord to JSON? In-Reply-To: References: Message-ID: On Fri, May 20, 2011 at 10:14 PM, Uri Laserson wrote: > Does anyone know of a solution for this? > > Thanks! > Uri I thought JSON was more suited to holding simple data structures, rather than serialising arbitrary complex objects. Which bits of data do you need? The basics like the id/name/description and sequence could be presented like a tuple and encoded in JSON. Annotations begin to get complicated - but a dictionary of basic types should be fine. I suspect the biggest hurdle would be trying to encode any features. Peter From sdavis2 at mail.nih.gov Mon May 23 18:08:47 2011 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Mon, 23 May 2011 14:08:47 -0400 Subject: [Biopython] [OT] Bioconductor-2011 conference. Message-ID: All, Sorry for the slightly off-topic post, but I know there are some overlaps between Bioconductor and Biopython user groups. The Bioconductor-2011 conference will be held July 28-29, 2011 (optional: July 27 - Developer Day) at the Fred Hutchinson Cancer Research Center in Seattle, WA. This conference highlights current developments within and beyond Bioconductor, an international open source and open development software project for the analysis and comprehension of high-throughput genomic data. The conference provides a forum in which to discuss the use and design of software for analyzing data arising in biology with a focus on Bioconductor and genomic data.
If interested, see the website: https://secure.bioconductor.org/BioC2011/ Thanks, Sean From laserson at mit.edu Mon May 23 19:42:35 2011 From: laserson at mit.edu (Uri Laserson) Date: Mon, 23 May 2011 15:42:35 -0400 Subject: [Biopython] reading Alphabet from file Message-ID: Hi all, I am trying to implement a method that will convert a SeqRecord to a JSON serializable object. One piece of data that must be stored for a Seq object is the alphabet type. When I read this from file, what is the best practice to reload the same alphabet type? Thanks! Uri ................................................................................... Uri Laserson Graduate Student, Biomedical Engineering Harvard-MIT Division of Health Sciences and Technology M +1 917 742 8019 laserson at mit.edu From p.j.a.cock at googlemail.com Mon May 23 22:09:02 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 23 May 2011 23:09:02 +0100 Subject: [Biopython] reading Alphabet from file In-Reply-To: References: Message-ID: On Monday, May 23, 2011, Uri Laserson wrote: > Hi all, > > I am trying to implement a method that will convert a SeqRecord to a JSON > serializable object. One piece of data that must be stored for a Seq object > is the alphabet type. When I read this from file, what is the best practice > to reload the same alphabet type? > > Thanks! > Uri Hmm, that's tricky because the Biopython alphabet hierarchy is so complicated. Or richly detailed depending on your point of view ;-) In your position I would apply the KISS principle and reduce it to Protein, DNA, RNA or unknown - and use the generic_protein etc classes on reconstruction. Unless you need more detail than that?
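
[Editor's sketch, not part of the original reply.] The KISS approach suggested here could look roughly like the following. It uses only the standard library and plain strings, so it makes no claim about how Biopython itself stores alphabets. The function names, the JSON field names and the coarse labels "protein", "DNA", "RNA" and "unknown" are all assumptions made for illustration.

```python
import json

# Coarse molecule labels, following the suggestion to reduce the alphabet
# to protein, DNA, RNA or unknown. These label strings are an assumption.
KNOWN_MOLECULE_TYPES = {"protein", "DNA", "RNA"}


def record_to_json(record_id, name, description, sequence,
                   molecule_type="unknown", annotations=None):
    """Encode the basic fields of a sequence record as a JSON string.

    Only simple types are stored: strings plus a dictionary of basic
    annotation values. Features are deliberately left out.
    """
    if molecule_type not in KNOWN_MOLECULE_TYPES:
        molecule_type = "unknown"
    payload = {
        "id": record_id,
        "name": name,
        "description": description,
        "seq": str(sequence),
        "molecule_type": molecule_type,
        "annotations": dict(annotations or {}),
    }
    return json.dumps(payload, sort_keys=True)


def record_from_json(text):
    """Decode the JSON string back into a plain dictionary.

    Any unrecognised molecule type falls back to "unknown".
    """
    data = json.loads(text)
    if data.get("molecule_type") not in KNOWN_MOLECULE_TYPES:
        data["molecule_type"] = "unknown"
    return data
```

On reconstruction, the stored label would then be used to pick one of the generic classes (generic_protein and so on) mentioned in the reply; that last step is left out here because it depends on the Biopython version in use.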
Peter From p.j.a.cock at googlemail.com Tue May 24 11:26:25 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 24 May 2011 12:26:25 +0100 Subject: [Biopython] gff3 problem In-Reply-To: References: <4D9B0A6D.3040608@gmail.com> <20110405132247.GA20523@sobchak> <4D9DB3F4.30107@gmail.com> <20110408121041.GM20963@sobchak> <20110520111535.GC21651@sobchak> Message-ID: On Fri, May 20, 2011 at 12:27 PM, Peter Cock wrote: > On Fri, May 20, 2011 at 12:15 PM, Brad Chapman wrote: >> Peter; >> >> [SeqFeature support for not-stranded elements] >>> So was the consensus that we should reword the Bio.SeqFeature >>> docstring so say the four valid values for strand are (with GFF3 >>> equivalents in brackets): >>> >>> +1 = Forward (+ in GFF3) >>> -1 = Reverse (- in GFF3) >>> 0 = Not stranded (. in GFF3) >>> None = Unknown (? in GFF3) >>> >>> And should features on a protein sequence then have strand 0? >> >> That sounds great. I can make the corresponding change to the >> GFF library. Let me know if there are any other roadblocks to >> integrating that. Thanks much, >> Brad Going over this a fresh now, in my email of 20 May, I had mixed up Leighton's original suggestion. The two special cases (0 and None) are a bit of a pain: http://lists.open-bio.org/pipermail/biopython/2011-April/007194.html Back in April, Leighton wrote: > The obvious (to me) mapping of the four allowed Biopython symbols to the > GFF3 convention is: > +1 -> + > -1 -> - > None -> . > 0 -> ? > because 'None' is semantically close to 'has no strand information of > consequence', and 0 is the mean of +1 and -1 ;) > Cheers, > L. i.e. +1 = Forward (+ in GFF3) -1 = Reverse (- in GFF3) 0 = Stranded but unknown (? in GFF3) None = Not stranded (. in GFF3) SeqFeature docstring updated: https://github.com/biopython/biopython/commit/ea64c74758dccfc7e6c0940e31a214293ecc59d3 This way proteins features should have strand None (which is what the current GenBank/EMBL parser does anyway). 
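
[Editor's sketch, not part of the original message.] The corrected mapping listed above can be written down as a small lookup table. This is a minimal illustration using only the values given in this thread; the names STRAND_TO_GFF3, strand_to_gff3 and gff3_to_strand are hypothetical and are not taken from Biopython or from the GFF library.

```python
# Biopython strand value -> GFF3 strand column symbol, as listed in
# the corrected mapping in this thread.
STRAND_TO_GFF3 = {
    1: "+",     # forward
    -1: "-",    # reverse
    0: "?",     # stranded, but the strand is unknown
    None: ".",  # not stranded (e.g. features on a protein sequence)
}

# Reverse lookup, built from the table above.
GFF3_TO_STRAND = {symbol: strand for strand, symbol in STRAND_TO_GFF3.items()}


def strand_to_gff3(strand):
    """Return the GFF3 symbol for a strand value of +1, -1, 0 or None."""
    try:
        return STRAND_TO_GFF3[strand]
    except KeyError:
        raise ValueError("Unexpected strand value: %r" % (strand,))


def gff3_to_strand(symbol):
    """Return the strand value for a GFF3 symbol: '+', '-', '?' or '.'."""
    try:
        return GFF3_TO_STRAND[symbol]
    except KeyError:
        raise ValueError("Unexpected GFF3 strand symbol: %r" % (symbol,))
```
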
Note that the SeqFeature default is strand=None which is still OK. Mixed strand isn't needed in the GFF3 model, but we already use None for this. Perhaps it should be 0 rather than None under this model? Peter From hxcan at stupidbeauty.com Sun May 29 07:18:22 2011 From: hxcan at stupidbeauty.com (=?GB2312?B?ssy78Mqk?=) Date: Sun, 29 May 2011 15:18:22 +0800 Subject: [Biopython] Another warning of "missing dtd file" Message-ID: <4DE1F33E.8020700@stupidbeauty.com> /usr/lib/python2.6/site-packages/Bio/Entrez/Parser.py:495: UserWarning: Unable to load DTD file bookdoc_110101.dtd. Bio.Entrez uses NCBI's DTD files to parse XML files returned by NCBI Entrez. Though most of NCBI's DTD files are included in the Biopython distribution, sometimes you may find that a particular DTD file is missing. While we can access the DTD file through the internet, the parser is much faster if the required DTD files are available locally. For this purpose, please download bookdoc_110101.dtd from http://www.ncbi.nlm.nih.gov/entrez/query/DTD/bookdoc_110101.dtd and save it either in directory /usr/lib/python2.6/site-packages/Bio/Entrez/DTDs or in directory /Data/.biopython/Bio/Entrez/DTDs in order for Bio.Entrez to find it. Alternatively, you can save bookdoc_110101.dtd in the directory Bio/Entrez/DTDs in the Biopython distribution, and reinstall Biopython. Please also inform the Biopython developers about this missing DTD, by reporting a bug on http://bugzilla.open-bio.org/ or sign up to our mailing list and emailing us, so that we can include it with the next release of Biopython. Proceeding to access the DTD file through the internet... warnings.warn(message) From p.j.a.cock at googlemail.com Sun May 29 10:00:58 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 29 May 2011 11:00:58 +0100 Subject: [Biopython] Another warning of "missing dtd file" In-Reply-To: <4DE1F33E.8020700@stupidbeauty.com> References: <4DE1F33E.8020700@stupidbeauty.com> Message-ID: 2011/5/29 ??? 
: > /usr/lib/python2.6/site-packages/Bio/Entrez/Parser.py:495: UserWarning: > Unable to load DTD file bookdoc_110101.dtd. > ... > For this purpose, please download bookdoc_110101.dtd from > > http://www.ncbi.nlm.nih.gov/entrez/query/DTD/bookdoc_110101.dtd > > ... > Please also inform the Biopython developers about this missing DTD, by > reporting a bug on http://bugzilla.open-bio.org/ or sign up to our mailing > list and emailing us, so that we can include it with the next release of > Biopython. Thank you, that's been added. I don't see anything else missing from this list, but I know it is a partial listing: http://www.ncbi.nlm.nih.gov/corehtml/query/DTD/index.shtml Peter From sainitin7 at gmail.com Tue May 31 11:34:54 2011 From: sainitin7 at gmail.com (sai nitin) Date: Tue, 31 May 2011 13:34:54 +0200 Subject: [Biopython] Query regarding Bioassay database Message-ID: Hello, Myself sainitin i have one query regarding Eutilities use for pubchem and bioassay database as follows Question: I have list of pubchem IDs i have to get corresponding bioassay IDS which are unspecified for example it should print as following PubchemID:Bioassay IDs (unspecified) Please can any one give some suggestions how to retrieve unspecified Bioassay IDS for given Pubchem IDS using Biopython Thanks in Advance -- Sainitin D From p.j.a.cock at googlemail.com Tue May 31 12:30:15 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 31 May 2011 13:30:15 +0100 Subject: [Biopython] Query regarding Bioassay database In-Reply-To: References: Message-ID: On Tue, May 31, 2011 at 12:34 PM, sai nitin wrote: > Hello, > > Myself sainitin i have one query regarding Eutilities use for pubchem and > bioassay database as follows > > Question: I have list of pubchem IDs i have to get corresponding bioassay > IDS which are unspecified > for example it should print as following > > PubchemID:Bioassay IDs (unspecified) > > Please can any one give some suggestions how to retrieve unspecified >
Bioassay IDS for given Pubchem IDS using Biopython > > Thanks in Advance Try Entrez Link (ELink), possibly with the pcassay_pccompound link. See the links in the Biopython documentation for Bio.Entrez.ELink, especially: http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/entrezlinks.html If you could give a more complete example it would help. In particular, an example of a positive match between pubchem and bioassay. Peter
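
[Editor's sketch, not part of the original reply.] A rough outline of the ELink route suggested above, assuming Biopython is installed and NCBI is reachable. Entrez.elink and Entrez.read are real Bio.Entrez functions; the helper names, the choice of dbfrom="pccompound" and db="pcassay", and any link name passed as link_name are assumptions that should be checked against the entrezlinks page linked above. In particular, which link (if any) corresponds to "unspecified" assays is not confirmed here.

```python
def format_assay_links(elink_records, link_name=None):
    """Turn parsed ELink records into 'PubchemID:AssayID,AssayID' lines.

    elink_records is the list returned by Entrez.read() on an ELink
    result. If link_name is given, only that link set is reported.
    """
    lines = []
    for record in elink_records:
        cids = ",".join(str(cid) for cid in record.get("IdList", []))
        aids = []
        for linkset in record.get("LinkSetDb", []):
            if link_name is not None and linkset.get("LinkName") != link_name:
                continue
            aids.extend(str(link["Id"]) for link in linkset.get("Link", []))
        lines.append("%s:%s" % (cids, ",".join(aids)))
    return lines


def fetch_assay_links(cids, email):
    """Query ELink once per compound ID (needs Biopython and network)."""
    from Bio import Entrez  # imported here so the formatter works offline

    Entrez.email = email  # NCBI asks for a contact address
    records = []
    for cid in cids:
        # Database names are an assumption; see the lead-in above.
        handle = Entrez.elink(dbfrom="pccompound", db="pcassay", id=cid)
        records.extend(Entrez.read(handle))
        handle.close()
    return records
```

A call such as format_assay_links(fetch_assay_links(["449489"], "you@example.org")) would then print-ready lines in the PubchemID:Bioassay IDs layout asked for; the compound ID is the one from the earlier PubChem thread and the email address is a placeholder.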