From wong at ebgm.jussieu.fr Mon Oct 3 05:21:19 2005
From: wong at ebgm.jussieu.fr (WONG Hua)
Date: Mon Oct 3 05:34:08 2005
Subject: [BioPython] Having some issue with unicode and Bio.PDB
Message-ID: <20051003092119.GA21209@bach.ebgm.jussieu.fr>

I don't know much about Unicode (I had never heard of it until I ran into this problem).

Lately I have been working with Bio.PDB and Blender (see blender.org), with the idea of building some cool tools.

There is no error if I use the standard Python interpreter. But when I use the interpreter embedded in Blender, I get the following error message when importing Bio.PDB:

##############################################################
  File "/users2/invites/wong/apps/lib/python2.3/site-packages/Bio/PDB/__init__.py", line 22, in ?
    from Polypeptide import PPBuilder, CaPPBuilder, is_aa, standard_aa_names
  File "/users2/invites/wong/apps/lib/python2.3/site-packages/Bio/PDB/Polypeptide.py", line 5, in ?
    from Bio.Seq import Seq
  File "/users2/invites/wong/apps/lib/python2.3/site-packages/Bio/Seq.py", line 1, in ?
    import string, array
ImportError: /users2/invites/wong/apps/lib/python2.4/lib-dynload/array.so: undefined symbol: PyUnicodeUCS2_FromUnicode
##############################################################

I found that the Blender interpreter is UCS4 while my local Python is UCS2... Both are using Python 2.4.

I could try to recompile Blender to enable UCS2, but I would prefer to avoid this if I can (I am not an expert in compiling things), because I don't know whether it would work afterwards...

Is there something I can modify in Bio.PDB's .py files so it can be imported when using UCS4?
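For reference, one way to check which kind of build an interpreter is can be sketched as follows. It assumes only the standard sys module: sys.maxunicode is 65535 (0xFFFF) on a UCS2 ("narrow") build and 1114111 (0x10FFFF) on a UCS4 ("wide") build. The helper name unicode_build is just an illustrative choice, not part of Python, Biopython, or Blender.

```python
import sys


def unicode_build(maxunicode=None):
    """Classify an interpreter as 'UCS2' or 'UCS4' from its sys.maxunicode value.

    unicode_build is a hypothetical helper name used for illustration only.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # Narrow (UCS2) builds cannot index code points above 0xFFFF.
    return "UCS2" if maxunicode == 0xFFFF else "UCS4"


# Run this in each interpreter you want to compare.
print(unicode_build())
```

Running it once in the standalone interpreter and once in the embedded one should show whether the two builds differ. Note that Python 3.3 and later no longer have separate narrow and wide builds, so there it always reports UCS4.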
Thanks,
Hua

From mdehoon at c2b2.columbia.edu Mon Oct 3 11:58:20 2005
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Mon Oct 3 12:08:44 2005
Subject: [BioPython] Having some issue with unicode and Bio.PDB
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD15@cgcmail.cgc.cpmc.columbia.edu>

  File "/users2/invites/wong/apps/lib/python2.3/site-packages/Bio/PDB/__init__.py", line 22, in ?
    from Polypeptide import PPBuilder, CaPPBuilder, is_aa, standard_aa_names
  File "/users2/invites/wong/apps/lib/python2.3/site-packages/Bio/PDB/Polypeptide.py", line 5, in ?
    from Bio.Seq import Seq
  File "/users2/invites/wong/apps/lib/python2.3/site-packages/Bio/Seq.py", line 1, in ?
    import string, array
ImportError: /users2/invites/wong/apps/lib/python2.4/lib-dynload/array.so: undefined symbol: PyUnicodeUCS2_FromUnicode

You are mixing python2.3 and python2.4 (see the paths in the traceback). You should reinstall Biopython with python2.4. From the traceback, it seems that you have Biopython for python2.3 (or, for some reason, python imports python2.3's Biopython instead of python2.4's Biopython).

--Michiel.

From wong at ebgm.jussieu.fr Tue Oct 4 04:35:44 2005
From: wong at ebgm.jussieu.fr (WONG Hua)
Date: Tue Oct 4 04:36:03 2005
Subject: [BioPython] Having some issue with unicode and Bio.PDB
In-Reply-To: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD15@cgcmail.cgc.cpmc.columbia.edu>
References: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD15@cgcmail.cgc.cpmc.columbia.edu>
Message-ID: <20051004083544.GA29562@bach.ebgm.jussieu.fr>

After your mail, I cleaned up my Python installations and rebuilt every module using Python 2.4.

Now, instead of the previous error, it gives me:

###############################################################
Traceback (most recent call last):
  File "exportPDBv1_0.py", line 6, in ?
  File "/users2/invites/wong/apps/lib/python2.4/site-packages/Bio/PDB/__init__.py", line 22, in ?
    from Polypeptide import PPBuilder, CaPPBuilder, is_aa, standard_aa_names
  File "/users2/invites/wong/apps/lib/python2.4/site-packages/Bio/PDB/Polypeptide.py", line 5, in ?
    from Bio.Seq import Seq
  File "/users2/invites/wong/apps/lib/python2.4/site-packages/Bio/Seq.py", line 1, in ?
    import string, array
ImportError: /users2/invites/wong/apps/lib/python2.4/lib-dynload/array.so: undefined symbol: _PyArg_NoKeywords
###############################################################

It does this when I import Bio.PDB, but not when I import Bio alone.

From mdehoon at c2b2.columbia.edu Tue Oct 4 11:31:13 2005
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue Oct 4 11:36:30 2005
Subject: [BioPython] Having some issue with unicode and Bio.PDB
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD1A@cgcmail.cgc.cpmc.columbia.edu>

To find out if this is a problem with Biopython, or a problem with Python itself, can you try:

>>> import array

without Biopython?
My guess is that you will find the same error:

ImportError: /users2/invites/wong/apps/lib/python2.4/lib-dynload/array.so: undefined symbol: _PyArg_NoKeywords

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

From borreguero at gmail.com Tue Oct 4 14:33:14 2005
From: borreguero at gmail.com (Jose Borreguero)
Date: Tue Oct 4 15:34:45 2005
Subject: [BioPython] how to add new methods to class Seq ?
Message-ID: <7cced4ed0510041133i686eb23u@mail.gmail.com>

Hi all,
I'm new to Biopython (and Python!), and I am wondering what the most efficient way is to add methods of my own to class Seq for my in-house computing. Should I (1) open the code and start adding methods; (2) derive a class from Seq and write my methods there; (3) something else?
jose

From frederic.sohm at iaf.cnrs-gif.fr Wed Oct 5 02:40:03 2005
From: frederic.sohm at iaf.cnrs-gif.fr (Frederic Sohm)
Date: Wed Oct 5 03:40:12 2005
Subject: [BioPython] how to add new methods to class Seq ?
In-Reply-To: <7cced4ed0510041133i686eb23u@mail.gmail.com>
References: <7cced4ed0510041133i686eb23u@mail.gmail.com>
Message-ID: <200510050840.03562.frederic.sohm@iaf.cnrs-gif.fr>

Hi

Best practice would recommend writing a new class that inherits from Seq.
Here is an example:

>>> from Bio.Seq import Seq
>>> class myseq(Seq):
        def tolist(self):
            return list(self.tostring())

>>> s = myseq('ACGT')
>>> s.tolist()
['A', 'C', 'G', 'T']
>>> dir(Seq)
['_Seq__maketrans', '__add__', '__doc__', '__getitem__', '__getslice__', '__init__', '__len__', '__module__', '__radd__', '__repr__', '__str__', 'complement', 'count', 'reverse_complement', 'tomutable', 'tostring']
>>> dir(myseq)
['_Seq__maketrans', '__add__', '__doc__', '__getitem__', '__getslice__', '__init__', '__len__', '__module__', '__radd__', '__repr__', '__str__', 'complement', 'count', 'reverse_complement', 'tolist', 'tomutable', 'tostring']
>>>

--
Frédéric Sohm
Equipe INRA U1126
"Morphogenèse du système nerveux des Chordés"
UPR 2197 DEPSN, CNRS
Institut de Neurosciences A. Fessard
1 Avenue de la Terrasse
91 198 GIF-SUR-YVETTE
FRANCE
Phone: +33 (0) 1 69 82 34 12
Fax: +33 (0) 1 69 82 34 47

From wong at ebgm.jussieu.fr Wed Oct 5 03:59:38 2005
From: wong at ebgm.jussieu.fr (WONG Hua)
Date: Wed Oct 5 03:59:10 2005
Subject: [BioPython] Having some issue with unicode and Bio.PDB
In-Reply-To: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD1A@cgcmail.cgc.cpmc.columbia.edu>
References: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD1A@cgcmail.cgc.cpmc.columbia.edu>
Message-ID: <20051005075938.GA4714@bach.ebgm.jussieu.fr>

Thanks for the suggestion. At least I see where the problem really is now.

From 2huggie at gmail.com Wed Oct 5 04:55:52 2005
From: 2huggie at gmail.com (Timothy Wu)
Date: Wed Oct 5 05:22:48 2005
Subject: [BioPython] TMHMM
Message-ID:

Hi,

Two things:

1. Is there a TMHMM module (hopefully documented) in BioPython?
2. I posted a similar question to the Python mailing list a long time ago, but unfortunately I did not get a satisfactory answer. I hope I can find some help here.

From viewing the TMHMM HTML source (http://www.cbs.dtu.dk/services/TMHMM/), I think the form fields are:

seqfile --> file, which I don't know how to use, so I will not use it.
SEQ --> text box, which should contain amino acid sequences
outform --> radio buttons; valid values are '-noshort', '-noplot', '-short'. I would like to have it as '-short'
version --> a check box; valid value is '-v1'

I tested using urllib with something like this:

-----------------------------------------
import urllib

params = urllib.urlencode({
    'configfile': '/usr/opt/www/pub/CBS/services/TMHMM-2.0/TMHMM2.cf',
    'SEQ': 'VVDGLHQAETISSQGFKELFEGYGNFNNTRNGVEVENLKQAVIQKGADAIRTGSGSLGGTV',
    'version': '-short'
})

f = urllib.urlopen("http://www.cbs.dtu.dk/cgi-bin/nph-webface", params)
sec = f.read()
-----------------------------------------

Of course, a successful TMHMM query requires me to read the page returned by urlopen and parse another URL from it to obtain the final result page. But this is only the code to obtain the intermediate page.

My problem is this: notice my strange code above.
For the 'version' field, I need to fill in either '-v1' or leave it empty. For outform (which I didn't include), I should have '-short'. But that doesn't work! Had I done that, it would return "Read: Field not declared; 'outform'".

I can get it to work if I only fill in the "configfile" and "SEQ" fields. But then I get the equivalent of having the '-noshort' value in 'outform' (extensive, with graphics). The only way I am going to get the '-short' effect (one line per protein) is to fill in '-short' in the version field. This is very bizarre.

Is there something I am missing? SignalP, which is also served by the same script, shows the same bizarre behavior. I have also tried a Perl script to do the same thing. It also behaves strangely, but not like the Python one.

my $agent = new LWP::UserAgent;
my $request = POST($GS_URL,
    Content_Type => 'form-data',
    Content => [
        configfile => '/usr/opt/www/pub/CBS/services/TMHMM-2.0/TMHMM2.cf',
        SEQ => 'VVDGLHQAETISSQGFKELFEGYGNFNNTRNGVEVENLKQAVIQKGADAIRTGSGSLGGTV',
        outform => '-short',
    ]
);

From 2huggie at gmail.com Wed Oct 5 05:01:36 2005
From: 2huggie at gmail.com (Timothy Wu)
Date: Wed Oct 5 16:31:28 2005
Subject: [BioPython] TMHMM
Message-ID:

(Sorry for the previous post; I accidentally sent it before I had finished it.)

Hi,

Two things:

1. Is there a TMHMM module (hopefully documented) in BioPython?
2. I posted a similar question to the Python mailing list a long time ago, but unfortunately I did not get a satisfactory answer. I hope I can find some help here.

From viewing the TMHMM HTML source (http://www.cbs.dtu.dk/services/TMHMM/), I think the form fields are:

seqfile --> file, which I don't know how to use, so I will not use it.
SEQ --> text box, which should contain amino acid sequences
outform --> radio buttons; valid values are '-noshort', '-noplot', '-short'.
I would like to have it as '-short'
version --> a check box; valid value is '-v1'

I tested using urllib with something like this:

-----------------------------------------
import urllib

params = urllib.urlencode({
    'configfile': '/usr/opt/www/pub/CBS/services/TMHMM-2.0/TMHMM2.cf',
    'SEQ': 'VVDGLHQAETISSQGFKELFEGYGNFNNTRNGVEVENLKQAVIQKGADAIRTGSGSLGGTV',
    'version': '-short'
})

f = urllib.urlopen("http://www.cbs.dtu.dk/cgi-bin/nph-webface", params)
sec = f.read()
-----------------------------------------

Of course, a successful TMHMM query requires me to read the page returned by urlopen and parse another URL from it to obtain the final result page. But this is only the code to obtain the intermediate page.

My problem is this: notice my strange code above. For the 'version' field, I need to fill in either '-v1' or leave it empty. For outform (which I didn't include), I should have '-short'. But that doesn't work! Had I done that, it would return "Read: Field not declared; 'outform'".

I can get it to work if I only fill in the "configfile" and "SEQ" fields. But then I get the equivalent of having the '-noshort' value in 'outform' (extensive, with graphics). The only way I am going to get the '-short' effect (one line per protein) is to fill in '-short' in the version field. This is very bizarre.

Is there something I am missing? SignalP, which is also served by the same script, shows the same bizarre behavior. I have also tried a Perl script to do the same thing. It also behaves strangely, but not like the Python one:

-----------------------------------------
my $agent = new LWP::UserAgent;
my $request = POST($GS_URL,
    Content_Type => 'form-data',
    Content => [
        configfile => '/usr/opt/www/pub/CBS/services/TMHMM-2.0/TMHMM2.cf',
        SEQ => 'VVDGLHQAETISSQGFKELFEGYGNFNNTRNGVEVENLKQAVIQKGADAIRTGSGSLGGTV',
        outform => '-short',
    ]
);
-----------------------------------------

The script works. But if I swap the configfile and SEQ lines, it doesn't.
I am not really familiar with Perl, so I have no idea why it behaves this way.
Can anyone tell me why I had to fill in '-noshort' in the supposedly wrong field 'version'? Thanks a bunch.

Timothy

From 2huggie at gmail.com Sat Oct 8 08:21:28 2005
From: 2huggie at gmail.com (Timothy Wu)
Date: Sat Oct 8 08:28:42 2005
Subject: [BioPython] Re: TMHMM
In-Reply-To:
References:
Message-ID:

I mailed the Center for Biological Sequence Analysis (which hosts the TMHMM web service), and the response I got was to put the configfile field first among the key-value pairs. So instead of a dictionary, I passed a list of tuples to urlencode(). This solved all the problems.

The version=-short setting works because its value was put on a command line as a command-line switch. So it works regardless.
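
A minimal sketch of that change, for anyone hitting the same problem. It assumes Python 3's urllib.parse.urlencode (the code earlier in this thread used Python 2's urllib.urlencode), and it assumes that outform is the field that carries '-short' once configfile comes first. No request is sent here; the sketch only shows that a list of tuples keeps the fields in the order given.

```python
from urllib.parse import urlencode

# Field names and values are the ones quoted earlier in this thread.
# A list of (name, value) tuples is encoded in exactly this order,
# so configfile is guaranteed to be the first field in the request body.
fields = [
    ("configfile", "/usr/opt/www/pub/CBS/services/TMHMM-2.0/TMHMM2.cf"),
    ("SEQ", "VVDGLHQAETISSQGFKELFEGYGNFNNTRNGVEVENLKQAVIQKGADAIRTGSGSLGGTV"),
    ("outform", "-short"),  # assumption: accepted once configfile is first
]

body = urlencode(fields)

# Show the order of the encoded field names.
print([pair.split("=")[0] for pair in body.split("&")])
```

With a Python 2 dictionary, the order of the encoded fields was not under the caller's control, which would explain why the server sometimes saw another field before configfile.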

Timothy

From omid9dr18 at hotmail.com Wed Oct 12 16:47:27 2005
From: omid9dr18 at hotmail.com (Omid Khalouei)
Date: Wed Oct 12 17:03:43 2005
Subject: [BioPython] Structure Alignment
Message-ID:

Hello,

Does Biopython have any functions for structural alignment? I need to align some protein structures and measure the RMSD. I know how to do this with programs such as Swiss-PdbViewer, but I need to do it on a dozen structures, which is why I need to automate it.

Thanks for your help.

S. Khalouei

From mmokrejs at ribosome.natur.cuni.cz Fri Oct 14 17:12:27 2005
From: mmokrejs at ribosome.natur.cuni.cz (Martin MOKREJŠ)
Date: Fri Oct 14 17:18:58 2005
Subject: [BioPython] genbank parser returns start position of the location decreased by one
Message-ID: <43501F3B.2000902@ribosome.natur.cuni.cz>

Hi,
I am either too tired or have missed some point. I use Biopython 1.40b to fetch data from GenBank. The location (467..2863) from GenBank, as seen on their web pages, differs from the string returned by Biopython. I get location (466..2863) instead. The end position is never decreased, only the start position. What's wrong? ;)

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=56117851

It happens with CDS feature data, but also with source, just anything:

FEATURES             Location/Qualifiers
     source          1..4115

$ python
Python 2.4.2 (#1, Oct 2 2005, 05:43:55)
[GCC 3.4.4 (Gentoo 3.4.4-r1, ssp-3.4.4-1.0, pie-8.7.8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import GenBank
>>> record_parser = GenBank.FeatureParser()
>>> ncbi_dict = GenBank.NCBIDictionary('nucleotide', 'genbank', parser = record_parser)
>>> gb_seqrecord = ncbi_dict['56117851']
>>> print _feature.location
(0..4115)
>>>

From mdehoon at c2b2.columbia.edu Fri Oct 14 18:05:10 2005
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Fri Oct 14 18:12:40 2005
Subject: [BioPython] genbank parser returns start position of the location decreased by one
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD4B@cgcmail.cgc.cpmc.columbia.edu>

I would think that this is intentional. Python uses zero-based arrays, whereas GenBank starts counting at 1.
In other words, gb_seqrecord.seq[0:4115] will return the sequence that you're interested in.

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

From mmokrejs at ribosome.natur.cuni.cz Mon Oct 17 08:24:23 2005
From: mmokrejs at ribosome.natur.cuni.cz (Martin MOKREJŠ)
Date: Mon Oct 17 08:23:31 2005
Subject: [BioPython] genbank parser returns start position of the location decreased by one
In-Reply-To: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD4B@cgcmail.cgc.cpmc.columbia.edu>
References: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD4B@cgcmail.cgc.cpmc.columbia.edu>
Message-ID: <435397F7.5030104@ribosome.natur.cuni.cz>

Hi Michiel,
I thought this might be a "feature", but imagine you just parse GenBank data and display it on the web. I don't believe anybody would expect the data fetched through Biopython to differ from what is visible on the NCBI web site.
Is this "feature" at least consistent with BioPerl's behavior?
M.

From biopyte at yahoo.de Sat Oct 22 06:43:12 2005
From: biopyte at yahoo.de (Hans Meier)
Date: Sat Oct 22 06:49:06 2005
Subject: [BioPython] restrichtion map like "remap"
Message-ID: <20051022104312.70952.qmail@web26310.mail.ukl.yahoo.com>

Dear friends,

there's a feature in Biopython that I can't find, or it is missing.

It's a function like "remap" from EMBOSS or "map" from GCG. You input a sequence in raw or FASTA format and you get a map of your sequence including (optionally) restriction sites, translation in all six frames ...
You can also format the output by specifying how many bases per line, the margin width, etc.

Is there nothing similar within Biopython? Any other suggestions (except using EMBOSS :)

Thanks a lot, Harald

From frederic.sohm at iaf.cnrs-gif.fr Mon Oct 24 03:15:54 2005
From: frederic.sohm at iaf.cnrs-gif.fr (Frederic Sohm)
Date: Mon Oct 24 03:37:02 2005
Subject: [BioPython] restrichtion map like "remap"
In-Reply-To: <20051022104312.70952.qmail@web26310.mail.ukl.yahoo.com>
References: <20051022104312.70952.qmail@web26310.mail.ukl.yahoo.com>
Message-ID: <200510240915.55248.frederic.sohm@iaf.cnrs-gif.fr>

Hi,

I don't know about EMBOSS support, but for a restriction map you can use the Restriction module:

Python 2.4.2 (#1, Sep 28 2005, 17:53:13)
[GCC 3.4.3 (Mandrakelinux 10.2 3.4.3-7mdk)] on linux2
Type "copyright", "credits" or "license()" for more information.
IDLE 1.1.2
>>> from Bio.Restriction import Analysis, AllEnzymes, CommOnly
>>> from Bio.Seq import Seq
>>> pbr = Seq('TTCT --- cut pBR322 sequence ---')
>>> restmap = Analysis(AllEnzymes, pbr, False)
>>> # False means the sequence is circular
>>> # Nothing or True the sequence is linear.
>>> restmap.print_that(None, 'restriction map of pBR322 \n\n')

restriction map of pBR322

AccII : 348, 704, 819, 948, 975, 980, 1041, 1107, 1236, 1246, 1391, 1417, 1539, 1636, 2006, 2075, 2180, 2521, 3102, 3432, 3925, 4257.

--- cut all the enzymes and their sites ---

Enzymes which do not cut the sequence.

AatI Acc65I AcvI AflII AgeI AhlI ApaI AsiAI
--- cut the enzymes absent from the sequence ---

>>> # to use only commercially available enzymes do that :
>>> restmap = Analysis(CommOnly, pbr, False)

--- results cut ---

You can format the results as a map by doing:

>>> restmap.print_as('map')
>>> restmap.print_that()
...

Hope that helps. Unfortunately, there is no support for translation frames in this module.
For more details on how to use it, see the cookbook-style manual provided in the docs, here:
http://www.biopython.org/docs/cookbook/Restriction.html

best regards

Fred

--
Frédéric Sohm
Equipe INRA U1126
"Morphogenèse du système nerveux des Chordés"
UPR 2197 DEPSN, CNRS
Institut de Neurosciences A. Fessard
1 Avenue de la Terrasse
91 198 GIF-SUR-YVETTE
FRANCE
Phone: +33 (0) 1 69 82 34 12
Fax: +33 (0) 1 69 82 34 47

From boris.steipe at utoronto.ca Thu Oct 27 14:32:50 2005
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Thu Oct 27 14:37:03 2005
Subject: [BioPython] Fwd: Please take the Gene Ontology survey
References:
Message-ID:

My apologies in case this reaches anyone more than once.

Context: the GO grant is up for competitive renewal in the new year, and the volume of responses to this survey will help GO demonstrate the degree to which it has been adopted by the community. And, as you know, government support for computational biology infrastructure has been insufficient in the recent past.

Boris

Begin forwarded message:

> From: Jane Lomax
> Date: 27 October 2005 13:20:14 GMT-04:00
> To: boris.steipe@utoronto.ca
> Subject: Please take the Gene Ontology survey
>
> Hello,
>
> The Gene Ontology (GO) is a system for functional annotation of genes and
> gene products. It enables classification of gene products according to
> molecular function, biological process, and cellular location of action.
>
> Please help us by taking part in our survey.
>
> The results of this survey will help us improve our services
> to our user community, and help direct our resources more effectively.
>
> It's a very straightforward set of questions, which should take a maximum
> of 10 minutes to complete. There's no requirement to submit your name
> or email address. To complete the survey, go to:
>
> http://www.AdvancedSurvey.com/default.asp?SurveyID=32355
>
> Please pass on to any friends or colleagues not on these lists.
> > Many thanks for your time, > > The GO Consortium > > > > > > > > > > > From gabraham at cs.rmit.edu.au Fri Oct 28 10:52:50 2005 From: gabraham at cs.rmit.edu.au (Gad Abraham) Date: Fri Oct 28 11:24:32 2005 Subject: [BioPython] Extracting residue list from PDB Message-ID: <20051028145250.GA22523@cs.rmit.edu.au> Hi, I'm trying to extract a FASTA-like list of residues from a PDB file. It doesn't seem to work correctly for some (e.g. 1n62, which comes out as 10 chains while it only has 6, and chain lengths are wrong too). I'm using the following script based on the Structural Biopython FAQ: #!/usr/bin/python from Bio.PDB import * import sys parser = PDBParser() structure = parser.get_structure(sys.argv[1], sys.argv[1]) ppb = PPBuilder() for pp in ppb.build_peptides(structure): print len(pp),pp.get_sequence().tostring() print Any tips would be appreciated. Thanks, Gad -- +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ gabraham@cs.rmit.edu.au http://yallara.cs.rmit.edu.au/~gabraham +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ From idoerg at burnham.org Fri Oct 28 12:07:53 2005 From: idoerg at burnham.org (Iddo Friedberg) Date: Fri Oct 28 12:16:06 2005 Subject: [BioPython] Extracting residue list from PDB In-Reply-To: <20051028145250.GA22523@cs.rmit.edu.au> References: <20051028145250.GA22523@cs.rmit.edu.au> Message-ID: <43624CD9.5070009@burnham.org> Hi Gad, The reason the chains seem to be of the wrong length, is that they are generated from the ATOM records, rather than the SEQRES records. Those disagree often enough in PDB files. This problem does not exists in mmCIF, I believe. Using your code, I got only six chains, so I cannot comment on the second problem. Best, ./I Gad Abraham wrote: >Hi, > >I'm trying to extract a FASTA-like list of residues from a PDB file. It >doesn't seem to work correctly for some (e.g. 1n62, which comes out as >10 chains while it only has 6, and chain lengths are wrong too). 
>
>I'm using the following script based on the Structural Biopython FAQ:
>
>#!/usr/bin/python
>
>from Bio.PDB import *
>import sys
>
>parser = PDBParser()
>structure = parser.get_structure(sys.argv[1], sys.argv[1])
>
>ppb = PPBuilder()
>for pp in ppb.build_peptides(structure):
>    print len(pp),pp.get_sequence().tostring()
>    print
>
>
>Any tips would be appreciated.
>
>Thanks,
>Gad
>
>

--
Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9949
http://ffas.ljcrf.edu/~iddo

From mdehoon at c2b2.columbia.edu Fri Oct 28 20:14:54 2005
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Fri Oct 28 20:20:37 2005
Subject: [BioPython] Biopython release 1.41
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD8B@cgcmail.cgc.cpmc.columbia.edu>

Dear biopythoneers,

We are pleased to announce the release of Biopython 1.41.

Many improvements were made in Biopython during the eight months since
the previous release, and the new release contains lots of bugfixes,
improvements, new functionalities, and better documentation. To pick a
few, there's the new Bio.MEME module by Jason Hackney, updates to the
Blast parser using Bertrand Frottier's NCBIXML code, a BLAT parser by
Yair Benita, numerous updates in Bio.PDB, CompareACE support in
AlignAce, and improved user-friendliness in Bio.Seq.

Lots of people contributed to this release, in particular Frank Kauff
(Bio.Nexus), Jason Hackney (Bio.MEME), Thomas Hamelryck (Bio.PDB),
Frédéric Sohm (Bio.Restriction), James Casbon (Bio.SCOP) for bug fixes
and updates, Peter (Bio.Blast.NCBIXML test cases), and of course Jeff
Chang, Brad Chapman, Andrew Dalke, and Iddo Friedberg for Biopython and
the fool-proof instructions on how to roll a release, which made this a
lot easier than I anticipated. My apologies if I forgot to thank
somebody.
--Michiel

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

From gabraham at cs.rmit.edu.au Fri Oct 28 22:14:27 2005
From: gabraham at cs.rmit.edu.au (Gad Abraham)
Date: Fri Oct 28 22:16:11 2005
Subject: [BioPython] Extracting residue list from PDB
In-Reply-To: <43624CD9.5070009@burnham.org>
References: <20051028145250.GA22523@cs.rmit.edu.au> <43624CD9.5070009@burnham.org>
Message-ID: <20051029021427.GA17453@cs.rmit.edu.au>

On Fri, Oct 28, 2005 at 09:07:53AM -0700, Iddo Friedberg wrote:
> Hi Gad,
>
> The reason the chains seem to be of the wrong length, is that they are
> generated from the ATOM records, rather than the SEQRES records. Those
> disagree often enough in PDB files.
> This problem does not exist in mmCIF, I believe.
>
> Using your code, I got only six chains, so I cannot comment on the
> second problem.
>

I see. I'm consistently getting 10 chains (lengths 161, 804, 97, 176,
11, 71, 87, 662, 133, 286) for 1N62.

It seems that parsing the SEQRES is the way to go.

Thanks,
Gad

--
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
gabraham@cs.rmit.edu.au
http://yallara.cs.rmit.edu.au/~gabraham
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

From boris.steipe at utoronto.ca Sat Oct 29 00:45:38 2005
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Sat Oct 29 03:32:47 2005
Subject: [BioPython] Extracting residue list from PDB
In-Reply-To: <20051029021427.GA17453@cs.rmit.edu.au>
References: <20051028145250.GA22523@cs.rmit.edu.au> <43624CD9.5070009@burnham.org> <20051029021427.GA17453@cs.rmit.edu.au>
Message-ID:

SEQRES and ATOM records have different semantics: the SEQRES is what
the crystallographer puts into the experiment, the ATOM records are
what she sees. Presumably you have covalent chain-breaks between
residues, or parts of the polypeptide chain were not traceable in
electron density and were omitted.
So even though the numbers are inconsistent, they are both right.

A related issue may be what the natural protein sequence is, as
opposed to the perhaps truncated molecule in the crystal (presumably
you are not interested in the propensity of fragments to crystallize,
but in some biological property), or what the translated sequence is,
that may have been posttranslationally processed, or even what the
gene sequence is, that may have been translated with e.g.
selenocysteine, etc. etc.

So, (as usual) "the way to go" is determined by where you want to go to.

HTH
Boris

On 28 Oct 2005, at 22:14, Gad Abraham wrote:

> On Fri, Oct 28, 2005 at 09:07:53AM -0700, Iddo Friedberg wrote:
>
>> Hi Gad,
>>
>> The reason the chains seem to be of the wrong length, is that they
>> are generated from the ATOM records, rather than the SEQRES records.
>> Those disagree often enough in PDB files.
>> This problem does not exist in mmCIF, I believe.
>>
>> Using your code, I got only six chains, so I cannot comment on the
>> second problem.
>>
>
> I see. I'm consistently getting 10 chains (lengths 161, 804, 97, 176,
> 11, 71, 87, 662, 133, 286) for 1N62.
>
> It seems that parsing the SEQRES is the way to go.
>
> Thanks,
> Gad
>
> --
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> gabraham@cs.rmit.edu.au
> http://yallara.cs.rmit.edu.au/~gabraham
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> _______________________________________________
> BioPython mailing list - BioPython@biopython.org
> http://biopython.org/mailman/listinfo/biopython
>

From gabraham at cs.rmit.edu.au Mon Oct 31 04:05:30 2005
From: gabraham at cs.rmit.edu.au (Gad Abraham)
Date: Tue Nov 1 16:58:48 2005
Subject: [BioPython] Extracting residue list from PDB
In-Reply-To:
References: <20051028145250.GA22523@cs.rmit.edu.au> <43624CD9.5070009@burnham.org> <20051029021427.GA17453@cs.rmit.edu.au>
Message-ID: <20051031090530.GB666@cs.rmit.edu.au>

On Sat, Oct 29, 2005 at 12:45:38AM -0400, Boris Steipe wrote:
> SEQRES and ATOM records have different semantics: the SEQRES is what
> the crystallographer puts into the experiment, the ATOM records are
> what she sees. Presumably you have covalent chain-breaks between
> residues, or parts of the polypeptide chain were not traceable in
> electron density and were omitted.
>
> So even though the numbers are inconsistent, they are both right.
>
> A related issue may be what the natural protein sequence is, as
> opposed to the perhaps truncated molecule in the crystal (presumably
> you are not interested in the propensity of fragments to crystallize,
> but in some biological property), or what the translated sequence is,
> that may have been posttranslationally processed, or even what the
> gene sequence is, that may have been translated with e.g.
> selenocysteine, etc. etc.
>
> So, (as usual) "the way to go" is determined by where you want to go to.

Like everyone else, I'm trying to predict structure from sequence. So
I'm interested in the true sequence of the chain, and what the
corresponding tertiary structure is.
So it seems to me that the SEQRES entries are the more correct sequence
to use, because the ATOM entries are a function of the former
(convoluted through experiments), not the other way round.

Thanks,
Gad

--
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
gabraham@cs.rmit.edu.au
http://yallara.cs.rmit.edu.au/~gabraham
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

From gabraham at cs.rmit.edu.au Mon Oct 31 04:00:52 2005
From: gabraham at cs.rmit.edu.au (Gad Abraham)
Date: Tue Nov 1 16:58:57 2005
Subject: [BioPython] Extracting residue list from PDB
In-Reply-To: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD92@cgcmail.cgc.cpmc.columbia.edu>
References: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD92@cgcmail.cgc.cpmc.columbia.edu>
Message-ID: <20051031090052.GA666@cs.rmit.edu.au>

On Sun, Oct 30, 2005 at 03:44:17PM -0500, Michiel De Hoon wrote:
> > On Fri, Oct 28, 2005 at 09:07:53AM -0700, Iddo Friedberg wrote:
> > > Using your code, I got only six chains, so I cannot comment on the
> > > second problem.
> > >
> >
> > I see. I'm consistently getting 10 chains (lengths 161, 804, 97, 176,
> > 11, 71, 87, 662, 133, 286) for 1N62.
>
> Maybe you are using an older version of Biopython?

I'm using biopython 1.30, which is the latest in Ubuntu Breezy Linux.
Gad

--
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
gabraham@cs.rmit.edu.au
http://yallara.cs.rmit.edu.au/~gabraham
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

From thamelry at binf.ku.dk Sat Oct 29 17:26:32 2005
From: thamelry at binf.ku.dk (thamelry@binf.ku.dk)
Date: Tue Nov 1 17:53:10 2005
Subject: [BioPython] Extracting residue list from PDB
In-Reply-To:
References: <20051028145250.GA22523@cs.rmit.edu.au> <43624CD9.5070009@burnham.org> <20051029021427.GA17453@cs.rmit.edu.au>
Message-ID: <3019.193.110.248.8.1130621192.squirrel@www.binf.ku.dk>

> SEQRES and ATOM records have different semantics: the SEQRES is what
> the crystallographer puts into the experiment, the ATOM records are
> what she sees. Presumably you have covalent chain-breaks between
> residues, or parts of the polypeptide chain were not traceable in
> electron density and were omitted.
>
> So even though the numbers are inconsistent, they are both right.

That's correct. The discussed code locates all the connected peptide
fragments in the structure and returns their sequences. Connectivity is
evaluated by looking for proper peptide bonds between consecutive
residues.

If you want to get the sequence of the whole protein you can take a look
at the SEQRES record or use some kind of database info.

Cheers,

-Thomas