From wong at ebgm.jussieu.fr  Mon Oct  3 05:21:19 2005
From: wong at ebgm.jussieu.fr (WONG Hua)
Date: Mon Oct  3 05:34:08 2005
Subject: [BioPython] Having some issue with unicode and Bio.PDB
Message-ID: <20051003092119.GA21209@bach.ebgm.jussieu.fr>

I am not very familiar with Unicode (I had never heard of it until I ran into
this problem).
Lately I have been working with Bio.PDB and Blender (see blender.org) with the
idea of building some cool tools.

There is no error if I use the standalone Python interpreter. But when I use
the embedded interpreter inside Blender, I get the following error message
when importing Bio.PDB:
##############################################################
  File "/users2/invites/wong/apps/lib/python2.3/site-packages/Bio/PDB/__init__.py", line 22, in ?
    from Polypeptide import PPBuilder, CaPPBuilder, is_aa, standard_aa_names
  File "/users2/invites/wong/apps/lib/python2.3/site-packages/Bio/PDB/Polypeptide.py", line 5, in ?
    from Bio.Seq import Seq
  File "/users2/invites/wong/apps/lib/python2.3/site-packages/Bio/Seq.py", line1, in ?
    import string, array
ImportError: /users2/invites/wong/apps/lib/python2.4/lib-dynload/array.so: undefined symbol: PyUnicodeUCS2_FromUnicode
##############################################################

I found that Blender's interpreter is a UCS4 build while my local Python is a
UCS2 build. Both are Python 2.4.

I can try and recompile Blender to enable UCS2 but I would prefer to avoid this
if I can (I am not an expert in compiling things) because I don't know if it
would work afterwards...
Is there something I can modify in Bio.PDB's .py files so it can be imported
when using UCS4?
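
A minimal sketch of how the UCS2/UCS4 difference can be checked from inside
each interpreter, assuming only the standard sys module. The helper name
unicode_build is made up for this illustration. On the Python 2 series a
narrow (UCS2) build reports sys.maxunicode as 0xFFFF and a wide (UCS4) build
reports 0x10FFFF; running the same lines in the standalone interpreter and in
Blender's embedded one should show whether the two really differ.

```python
import sys


def unicode_build(maxunicode=None):
    """Return 'UCS2' or 'UCS4' for a given sys.maxunicode value.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # Narrow (UCS2) builds report 0xFFFF; wide (UCS4) builds report 0x10FFFF.
    if maxunicode == 0xFFFF:
        return "UCS2"
    return "UCS4"


print(unicode_build())
```

Extension modules such as array.so are compiled against one of the two
layouts, which is why a module built for one interpreter can fail to load in
the other.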

Thanks,

Hua
From mdehoon at c2b2.columbia.edu  Mon Oct  3 11:58:20 2005
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Mon Oct  3 12:08:44 2005
Subject: [BioPython] Having some issue with unicode and Bio.PDB
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD15@cgcmail.cgc.cpmc.columbia.edu>

>   File "/users2/invites/wong/apps/lib/python2.3/site-packages/Bio/PDB/__init__.py", line 22, in ?
>     from Polypeptide import PPBuilder, CaPPBuilder, is_aa, standard_aa_names
>   File "/users2/invites/wong/apps/lib/python2.3/site-packages/Bio/PDB/Polypeptide.py", line 5, in ?
>     from Bio.Seq import Seq
>   File "/users2/invites/wong/apps/lib/python2.3/site-packages/Bio/Seq.py", line1, in ?
>     import string, array
> ImportError: /users2/invites/wong/apps/lib/python2.4/lib-dynload/array.so: undefined symbol: PyUnicodeUCS2_FromUnicode


You are mixing python2.3 and python2.4 (see the paths in the traceback). You
should reinstall Biopython with python2.4. From the traceback, it seems that
you have Biopython installed for python2.3 (or, for some reason, python imports
python2.3's Biopython instead of python2.4's Biopython).
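
A rough sketch of how such a mix-up might be spotted, assuming the
installation directories carry a pythonX.Y component as they do in the
traceback. The function name foreign_version_paths is hypothetical; it simply
scans a list of paths (sys.path by default usage) for directories that belong
to a different Python version than the one running.

```python
import re
import sys


def foreign_version_paths(paths, version=None):
    """Return the entries of `paths` that name a different pythonX.Y directory.

    Hypothetical helper; `version` defaults to the running interpreter.
    """
    if version is None:
        version = sys.version_info[:2]
    own = "python%d.%d" % tuple(version)
    found = []
    for path in paths:
        # Collect every pythonX.Y component that appears in the path.
        names = re.findall(r"python\d+\.\d+", path)
        if names and own not in names:
            found.append(path)
    return found


# An empty list means no search-path entry points at another Python version.
print(foreign_version_paths(sys.path))
```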

--Michiel.

From wong at ebgm.jussieu.fr  Tue Oct  4 04:35:44 2005
From: wong at ebgm.jussieu.fr (WONG Hua)
Date: Tue Oct  4 04:36:03 2005
Subject: [BioPython] Having some issue with unicode and Bio.PDB
In-Reply-To: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD15@cgcmail.cgc.cpmc.columbia.edu>
References: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD15@cgcmail.cgc.cpmc.columbia.edu>
Message-ID: <20051004083544.GA29562@bach.ebgm.jussieu.fr>

After your mail, I cleaned up my Python installations and rebuilt every module using Python 2.4.

Now, instead of the previous error, it gives me:
###############################################################
Traceback (most recent call last):
  File "exportPDBv1_0.py", line 6, in ?
  File "/users2/invites/wong/apps/lib/python2.4/site-packages/Bio/PDB/__init__.py", line 22, in ?
    from Polypeptide import PPBuilder, CaPPBuilder, is_aa, standard_aa_names
  File "/users2/invites/wong/apps/lib/python2.4/site-packages/Bio/PDB/Polypeptide.py", line 5, in ?
    from Bio.Seq import Seq
  File "/users2/invites/wong/apps/lib/python2.4/site-packages/Bio/Seq.py", line1, in ?
    import string, array
ImportError: /users2/invites/wong/apps/lib/python2.4/lib-dynload/array.so: undefined symbol: _PyArg_NoKeywords
###############################################################

This happens when I import Bio.PDB, but not when I import Bio alone.


From mdehoon at c2b2.columbia.edu  Tue Oct  4 11:31:13 2005
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue Oct  4 11:36:30 2005
Subject: [BioPython] Having some issue with unicode and Bio.PDB
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD1A@cgcmail.cgc.cpmc.columbia.edu>

To find out if this is a problem with Biopython, or a problem with Python
itself, can you try:
>>> import array
without Biopython? My guess is that you will find the same error:

ImportError: /users2/invites/wong/apps/lib/python2.4/lib-dynload/array.so:
undefined symbol: _PyArg_NoKeywords
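
The same test can be written as a small script, which is handy when it has to
be run inside an embedded interpreter. This is only a sketch: try_import is a
made-up helper, and it assumes a Python recent enough to ship the importlib
module.

```python
import importlib


def try_import(name):
    """Try to import a module; return (True, location) or (False, error text)."""
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        # A failure here points at the Python installation, not at Biopython.
        return False, str(exc)
    return True, getattr(module, "__file__", "(built into the interpreter)")


print(try_import("array"))
```

If the import succeeds, the reported location also shows which installation
the module was loaded from.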

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032


From borreguero at gmail.com  Tue Oct  4 14:33:14 2005
From: borreguero at gmail.com (Jose Borreguero)
Date: Tue Oct  4 15:34:45 2005
Subject: [BioPython] how to add new methods to class Seq ?
Message-ID: <7cced4ed0510041133i686eb23u@mail.gmail.com>

Hi all,
I'm new to Biopython (and Python!), and wondering about the most efficient way
to add methods of my own to class Seq for my in-house computing. Should I (1)
open the code and start adding methods; (2) derive a class from Seq and write
my methods there; (3) what else?
jose

From frederic.sohm at iaf.cnrs-gif.fr  Wed Oct  5 02:40:03 2005
From: frederic.sohm at iaf.cnrs-gif.fr (Frederic Sohm)
Date: Wed Oct  5 03:40:12 2005
Subject: [BioPython] how to add new methods to class Seq ?
In-Reply-To: <7cced4ed0510041133i686eb23u@mail.gmail.com>
References: <7cced4ed0510041133i686eb23u@mail.gmail.com>
Message-ID: <200510050840.03562.frederic.sohm@iaf.cnrs-gif.fr>

Hi

Best practice would be to write a new class that inherits from Seq.
Here is an example:

>>> from Bio.Seq import Seq
>>> class myseq(Seq):
...     def tolist(self):
...         return list(self.tostring())
...
>>> s = myseq('ACGT')
>>> s.tolist()
['A', 'C', 'G', 'T']
>>> dir(Seq)
['_Seq__maketrans', '__add__', '__doc__', '__getitem__', '__getslice__', 
'__init__', '__len__', '__module__', '__radd__', '__repr__', '__str__', 
'complement', 'count', 'reverse_complement', 'tomutable', 'tostring']
>>> dir(myseq)
['_Seq__maketrans', '__add__', '__doc__', '__getitem__', '__getslice__', 
'__init__', '__len__', '__module__', '__radd__', '__repr__', '__str__', 
'complement', 'count', 'reverse_complement', 'tolist', 'tomutable', 
'tostring']
>>> 
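
One thing to watch for with subclassing, sketched here with a much simplified
stand-in for Seq rather than the real Bio.Seq.Seq (the stand-in class and its
slicing behaviour are assumptions made for this illustration; whether the real
class behaves this way depends on the Biopython version). If an inherited
method builds its result by calling the base class directly, the result no
longer carries the methods added in the subclass. A plain function that
accepts any Seq-like object does not have this problem.

```python
class Seq(object):
    """Simplified stand-in for Bio.Seq.Seq (hypothetical, illustration only)."""

    def __init__(self, data):
        self.data = data

    def tostring(self):
        return self.data

    def __getitem__(self, index):
        # The base class is hard-coded here, so slices are plain Seq objects.
        if isinstance(index, slice):
            return Seq(self.data[index])
        return self.data[index]


class MySeq(Seq):
    """Subclass adding a method, as in the example above."""

    def tolist(self):
        return list(self.tostring())


def tolist(seq):
    """Free function that works on Seq and on any subclass of it."""
    return list(seq.tostring())


s = MySeq("ACGT")
part = s[1:3]
print(s.tolist())               # the subclass method works on s itself
print(type(part).__name__)      # the slice is a base-class object
print(hasattr(part, "tolist"))  # so the added method is gone
print(tolist(part))             # the free function still works
```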



-- 
Frédéric Sohm
Equipe INRA U1126 "Morphogenèse du système nerveux des Chordés"
UPR 2197 DEPSN, CNRS
Institut de Neurosciences A. Fessard
1 Avenue de la Terrasse
91 198 GIF-SUR-YVETTE
FRANCE
Phone: +33 (0) 1 69 82 34 12
Fax:+33 (0) 1 69 82 34 47

From wong at ebgm.jussieu.fr  Wed Oct  5 03:59:38 2005
From: wong at ebgm.jussieu.fr (WONG Hua)
Date: Wed Oct  5 03:59:10 2005
Subject: [BioPython] Having some issue with unicode and Bio.PDB
In-Reply-To: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD1A@cgcmail.cgc.cpmc.columbia.edu>
References: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD1A@cgcmail.cgc.cpmc.columbia.edu>
Message-ID: <20051005075938.GA4714@bach.ebgm.jussieu.fr>

Thanks for the suggestion. At least, I see where the problem really is now.



From 2huggie at gmail.com  Wed Oct  5 05:01:36 2005
From: 2huggie at gmail.com (Timothy Wu)
Date: Wed Oct  5 16:31:28 2005
Subject: [BioPython] TMHMM
Message-ID: <ebf8d36c0510050201t1d2257d9sabd31d661c9c88c7@mail.gmail.com>

Hi,

Two things:

1. Is there a TMHMM module (hopefully a documented one) in Biopython?

2. I posted a similar question to the Python mailing list a long time
ago, but unfortunately I did not get a satisfactory answer. I hope I can find
some help here.

From viewing the TMHMM HTML source (http://www.cbs.dtu.dk/services/TMHMM/) I
think the form fields are:

seqfile --> file, which I don't know how to use, so I will not use it.
SEQ --> text box, which should be aa sequences
outform --> radio buttons, valid values are '-noshort', '-noplot', '-short'.
I would like to have it as '-short'
version --> a check box, valid value is '-v1'

I tested using urllib with something like this:

-----------------------------------------
import urllib

# Form fields are passed to urlencode() as a dictionary here.
params = urllib.urlencode({
    'configfile': '/usr/opt/www/pub/CBS/services/TMHMM-2.0/TMHMM2.cf',
    'SEQ': 'VVDGLHQAETISSQGFKELFEGYGNFNNTRNGVEVENLKQAVIQKGADAIRTGSGSLGGTV',
    'version': '-short'
})

# Supplying params makes urlopen() send a POST request.
f = urllib.urlopen("http://www.cbs.dtu.dk/cgi-bin/nph-webface", params)
sec = f.read()
-----------------------------------------

Of course, a successful TMHMM query requires me to read the page returned by
urlopen and parse another URL from it to obtain the final result page. The
code above only fetches the intermediate page.

My problem is the strange code above. For the 'version' field, I should
either fill in '-v1' or leave it empty. For 'outform' (which I didn't
include), I should have '-short'. But that doesn't work! When I do that, it
returns "Read: Field not declared; 'outform'".

I can get it to work if I only fill in the "configfile" and "SEQ" fields, but
then I get the equivalent of the '-noshort' value in 'outform' (extensive
output with graphics). The only way I can get the '-short' effect (one line
per protein) is to fill in '-short' in the 'version' field. This is very
bizarre.

Is there something I am missing? SignalP, which is served by the same
script, shows the same bizarre behavior. I have also tried a Perl script to do
the same thing. It also behaves strangely, but not in the same way as the
Python one:

-----------------------------------------
use LWP::UserAgent;
use HTTP::Request::Common qw(POST);

my $agent = LWP::UserAgent->new;
# $GS_URL is expected to hold the service URL; it is not defined in this excerpt.
my $request = POST($GS_URL,
    Content_Type => 'form-data',
    Content => [
        configfile => '/usr/opt/www/pub/CBS/services/TMHMM-2.0/TMHMM2.cf',
        SEQ => 'VVDGLHQAETISSQGFKELFEGYGNFNNTRNGVEVENLKQAVIQKGADAIRTGSGSLGGTV',
        outform => '-short',
    ]
);
-----------------------------------------

The script works. But if I swap the configfile and SEQ lines, it doesn't. I am
not really familiar with Perl, so I have no idea why this is the way it is.
Can anyone tell me why I had to fill in '-short' in the supposedly wrong
field 'version'? Thanks a bunch.

Timothy

From 2huggie at gmail.com  Sat Oct  8 08:21:28 2005
From: 2huggie at gmail.com (Timothy Wu)
Date: Sat Oct  8 08:28:42 2005
Subject: [BioPython] Re: TMHMM
In-Reply-To: <ebf8d36c0510050201t1d2257d9sabd31d661c9c88c7@mail.gmail.com>
References: <ebf8d36c0510050201t1d2257d9sabd31d661c9c88c7@mail.gmail.com>
Message-ID: <ebf8d36c0510080521k1f73b148x6754d995a44226c7@mail.gmail.com>


I mailed the Center for Biological Sequence Analysis (which hosts the TMHMM
web service), and the response I got was to put the configfile field first
among the key-value pairs. So instead of a dictionary I passed a list of
tuples to urlencode(). This solved all the problems. The version=-short
variant works because the value is placed on a command line as a command-line
switch, so it works regardless.
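
A minimal sketch of the fix described above, with no network access. The field
names and values are the ones quoted in this thread; sending 'outform' with
'-short' as the third pair is an assumption about what was finally submitted.
A list of tuples keeps its order when encoded, so configfile is guaranteed to
come first.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

# A list of (name, value) pairs is encoded in exactly this order.
fields = [
    ("configfile", "/usr/opt/www/pub/CBS/services/TMHMM-2.0/TMHMM2.cf"),
    ("SEQ", "VVDGLHQAETISSQGFKELFEGYGNFNNTRNGVEVENLKQAVIQKGADAIRTGSGSLGGTV"),
    ("outform", "-short"),  # assumed: the field the form itself uses
]
params = urlencode(fields)

# Show the field names in the order they will be sent.
print([pair.split("=")[0] for pair in params.split("&")])
```

The resulting string can be passed as the data argument of urlopen() in the
same way as in the original code.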

Timothy

From omid9dr18 at hotmail.com  Wed Oct 12 16:47:27 2005
From: omid9dr18 at hotmail.com (Omid Khalouei)
Date: Wed Oct 12 17:03:43 2005
Subject: [BioPython] Structure Alignment
Message-ID: <BAY103-F37BF7697CF71982E748FEFE67B0@phx.gbl>

Hello,

Does Biopython have any functions for structural alignment? I need to align
some protein structures and measure the RMSD. I know how to do this with
programs such as Swiss-PDBViewer, but I need to do it on a dozen structures,
which is why I need to automate it.

Thanks for your help.

S. Khalouei
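
The RMSD half of the question can be sketched without any library. The
function below is an illustration only: it assumes the two structures have
already been superimposed and that their atoms are paired one to one, and it
does not perform the optimal superposition itself (in Bio.PDB, where it is
available, the Superimposer class covers that step). The name rmsd and the
coordinate format (lists of x, y, z tuples) are choices made for this sketch.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) tuples.

    Assumes the coordinates are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


# Two atoms, each displaced by 1 along z.
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))
```

Looping this over a dozen structure pairs is then a matter of extracting the
matching atoms (for example the C-alpha atoms) from each pair.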


From mmokrejs at ribosome.natur.cuni.cz  Fri Oct 14 17:12:27 2005
From: mmokrejs at ribosome.natur.cuni.cz (Martin MOKREJŠ)
Date: Fri Oct 14 17:18:58 2005
Subject: [BioPython] genbank parser returns start position of the location
 decreased by one
Message-ID: <43501F3B.2000902@ribosome.natur.cuni.cz>

Hi,
  I am either too tired or have missed some point. I use Biopython 1.40b to
fetch data from GenBank. The location (467..2863), as seen on the GenBank web
pages, differs from the string returned by Biopython: I get location
(466..2863) instead. The second number is never decreased, only the first
one. What's wrong? ;)
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=56117851
It happens with CDS feature data, but also with source, just anything:

FEATURES             Location/Qualifiers
     source          1..4115


$ python
Python 2.4.2 (#1, Oct  2 2005, 05:43:55) 
[GCC 3.4.4 (Gentoo 3.4.4-r1, ssp-3.4.4-1.0, pie-8.7.8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import GenBank
>>> record_parser = GenBank.FeatureParser()
>>> ncbi_dict = GenBank.NCBIDictionary('nucleotide', 'genbank', parser = record_parser)
>>> gb_seqrecord = ncbi_dict['56117851']
>>> print gb_seqrecord.features[0].location
(0..4115)
>>> 
From mdehoon at c2b2.columbia.edu  Fri Oct 14 18:05:10 2005
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Fri Oct 14 18:12:40 2005
Subject: [BioPython] genbank parser returns start position of the location
	decreased by one
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD4B@cgcmail.cgc.cpmc.columbia.edu>

I would think that this is intentional. Python uses zero-based arrays,
Genbank starts counting at 1.
In other words, gb_seqrecord.seq[0:4115] will return the sequence that you're
interested in.
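
The conversion can be written out as a small sketch, assuming nothing beyond
plain Python. The helper name genbank_to_slice is made up for the
illustration. A GenBank range start..end is 1-based and includes both ends; a
Python slice is 0-based and excludes its end, so only the start shifts by one
and the length stays the same.

```python
def genbank_to_slice(start, end):
    """Convert a 1-based, inclusive GenBank range to a Python slice."""
    return slice(start - 1, end)


sequence = "ACGTACGT"
region = genbank_to_slice(2, 4)
print(sequence[region])  # bases 2..4 in GenBank numbering: CGT

cds = genbank_to_slice(467, 2863)
print("%d..%d" % (cds.start, cds.stop))  # 466..2863
```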

--Michiel.

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032



From mmokrejs at ribosome.natur.cuni.cz  Mon Oct 17 08:24:23 2005
From: mmokrejs at ribosome.natur.cuni.cz (Martin MOKREJŠ)
Date: Mon Oct 17 08:23:31 2005
Subject: [BioPython] genbank parser returns start position of the location
	decreased by one
In-Reply-To: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD4B@cgcmail.cgc.cpmc.columbia.edu>
References: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD4B@cgcmail.cgc.cpmc.columbia.edu>
Message-ID: <435397F7.5030104@ribosome.natur.cuni.cz>

Hi Michiel,
  I thought this might be the "feature" of this, but imagine you just parse
GenBank data and display it on the web. I don't believe anybody would expect
the data fetched through Biopython to differ from what is visible on the NCBI
website. Is this "feature" at least consistent with BioPerl's behavior?
M.

From biopyte at yahoo.de  Sat Oct 22 06:43:12 2005
From: biopyte at yahoo.de (Hans Meier)
Date: Sat Oct 22 06:49:06 2005
Subject: [BioPython] restrichtion map like "remap" 
Message-ID: <20051022104312.70952.qmail@web26310.mail.ukl.yahoo.com>

Dear friends,

There's a feature in Biopython that I can't find,
or it is missing.

It's a function like "remap" from EMBOSS
or "map" from GCG. You input a sequence
in raw or FASTA format and you get a map
of your sequence including (optionally)
restriction sites, translation in all six frames ...
You can also format the output by specifying
how many bases per line, the margin width, etc.

Is there nothing similar within Biopython?
Any other suggestions (except using EMBOSS :)


Thanks a lot, Harald


From frederic.sohm at iaf.cnrs-gif.fr  Mon Oct 24 03:15:54 2005
From: frederic.sohm at iaf.cnrs-gif.fr (Frederic Sohm)
Date: Mon Oct 24 03:37:02 2005
Subject: [BioPython] restrichtion map like "remap"
In-Reply-To: <20051022104312.70952.qmail@web26310.mail.ukl.yahoo.com>
References: <20051022104312.70952.qmail@web26310.mail.ukl.yahoo.com>
Message-ID: <200510240915.55248.frederic.sohm@iaf.cnrs-gif.fr>

Hi,

I don't know about EMBOSS support, but for restriction maps you can use the
Restriction module:

Python 2.4.2 (#1, Sep 28 2005, 17:53:13) 
[GCC 3.4.3 (Mandrakelinux 10.2 3.4.3-7mdk)] on linux2
Type "copyright", "credits" or "license()" for more information.

>>> from Bio.Restriction import Analysis, AllEnzymes, CommOnly
>>> from Bio.Seq import Seq
>>> pbr = Seq('TTCT --- cut pBR322 sequence ---')
>>> restmap = Analysis(AllEnzymes, pbr, False)
>>> # False means the sequence is circular.
>>> # Omitting the argument, or passing True, means the sequence is linear.
>>> restmap.print_that(None, 'restriction map of pBR322 \n\n')

restriction map of pBR322

AccII      :  348, 704, 819, 948, 975, 980, 1041, 1107, 1236, 1246, 1391, 1417,
               1539, 1636, 2006, 2075, 2180, 2521, 3102, 3432, 3925, 4257.
--- cut all the enzymes and their sites ---

   Enzymes which do not cut the sequence.

AatI      Acc65I    AcvI      AflII     AgeI      AhlI      ApaI      AsiAI     
--- cut the enzymes absent from the sequence ---

>>> # To use only commercially available enzymes, do this:
>>> restmap = Analysis(CommOnly, pbr, False)
--- results cut ---


You can format the results as a map by doing:

>>> restmap.print_as('map')
>>> restmap.print_that()
...


Hope that helps. Unfortunately, this module has no support for translation
frames. For more details on how to use it, see the cookbook-style manual
provided in the docs:
http://www.biopython.org/docs/cookbook/Restriction.html
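
Since the Restriction module leaves the six frames out, here is a rough
pure-Python sketch of that part. It is not taken from Biopython; the function
names and the frame numbering (+1 to +3 forward, -1 to -3 on the reverse
complement) are choices made for this illustration. It only splits the
sequence into codons for each frame and leaves the translation step to
whatever codon table you prefer.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Reverse complement of a DNA string made of A, C, G, T or N."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def codons(seq, offset):
    """Split seq into complete codons, starting at the given offset."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def six_frames(seq):
    """Map frame number (+1..+3, -1..-3) to the list of codons in that frame."""
    forward = seq.upper()
    backward = reverse_complement(forward)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = codons(forward, offset)
        frames[-(offset + 1)] = codons(backward, offset)
    return frames


frames = six_frames("ATGAAAT")
for number in (1, 2, 3, -1, -2, -3):
    print("%+d %s" % (number, " ".join(frames[number])))
```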

best regards

Fred


-- 
Frédéric Sohm
Equipe INRA U1126 "Morphogenèse du système nerveux des Chordés"
UPR 2197 DEPSN, CNRS
Institut de Neurosciences A. Fessard
1 Avenue de la Terrasse
91 198 GIF-SUR-YVETTE
FRANCE
Phone: +33 (0) 1 69 82 34 12
Fax:+33 (0) 1 69 82 34 47

From boris.steipe at utoronto.ca  Thu Oct 27 14:32:50 2005
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Thu Oct 27 14:37:03 2005
Subject: [BioPython] Fwd: Please take the Gene Ontology survey
References: <Pine.LNX.4.44.0510271819530.18029-100000@pigeon.ebi.ac.uk>
Message-ID: <AC0C64F3-986B-4265-95DF-22212FF2BACC@utoronto.ca>

My apologies in case this reaches anyone more than once.

Context: the GO grant is up for competitive renewal in the new year,  
and the volume of responses to this survey will help GO demonstrate  
the degree to
which it has been adopted by the community.

And, as you know, government support for computational biology  
infrastructure has been insufficient in the recent past.


Boris


Begin forwarded message:

> From: Jane Lomax <jane@ebi.ac.uk>
> Date: 27 October 2005 13:20:14 GMT-04:00
> To: boris.steipe@utoronto.ca
> Subject: Please take the Gene Ontology survey
>
>
> Hello,
>
> The Gene Ontology (GO) is a system for functional annotation of  
> genes and
> gene products. It enables classification of gene products according to
> molecular function, biological process, and cellular location of
> action.
>
> Please help us by taking part in our survey.
>
> The results of this survey will help us improve our services
> to our user community, and help direct our resources more effectively.
>
> It's a very straightforward set of questions, which should take a  
> maximum
> of 10 minutes to complete. There's no requirement to submit your name
> or email address. To complete the survey, go to:
>
> http://www.AdvancedSurvey.com/default.asp?SurveyID=32355
>
> Please pass on to any friends or colleagues not on these lists.
>
> Many thanks for your time,
>
> The GO Consortium

From gabraham at cs.rmit.edu.au  Fri Oct 28 10:52:50 2005
From: gabraham at cs.rmit.edu.au (Gad Abraham)
Date: Fri Oct 28 11:24:32 2005
Subject: [BioPython] Extracting residue list from PDB
Message-ID: <20051028145250.GA22523@cs.rmit.edu.au>

Hi,

I'm trying to extract a FASTA-like list of residues from a PDB file. It
doesn't seem to work correctly for some (e.g. 1n62, which comes out as
10 chains while it only has 6, and chain lengths are wrong too).

I'm using the following script based on the Structural Biopython FAQ:

#!/usr/bin/python

from Bio.PDB import *
import sys

parser = PDBParser()
structure = parser.get_structure(sys.argv[1], sys.argv[1])

ppb = PPBuilder()
for pp in ppb.build_peptides(structure):
   print len(pp),pp.get_sequence().tostring()
   print 


Any tips would be appreciated.

Thanks,
Gad
-- 
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
gabraham@cs.rmit.edu.au
http://yallara.cs.rmit.edu.au/~gabraham
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
From idoerg at burnham.org  Fri Oct 28 12:07:53 2005
From: idoerg at burnham.org (Iddo Friedberg)
Date: Fri Oct 28 12:16:06 2005
Subject: [BioPython] Extracting residue list from PDB
In-Reply-To: <20051028145250.GA22523@cs.rmit.edu.au>
References: <20051028145250.GA22523@cs.rmit.edu.au>
Message-ID: <43624CD9.5070009@burnham.org>

Hi Gad,

The reason the chains seem to be of the wrong length is that they are 
generated from the ATOM records rather than the SEQRES records. Those 
disagree often enough in PDB files.
This problem does not exist in mmCIF, I believe.
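
One way to see the disagreement for a given file, without Biopython, is to
count per chain the residues declared in SEQRES against the residues that
actually have ATOM coordinates. This is only a sketch: it assumes the standard
fixed-column PDB layout, ignores HETATM records, and the function name
residue_counts is made up for illustration.

```python
def residue_counts(pdb_lines):
    """Return (declared, observed): two dicts keyed by chain ID.

    declared: number of residues stated in the SEQRES records
    observed: number of distinct residues that have ATOM coordinates
    """
    declared = {}
    observed = {}
    for line in pdb_lines:
        if line.startswith("SEQRES"):
            # Chain ID is column 12; numRes (columns 14-17) is repeated
            # on every SEQRES line of the same chain.
            declared[line[11]] = int(line[13:17])
        elif line.startswith("ATOM"):
            # Chain ID is column 22; residue number plus insertion code
            # (columns 23-27) identifies a residue within its chain.
            observed.setdefault(line[21], set()).add(line[22:27])
    return declared, {chain: len(ids) for chain, ids in observed.items()}
```

Passing an open PDB file (or any list of its lines) gives the two dicts; a
chain whose observed count is below its declared count has residues that were
not modelled.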

Using your code, I got only six chains, so I cannot comment on the 
second problem.

Best,

./I

Gad Abraham wrote:

>Hi,
>
>I'm trying to extract a FASTA-like list of residues from a PDB file. It
>doesn't seem to work correctly for some (e.g. 1n62, which comes out as
>10 chains while it only has 6, and chain lengths are wrong too).
>
>I'm using the following script based on the Structural Biopython FAQ:
>
>#!/usr/bin/python
>
>from Bio.PDB import *
>import sys
>
>parser = PDBParser()
>structure = parser.get_structure(sys.argv[1], sys.argv[1])
>
>ppb = PPBuilder()
>for pp in ppb.build_peptides(structure):
>   print len(pp),pp.get_sequence().tostring()
>   print 
>
>
>Any tips would be appreciated.
>
>Thanks,
>Gad
>  
>


-- 
Iddo Friedberg, Ph.D.
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9949
http://ffas.ljcrf.edu/~iddo

From mdehoon at c2b2.columbia.edu  Fri Oct 28 20:14:54 2005
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Fri Oct 28 20:20:37 2005
Subject: [BioPython] Biopython release 1.41
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD8B@cgcmail.cgc.cpmc.columbia.edu>

Dear biopythoneers,

We are pleased to announce the release of Biopython 1.41. Many improvements
were made in Biopython during the eight months since the previous release,
and the new release contains lots of bugfixes, improvements, new
functionalities, and better documentation. To pick a few, there's the new
Bio.MEME module by Jason Hackney, updates to the Blast parser using Bertrand
Frottier's NCBIXML code, a BLAT parser by Yair Benita, numerous updates in
Bio.PDB, CompareACE support in AlignAce, and improved user-friendliness in
Bio.Seq.

Lots of people contributed to this release, in particular Frank Kauff
(Bio.Nexus), Jason Hackney (Bio.MEME), Thomas Hamelryck (Bio.PDB), Frédéric
Sohm (Bio.Restriction), James Casbon (Bio.SCOP) for bug fixes and updates,
Peter (Bio.Blast.NCBIXML test cases), and of course Jeff Chang, Brad Chapman,
Andrew Dalke, and Iddo Friedberg for Biopython and the fool-proof instructions
on how to roll a release, which made this a lot easier than I anticipated. My
apologies if I forgot to thank somebody.


--Michiel


Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032


From gabraham at cs.rmit.edu.au  Fri Oct 28 22:14:27 2005
From: gabraham at cs.rmit.edu.au (Gad Abraham)
Date: Fri Oct 28 22:16:11 2005
Subject: [BioPython] Extracting residue list from PDB
In-Reply-To: <43624CD9.5070009@burnham.org>
References: <20051028145250.GA22523@cs.rmit.edu.au>
	<43624CD9.5070009@burnham.org>
Message-ID: <20051029021427.GA17453@cs.rmit.edu.au>

On Fri, Oct 28, 2005 at 09:07:53AM -0700, Iddo Friedberg wrote:
> Hi Gad,
> 
> The reason the chains seem to be of the wrong length is that they are 
> generated from the ATOM records rather than the SEQRES records. Those 
> disagree often enough in PDB files.
> This problem does not exist in mmCIF, I believe.
> 
> Using your code, I got only six chains, so I cannot comment on the 
> second problem.
> 

I see. I'm consistently getting 10 chains (lengths 161, 804, 97, 176,
11, 71, 87, 662, 133, 286) for 1N62.

It seems that parsing the SEQRES is the way to go.
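
A minimal sketch of what that parsing could look like, assuming the standard
fixed-column SEQRES layout (chain ID in column 12, residue names from column
20 onwards). The function name seqres_sequences and the "X" placeholder for
anything outside the 20 standard amino acids are choices made for
illustration, not part of Bio.PDB.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_sequences(pdb_lines):
    """Return {chain ID: one-letter sequence} built from SEQRES records."""
    names_by_chain = {}
    for line in pdb_lines:
        if line.startswith("SEQRES"):
            # Residue names occupy columns 20-70, up to 13 per line;
            # continuation lines for the same chain are appended in order.
            names = line[19:70].split()
            names_by_chain.setdefault(line[11], []).extend(names)
    return {
        chain: "".join(THREE_TO_ONE.get(name, "X") for name in names)
        for chain, names in names_by_chain.items()
    }
```

Each chain then comes out as one FASTA-like string, regardless of gaps in the
ATOM records.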

Thanks,
Gad

-- 
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
gabraham@cs.rmit.edu.au
http://yallara.cs.rmit.edu.au/~gabraham
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
From boris.steipe at utoronto.ca  Sat Oct 29 00:45:38 2005
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Sat Oct 29 03:32:47 2005
Subject: [BioPython] Extracting residue list from PDB
In-Reply-To: <20051029021427.GA17453@cs.rmit.edu.au>
References: <20051028145250.GA22523@cs.rmit.edu.au>
	<43624CD9.5070009@burnham.org>
	<20051029021427.GA17453@cs.rmit.edu.au>
Message-ID: <C0BF3903-1F25-4690-9421-119862019ECC@utoronto.ca>

SEQRES and ATOM records have different semantics: the SEQRES is what
the crystallographer puts into the experiment, the ATOM records are
what she sees. Presumably you have covalent chain-breaks between
residues, or parts of the polypeptide chain were not traceable in
electron density and were omitted.

So even though the numbers are inconsistent, they are both right.

A related issue may be what the natural protein sequence is, as  
opposed to the perhaps truncated molecule in the crystal (presumably  
you are not interested in the propensity of fragments to crystallize,  
but in some biological property), or what the translated sequence is,  
that may have been posttranslationally processed, or even what the  
gene sequence is, that may have been translated with e.g.  
selenocysteine, etc. etc.

So, (as usual) "the way to go" is determined by where you want to go to.

HTH

Boris


On 28 Oct 2005, at 22:14, Gad Abraham wrote:


> On Fri, Oct 28, 2005 at 09:07:53AM -0700, Iddo Friedberg wrote:
>
>> Hi Gad,
>>
>> The reason the chains seem to be of the wrong length is that they are
>> generated from the ATOM records rather than the SEQRES records. Those
>> disagree often enough in PDB files.
>> This problem does not exist in mmCIF, I believe.
>>
>> Using your code, I got only six chains, so I cannot comment on the
>> second problem.
>>
>
> I see. I'm consistently getting 10 chains (lengths 161, 804, 97, 176,
> 11, 71, 87, 662, 133, 286) for 1N62.
>
> It seems that parsing the SEQRES is the way to go.
>
> Thanks,
> Gad
>
> -- 
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> gabraham@cs.rmit.edu.au
> http://yallara.cs.rmit.edu.au/~gabraham
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


From gabraham at cs.rmit.edu.au  Mon Oct 31 04:05:30 2005
From: gabraham at cs.rmit.edu.au (Gad Abraham)
Date: Tue Nov  1 16:58:48 2005
Subject: [BioPython] Extracting residue list from PDB
In-Reply-To: <C0BF3903-1F25-4690-9421-119862019ECC@utoronto.ca>
References: <20051028145250.GA22523@cs.rmit.edu.au>
	<43624CD9.5070009@burnham.org>
	<20051029021427.GA17453@cs.rmit.edu.au>
	<C0BF3903-1F25-4690-9421-119862019ECC@utoronto.ca>
Message-ID: <20051031090530.GB666@cs.rmit.edu.au>

On Sat, Oct 29, 2005 at 12:45:38AM -0400, Boris Steipe wrote:
> SEQRES and ATOM records have different semantics: the SEQRES is what
> the crystallographer puts into the experiment, the ATOM records are
> what she sees. Presumably you have covalent chain-breaks between
> residues, or parts of the polypeptide chain were not traceable in
> electron density and were omitted.
> 
> So even though the numbers are inconsistent, they are both right.
> 
> A related issue may be what the natural protein sequence is, as  
> opposed to the perhaps truncated molecule in the crystal (presumably  
> you are not interested in the propensity of fragments to crystallize,  
> but in some biological property), or what the translated sequence is,  
> that may have been posttranslationally processed, or even what the  
> gene sequence is, that may have been translated with e.g.  
> selenocysteine, etc. etc.
> 
> So, (as usual) "the way to go" is determined by where you want to go to.

Like everyone else, I'm trying to predict structure from sequence. So
I'm interested in the true sequence of the chain, and what the
corresponding tertiary structure is. So it seems to me that the SEQRES
entries are the more correct sequence to use, because the ATOM entries
are a function of the SEQRES sequence (convoluted through experiments),
not the other way round.

Thanks,
Gad

-- 
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
gabraham@cs.rmit.edu.au
http://yallara.cs.rmit.edu.au/~gabraham
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
From gabraham at cs.rmit.edu.au  Mon Oct 31 04:00:52 2005
From: gabraham at cs.rmit.edu.au (Gad Abraham)
Date: Tue Nov  1 16:58:57 2005
Subject: [BioPython] Extracting residue list from PDB
In-Reply-To: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD92@cgcmail.cgc.cpmc.columbia.edu>
References: <6CA15ADD82E5724F88CB53D50E61C9AE9ECD92@cgcmail.cgc.cpmc.columbia.edu>
Message-ID: <20051031090052.GA666@cs.rmit.edu.au>

On Sun, Oct 30, 2005 at 03:44:17PM -0500, Michiel De Hoon wrote:
> > On Fri, Oct 28, 2005 at 09:07:53AM -0700, Iddo Friedberg wrote:
> > > Using your code, I got only six chains, so I cannot comment on the 
> > > second problem.
> > > 
> 
> > I see. I'm consistently getting 10 chains (lengths 161, 804, 97, 176,
> > 11, 71, 87, 662, 133, 286) for 1N62.
> 
> Maybe you are using an older version of Biopython?

I'm using biopython 1.30, which is the latest in Ubuntu Breezy Linux.

Gad

-- 
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
gabraham@cs.rmit.edu.au
http://yallara.cs.rmit.edu.au/~gabraham
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

From thamelry at binf.ku.dk  Sat Oct 29 17:26:32 2005
From: thamelry at binf.ku.dk (thamelry@binf.ku.dk)
Date: Tue Nov  1 17:53:10 2005
Subject: [BioPython] Extracting residue list from PDB
In-Reply-To: <C0BF3903-1F25-4690-9421-119862019ECC@utoronto.ca>
References: <20051028145250.GA22523@cs.rmit.edu.au>
	<43624CD9.5070009@burnham.org> <20051029021427.GA17453@cs.rmit.edu.au>
	<C0BF3903-1F25-4690-9421-119862019ECC@utoronto.ca>
Message-ID: <3019.193.110.248.8.1130621192.squirrel@www.binf.ku.dk>

> SEQRES and ATOM records have different semantics: the SEQRES is what
> the crystallographer puts into the experiment, the ATOM records are
> what she sees. Presumably you have covalent chain-breaks between
> residues, or parts of the polypeptide chain were not traceable in
> electron density and were omitted.
>
> So even though the numbers are inconsistent, they are both right.

That's correct. The discussed code locates all the connected peptide
fragments in the structure and returns their sequences. Connectivity
is evaluated by looking for proper peptide bonds between consecutive
residues.
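
To make the connectivity test concrete, here is a minimal sketch of the idea,
not the actual Bio.PDB code: walk along a chain and start a new fragment
whenever the C atom of one residue is too far from the N atom of the next
for the two to be peptide bonded. The function name, the input layout and the
1.8 Angstrom cutoff are assumptions for illustration, and math.dist needs
Python 3.8 or later.

```python
import math


def split_fragments(residues, cutoff=1.8):
    """Split one chain into connected peptide fragments.

    residues: list of (label, n_xyz, c_xyz) tuples in chain order, where
              n_xyz and c_xyz are the backbone N and C coordinates.
    cutoff:   largest C-N distance (Angstrom) still treated as a peptide
              bond; 1.8 is an assumed value, not taken from the source.
    """
    fragments = []
    prev_c = None
    for label, n_xyz, c_xyz in residues:
        if prev_c is not None and math.dist(prev_c, n_xyz) <= cutoff:
            # Bonded to the previous residue: extend the current fragment.
            fragments[-1].append(label)
        else:
            # First residue, or a chain break: open a new fragment.
            fragments.append([label])
        prev_c = c_xyz
    return fragments
```

A chain with an unmodelled loop therefore yields two fragments rather than
one, which would explain how six chains can come out as ten peptides.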

If you want to get the sequence of the whole protein you can
take a look at the SEQRES record or use some kind of database
info.

Cheers,

-Thomas