From dag at sonsorol.org  Fri Sep  2 10:00:09 2005
From: dag at sonsorol.org (Chris Dagdigian)
Date: Fri Sep  2 09:49:33 2005
Subject: [BioPython] Fwd: An interface to the maximum-likelihood programs of
	PHYLIP
References: <43184DEC.60600@student.cs.york.ac.uk>
Message-ID: <FD7FEA8D-1742-44E2-802A-9CCDF0B5BAEA@sonsorol.org>


Begin forwarded message:

> From: rjw500 <rjw500@cs.york.ac.uk>
> Date: September 2, 2005 9:04:44 AM EDT
> To: biopython-dev-owner@biopython.org
> Subject: An interface to the maximum-likelihood programs of PHYLIP
>
>
> Dear Biopython-dev-owner,
>
> I am sorry to trouble you, but I am not sure who to contact, since  
> my initial e-mail to biopython-dev@biopython.org bounced back.
>
> I am studying for an MSc in Information Processing at York  
> University in the UK. As part of the course I am carrying out a  
> short research project with Dr. James Cussens. I chose to develop  
> an interface in Python to the maximum-likelihood programs of the  
> phylogenetic analysis package PHYLIP. The code is based on the  
> modules available in Biopython and I was wondering if you would be  
> interested in incorporating it into the next release of Biopython.
>
> I have written the following classes and modules:
>
>
> - a class to represent a PHYLIP multiple sequence alignment based  
> on Bio.Align.Generic.Alignment
>
> - classes to represent the alphabets required for the sequence data  
> in PHYLIP input files based on Bio.Alphabet
>
> - a light class to allow the conversion of multiple sequence  
> alignment objects in other formats, such as Clustalw, that is  
> derived from Bio.Align.FormatConvert
>
> - a module to parse PHYLIP input files based on Bio.ParserSupport.  
> This module includes scanners to read the two PHYLIP input file  
> formats, a consumer and several parsers built upon these classes.
>
> - modules for the PHYLIP maximum-likelihood programs dnaml, dnamlk,  
> proml, promlk, as well as the PHYLIP  programs seqboot, consense  
> and treedist that are based upon Bio.Application
>
>
> These classes and modules allow the automation of phylogenetic  
> analysis, which particularly in the case of the maximum likelihood  
> methods can be a very time consuming process. A short script can be  
> written to analyse a multiple sequence alignment with one of the  
> maximum-likelihood programs, and then examine the tree produced by  
> bootstrapping to test if the relationships identified are supported  
> by the data.
>
> I realise Biopython currently provides support for the distance- 
> matrix programs of PHYLIP through the EMBOSS package wrappers.  
> However, wrappers for the PHYLIP maximum likelihood programs in the  
> EMBOSS package are either incomplete (dnaml and dnamlk), lacking  
> the facility to use multiple data sets which is critical for  
> bootstrapping, or completely absent (proml and promlk). Thus, I  
> decided to extend the existing support for PHYLIP by writing an  
> interface to the maximum likelihood programs of the standard PHYLIP  
> package. I wrote the modules for seqboot and consense, which are  
> available via the EMBOSS wrappers, so that people who had not  
> installed the EMBOSS package would also be able to carry out  
> bootstrap analysis.
>
> I look forward to hearing from you,
>
> Best wishes,
>
> Robert Wilson
>

From dalke at dalkescientific.com  Sun Sep  4 19:50:03 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sun Sep  4 19:52:27 2005
Subject: [BioPython] HMM module
Message-ID: <d91ce21db34ec8093e1a078731526655@dalkescientific.com>

Hi all,

   I'm teaching a course here at the NBN.  It's a successor to
the course I taught 1.5 years ago.  My lecture notes are at
   http://www.dalkescientific.com/writings/NBN/

The class is learning probabilistic modeling this week.  I
want to cover Markov models.  Should I use Bio.HMM or Bio.MarkovModel
and is there any documentation?  Or is some 3rd party package better?

					Andrew
					dalke@dalkescientific.com

From boehme at mpiib-berlin.mpg.de  Mon Sep  5 05:10:39 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Mon Sep  5 05:07:48 2005
Subject: [BioPython] Changes in NCBI BLAST output format
In-Reply-To: <6CA15ADD82E5724F88CB53D50E61C9AE7AC20A@cgcmail.cgc.cpmc.columbia.edu>
References: <6CA15ADD82E5724F88CB53D50E61C9AE7AC20A@cgcmail.cgc.cpmc.columbia.edu>
Message-ID: <431C0B8F.9010807@mpiib-berlin.mpg.de>

Hello Michiel,

I got your fix for NCBIWWW.py from the CVS, but now I get a different 
error message:   SyntaxError: Line does not contain 'Database':
.
.
.
File "C:\Python24\Lib\site-packages\Bio\Blast\NCBIWWW.py", line 47, in parse
   self._scanner.feed(handle, self._consumer)
File "C:\Python24\Lib\site-packages\Bio\Blast\NCBIWWW.py", line 100, in feed
   self._scan_header(uhandle, consumer)
File "C:\Python24\Lib\site-packages\Bio\Blast\NCBIWWW.py", line 172, in _scan_header
   self._scan_database_info(uhandle, consumer)
File "C:\Python24\Lib\site-packages\Bio\Blast\NCBIWWW.py", line 190, in _scan_database_info
   read_and_call(uhandle, consumer.database_info, contains='Database')
File "C:\Python24\Lib\site-packages\Bio\ParserSupport.py", line 301, in read_and_call
   raise SyntaxError, errmsg

This is the sequence I blasted: GGAAGATAATGAACACCAAGA
result_handle = NCBIWWW.qblast('blastn', 'nr', f_record, expect=100,
                               word_size=7, filter='L',
                               entrez_query='Homo sapiens [ORGN]',
                               descriptions=10, alignments=50)

(I added word_size to qblast, but that doesn't make any difference)

I tried to work out what the problem was, but I find it hard to see 
exactly what the BLAST parser is doing. I'm wondering why nobody else 
seems to get this error. Thanks for any help!

Martina


From boehme at mpiib-berlin.mpg.de  Tue Sep  6 11:26:27 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Tue Sep  6 11:17:53 2005
Subject: [BioPython] Changes in NCBI BLAST output format
In-Reply-To: <200509052228.j85MSZ7Y022211@itsa.ucsf.edu>
References: <200509052228.j85MSZ7Y022211@itsa.ucsf.edu>
Message-ID: <431DB523.9020701@mpiib-berlin.mpg.de>

Hello,

shouldn't the suggestion by Alexander in the _Scanner class:
"
     change:
         attempt_read_and_call(uhandle, consumer.noevent, start='<p>')
     to:
         attempt_read_and_call(uhandle, consumer.noevent)
"
be changed to something like:
        read_and_call_while(uhandle, consumer.noevent, blank=1)?

At least, that is what is working in my case.
This is because of the 2 different types of return files: one with the 
queries first, the other with the database info first. If it is database 
first, then there is an additional blank line which gave me problems.
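
For anyone who cannot patch NCBIWWW.py, here is a minimal sketch of the
workaround Matt describes in the quoted message: drop the extra empty line
before the text reaches the parser. The function name is hypothetical, the
"RID:" marker is taken from Matt's description, and the sketch assumes the
parser wants no blank line directly after that line.

```python
def drop_blank_after_rid(lines):
    """Yield lines, skipping blank lines that directly follow an 'RID:' line.

    Assumption: the only unwanted blank lines are the ones between the
    'RID:' line and the next non-blank line (the 'Database' line).
    """
    after_rid = False
    for line in lines:
        if after_rid and not line.strip():
            continue  # swallow the extra empty line(s)
        if "RID:" in line:
            after_rid = True
        elif line.strip():
            after_rid = False
        yield line
```

The cleaned lines could then be written to a file or wrapped in a
StringIO handle before being passed to the parser.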

Martina


meames@itsa.ucsf.edu wrote:
> Yes, I have also observed a similar problem - 
> 
> The new BLAST output has an extra empty line between the "RID:" line
> and the "<b>Database:" line, which chokes the parser. A temporary (albeit 
> bad programming) solution would be to eliminate the line in the saved
> BLAST output file before passing it on to the parser.
> 
> Matt 
> 
> On 5 Sep 2005 11:10:39 +0200 "Martina" wrote:
> 
> 
>>Hello Michiel,
>>
>>I got your fix for NCBIWWW.py from the CVS, but now I get a different 
>>error message:   SyntaxError: Line does not contain 'Database':
>>.
>>.
>>.
From djkojeti at unity.ncsu.edu  Tue Sep  6 11:16:41 2005
From: djkojeti at unity.ncsu.edu (Douglas Kojetin)
Date: Tue Sep  6 11:21:39 2005
Subject: [BioPython] BioPDB: get_resname() example?
Message-ID: <E4B69A7B-F316-48C9-B67F-A5171DE73BD6@unity.ncsu.edu>

Hi All-

I've read in a PDB structure (see below), and would like to get the  
residue name for specific sequence positions (e.g. residue number  
5).  Can someone suggest to me how to do this?  I cannot figure it  
out using the Structural Biopython FAQ.

# ---> start

from Bio.PDB import *

pdb='file.pdb'
p=PDBParser();
s=p.get_structure('THE_STRUCTURE', pdb)

# i figured i can get the residue list using this
res_list = Selection.unfold_entities(s, 'R')

# but i'm not sure what to do next, or if there is a better way to
# get the information
# what i would like to do is a get_resname(5), to get the residue
# type of residue 5 (e.g. ASP)

# ---> end


Thanks,
Doug
From thamelry at binf.ku.dk  Tue Sep  6 11:23:44 2005
From: thamelry at binf.ku.dk (Thomas Hamelryck)
Date: Tue Sep  6 13:03:03 2005
Subject: [BioPython] BioPDB: get_resname() example?
In-Reply-To: <E4B69A7B-F316-48C9-B67F-A5171DE73BD6@unity.ncsu.edu>
References: <E4B69A7B-F316-48C9-B67F-A5171DE73BD6@unity.ncsu.edu>
Message-ID: <200509061723.44414.thamelry@binf.ku.dk>

On Tuesday 06 September 2005 17:16, Douglas Kojetin wrote:
> Hi All-
>
> I've read in a PDB structure (see below), and would like to get the
> residue name for specific sequence positions (e.g. residue number
> 5).  Can someone suggest to me how to do this?  I cannot figure it
> out using the Structural Biopython FAQ.
>
> # ---> start
>
> from Bio.PDB import *
>
> pdb='file.pdb'
> p=PDBParser();
> s=p.get_structure('THE_STRUCTURE', pdb)

Suppose you have one model (i.e. with model id=0), and a chain that is called
'A', then you do something like:

r=s[0]['A'][5]
print r.get_resname()
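
If the model, chain or residue might be missing, the same lookup can be
wrapped in a small helper. This is only a sketch: resname_at is a
hypothetical name, and it assumes that a failed lookup raises KeyError (as
a dictionary-style container would).

```python
def resname_at(structure, chain_id, resseq, model_id=0):
    """Return the residue name (e.g. 'ASP') at a position, or None if absent.

    Same lookup as s[0]['A'][5]: model, then chain, then residue number.
    A plain integer is expected to match only ordinary (non-hetero)
    residues with a blank insertion code.
    """
    try:
        residue = structure[model_id][chain_id][resseq]
    except KeyError:
        return None
    return residue.get_resname()
```

With the structure from the question this would be called as
resname_at(s, 'A', 5).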

Cheers,

-Thomas

From biopython at maubp.freeserve.co.uk  Fri Sep  2 10:21:55 2005
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed Sep  7 05:46:47 2005
Subject: [BioPython] Fwd: An interface to the maximum-likelihood programs
	of	PHYLIP
In-Reply-To: <FD7FEA8D-1742-44E2-802A-9CCDF0B5BAEA@sonsorol.org>
References: <43184DEC.60600@student.cs.york.ac.uk>
	<FD7FEA8D-1742-44E2-802A-9CCDF0B5BAEA@sonsorol.org>
Message-ID: <43186003.90006@maubp.freeserve.co.uk>

Robert Wilson <rjw500@cs.york.ac.uk> wrote:
 > I am studying for an MSc in Information Processing at York  University
 > in the UK. As part of the course I am carrying out a  short research
 > project with Dr. James Cussens. I chose to develop  an interface in
 > Python to the maximum-likelihood programs of the  phylogenetic
 > analysis package PHYLIP. The code is based on the  modules available
 > in Biopython and I was wondering if you would be  interested in
 > incorporating it into the next release of Biopython.

I did have a look at scripting PHYLIP some time ago, but the lack of 
command line arguments made this less than straightforward.
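
One common way around a menu-driven program is to feed it the keystrokes
on standard input. The sketch here is generic rather than PHYLIP-specific:
run_menu_program is a hypothetical helper written for modern Python 3, and
the actual answers (which letters toggle which options, and what confirms
the menu) would have to be taken from the PHYLIP documentation for each
program.

```python
import subprocess
import sys


def run_menu_program(command, answers, cwd=None):
    """Run an interactive program, typing each answer followed by Enter.

    command: argument list, e.g. ["dnaml"] (assumed to be on the PATH).
    answers: strings to type at the prompts, in order.
    cwd:     working directory; menu-driven tools often read and write
             fixed file names there.
    """
    keystrokes = "".join(answer + "\n" for answer in answers)
    result = subprocess.run(
        command,
        input=keystrokes,
        capture_output=True,
        text=True,
        cwd=cwd,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


# Stand-in for a real menu program: echoes back whatever was "typed".
FAKE_MENU = [sys.executable, "-c", "import sys; print(sys.stdin.read().split())"]
```

Calling run_menu_program(FAKE_MENU, ["U", "Y"]) returns what the stand-in
child process printed, which is a quick check that the keystrokes arrive
in order.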

Your work sounds very interesting, and should make a useful addition to 
BioPython.  We will have to see what the maintainers make of it.

Do you have a link to the code?

Good documentation/examples would really be a bonus.

Thanks

Peter
From idoerg at burnham.org  Thu Sep  8 01:41:05 2005
From: idoerg at burnham.org (Iddo Friedberg)
Date: Thu Sep  8 01:39:45 2005
Subject: [BioPython] python sidebar for mozilla/firefox
Message-ID: <Pine.SGI.4.10.10509072239230.5231484-100000@pines2.ljcrf.edu>


Slightly offtopic, but some of you folks might like it.

check this out:
http://projects.edgewall.com/python-sidebar/

Thanks to Darek Kedra for bringing this to my attention.

./I


--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037, USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 646 3171
http://ffas.ljcrf.edu/~iddo

From chris.lasher at gmail.com  Wed Sep  7 21:21:00 2005
From: chris.lasher at gmail.com (Chris Lasher)
Date: Thu Sep  8 03:07:36 2005
Subject: [BioPython] Fwd: An interface to the maximum-likelihood programs
	of PHYLIP
In-Reply-To: <43186003.90006@maubp.freeserve.co.uk>
References: <43184DEC.60600@student.cs.york.ac.uk>
	<FD7FEA8D-1742-44E2-802A-9CCDF0B5BAEA@sonsorol.org>
	<43186003.90006@maubp.freeserve.co.uk>
Message-ID: <128a885f0509071821699ed9f5@mail.gmail.com>

BioPython support for PHYLIP files would be most welcome to me! I use
the PHYLIP programs frequently, and sometimes would like to do
automated tasks with some of the infiles/outfiles. Automated
scripting, as Peter pointed out, is made difficult by PHYLIP's
menu-driven operation, but the program is open-source (under which
license, I don't know), so perhaps this could be changed to CLI after
some non-trivial work (and perhaps with Joe Felsenstein's permission).
I'd also like to see the downfall of the 10-character sequence name
limit in PHYLIP, but that's a gripe for another time...

Do keep us posted, Robert.

Chris

From syd.diamond at gmail.com  Sun Sep 11 03:57:11 2005
From: syd.diamond at gmail.com (Syd Diamond)
Date: Sun Sep 11 10:18:25 2005
Subject: [BioPython] cpairwise2module.c setup.py fun-ness
Message-ID: <a76ba31505091100576921f03b@mail.gmail.com>

Good Sunday Morning Biopythoners,

I'm not a hardcore bioinformatician per se, but in a project the need came 
along to find database matches for strings with typos. I turned to 
biopython, found the pairwise alignment function, and heuristically set the 
parameters to find good alphanumeric matches. This is not the point of the 
post, but I thought it was a cool application of biopython.

Anyways, the c alignment module is significantly (2-4X) faster than the 
native python alignment module, and I was hoping to include this c module 
(cpairwise2) in a very low-key GPL distribution. I put in the biopython 
license file, and now I am trying to figure out a way to pare down the 
setup.py file to allow the user to compile the module.

I'm in over my head here. My python hack0r sk1llz are intermediate, but my 
gcc skills leave much to be desired. I have no idea where to begin.

In the biopython setup.py, I know cpairwise2 is simply one of the 
extensions.

** How should I reduce setup.py to only compile the cpairwise2 module?? **

Many thanks, yall.

S

From mdehollander at gmail.com  Mon Sep 19 15:22:51 2005
From: mdehollander at gmail.com (Mattias de Hollander)
Date: Mon Sep 19 15:47:04 2005
Subject: [BioPython] problems indexing a FASTA file
Message-ID: <e3f8e9a005091912221a7433f9@mail.gmail.com>

I am trying to index a FASTA file with the following commands (found in the 
Cookbook):
>>> from Bio import Fasta
>>> dict_file = "sequences.fasta"
>>> index_file = "sequences.idx"
>>> Fasta.index_file(dict_file, index_file, rec2key=None)

But I get the following error:
Traceback (most recent call last):
  File "fasta_database.py", line 27, in ?
    main()
  File "fasta_database.py", line 21, in main
    Fasta.index_file(dict_file, index_file, rec2key=None)
  File "/usr/lib/python2.3/site-packages/Bio/Fasta/__init__.py", line 243, in index_file
    SimpleSeqRecord.create_flatdb([filename], indexname, indexer)
  File "/usr/lib/python2.3/site-packages/Bio/Mindy/SimpleSeqRecord.py", line 111, in create_flatdb
    creator = FlatDB.create(db_name, unique_name, alias_names)
  File "/usr/lib/python2.3/site-packages/Bio/Mindy/FlatDB.py", line 297, in create
    return open(dbname, "rw")
  File "/usr/lib/python2.3/site-packages/Bio/Mindy/FlatDB.py", line 304, in open
    return MemoryFlatDB(dbname)
  File "/usr/lib/python2.3/site-packages/Bio/Mindy/FlatDB.py", line 130, in __init__
    BaseFlatDB.__init__(self, dbname, INDEX_TYPE)
TypeError: __init__() takes exactly 2 arguments (3 given)

What am I doing wrong?

Thanks,

Mattias de Hollander

From no228 at cam.ac.uk  Tue Sep 20 07:24:11 2005
From: no228 at cam.ac.uk (Noel O'Boyle)
Date: Tue Sep 20 07:48:29 2005
Subject: [BioPython] SD/MDL file parser
Message-ID: <1127215451.16760.92.camel@sandwi.ch.cam.ac.uk>

Hello all,

I've just been through the documentation and site-packages on my
computer, and I cannot find a parser for SD (or MDL) files. This is the
most common file format for chemical structures in databases of
chemicals (as used by pharmaceutical companies, for example).

Did I miss this parser? I know that Andrew Dalke (through PyDaylight)
has an interest in chemistry, so I was expecting to find this parser...

Regards,
Noel O'Boyle.

From j.pansanel at pansanel.net  Tue Sep 20 08:17:59 2005
From: j.pansanel at pansanel.net (Jerome PANSANEL)
Date: Tue Sep 20 08:37:54 2005
Subject: [BioPython] SD/MDL file parser
In-Reply-To: <1127215451.16760.92.camel@sandwi.ch.cam.ac.uk>
References: <1127215451.16760.92.camel@sandwi.ch.cam.ac.uk>
Message-ID: <200509201418.00066.j.pansanel@pansanel.net>

On Tuesday 20 September 2005 13:24, Noel O'Boyle wrote:
> Hello all,

Hi !

> I've just been through the documentation and site-packages on my
> computer, and I cannot find a parser for SD (or MDL) files. This is the
> most common file format for chemical structures in databases of
> chemicals (as used by pharmaceutical companies, for example).
>
> Did I miss this parser? I know that Andrew Dalke (through PyDaylight)
> has an interest in chemistry, so I was expecting to find this parser...

You can use frowns (http://frowns.sourceforge.net/) in Python or, if you need 
C++, I can send you one we have developed.

What features do you need?

Regards,

Jerome Pansanel

> Regards,
> Noel O'Boyle.
>
> _______________________________________________
> BioPython mailing list  -  BioPython@biopython.org
> http://biopython.org/mailman/listinfo/biopython


From omid9dr18 at hotmail.com  Sat Sep 24 20:54:46 2005
From: omid9dr18 at hotmail.com (Omid Khalouei)
Date: Sat Sep 24 21:14:06 2005
Subject: [BioPython] Torsional angle
Message-ID: <BAY103-F168CC7769B3E92B8DDD007E6880@phx.gbl>

Hello,

Could someone please help me with measuring a torsional angle given the PDB 
coordinates of the 4 atoms involved in it.

Thanks a lot.


From dalke at dalkescientific.com  Mon Sep 26 09:13:27 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon Sep 26 09:25:32 2005
Subject: [BioPython] SD/MDL file parser
In-Reply-To: <1127215451.16760.92.camel@sandwi.ch.cam.ac.uk>
References: <1127215451.16760.92.camel@sandwi.ch.cam.ac.uk>
Message-ID: <9d932c735619301486d2c02155404ff8@dalkescientific.com>

Hi Noel,

> I've just been through the documentation and site-packages on my
> computer, and I cannot find a parser for SD (or MDL) files. This is the
> most common file format for chemical structures in databases of
> chemicals (as used by pharmaceutical companies, for example).
>
> Did I miss this parser? I know that Andrew Dalke (through PyDaylight)
> has an interest in chemistry, so I was expecting to find this parser...

As Jerome mentioned, frowns includes an MDL parser.

 >>> from frowns import MDL
 >>> filename = "/usr/local/openeye/python/examples/oechem/examples/drugs.sdf"
 >>> for mol, error, text in MDL.sdin(open(filename)):
...   print mol.cansmiles(), mol.fields
...
C(c1c(OC(=O)C)cccc1)(=O)O {'Color': 'red', 'Energy': '1'}
c12C(=O)NC(=Nc1[n](cn2)COCCO)N {'Color': 'blue', 'Energy': '2'}
c1(c(cccc1)CC=C)OCC(O)CNC(C)C {'Color': 'green', 'Energy': '3'}
C1(C(N(c2ccccc2)N(C=1C)C)=O)N(C)C {'Energy': '4.5'}
c1(OCC(O)CNC(C)C)ccc(cc1)CC(=O)N {'Color': 'purple', 'Energy': '-3.5'}
C1(c2c(N(C(N1C)=O)C)nc[n]2C)=O {'Color': 'black', 'Energy': '0'}
 >>>

It converts the connection table data into a Frowns data
structure.  It should keep the chemistry the same as what's
in the file because the frowns.Molecule doesn't do any
perception, but doing something like cansmiles() will likely
change things.

If you have SD fields with repeats of the same key then
there will be a problem, because the parser expects that
the data can be stored in a dictionary.  OEChem has a
dictionary-like data structure which also allows list-like
iteration for this case.

If I had the time (okay, and if someone was willing to pay
for me to do this :) I would probably use something like my
MultiDict class instead.
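
To illustrate the idea only (this is not the actual MultiDict code, and
the class and method names are made up), a mapping that keeps every value
for a repeated key, while still behaving like a dictionary for the common
single-value case, could look like this:

```python
class FieldMultiDict:
    """Hypothetical multi-valued mapping for SD data fields."""

    def __init__(self):
        self._values = {}  # key -> list of values, in the order added
        self._pairs = []   # (key, value) pairs, in file order

    def add(self, key, value):
        """Record one more value for key without discarding earlier ones."""
        self._values.setdefault(key, []).append(value)
        self._pairs.append((key, value))

    def __getitem__(self, key):
        """Dictionary-style access returns the most recent value."""
        return self._values[key][-1]

    def __contains__(self, key):
        return key in self._values

    def getall(self, key):
        """Return every value stored for key (empty list if none)."""
        return list(self._values.get(key, []))

    def items(self):
        """List-like iteration over all (key, value) pairs, repeats included."""
        return list(self._pairs)
```

Returning the last value from __getitem__ is a design choice made for this
sketch; returning the first would be just as defensible.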

Your email says you're in Cambridge, eh?  I'll be there
in a couple of weeks for the EuroMUG conference, staying
there for a week to also visit EBI and Sanger.

					Andrew
					dalke@dalkescientific.com

From no228 at cam.ac.uk  Mon Sep 26 09:27:58 2005
From: no228 at cam.ac.uk (Noel O'Boyle)
Date: Mon Sep 26 09:28:19 2005
Subject: [BioPython] SD/MDL file parser
In-Reply-To: <9d932c735619301486d2c02155404ff8@dalkescientific.com>
References: <1127215451.16760.92.camel@sandwi.ch.cam.ac.uk>
	<9d932c735619301486d2c02155404ff8@dalkescientific.com>
Message-ID: <1127741278.16760.213.camel@sandwi.ch.cam.ac.uk>

> If you have SD fields with repeats of the same key then
> there will be a problem, because the parser expects that
> the data can be stored in a dictionary.  OEChem has a
> dictionary-like data structure which also allows list-like
> iteration for this case.

> If I had the time (okay, and if someone was willing to pay
> for me to do this :) I would probably use something like my
> MultiDict class instead.

The whole frowns system is a bit overkill for simple manipulations of SD
fields. I am planning to cannibalise the mdl parser and writer of frowns
to make it more straightforward for myself. Thanks for the heads up
regarding multiple fields, though.

> Your email says you're in Cambridge, eh?  I'll be there
> in a couple of weeks for the EuroMUG conference, staying
> there for a week to also visit EBI and Sanger.
I'll probably see you there so, as I'll be attending.

Is it possible/worthwhile to write a parser using Martel? Or would you
say that there are too many non-standard sd files out there for a single
parser to have general applicability?

Regards,
Noel
-- 
Dr. Noel M. O'Boyle,
Group of Dr. John Mitchell (http://www-mitchell.ch.cam.ac.uk),
Unilever Centre for Molecular Science Informatics,
Dept. of Chemistry,
University of Cambridge,
U.K.

From dalke at dalkescientific.com  Mon Sep 26 10:09:06 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon Sep 26 10:23:13 2005
Subject: [BioPython] Torsional angle
In-Reply-To: <BAY103-F168CC7769B3E92B8DDD007E6880@phx.gbl>
References: <BAY103-F168CC7769B3E92B8DDD007E6880@phx.gbl>
Message-ID: <a044756b8ec05bb0f80892f795eceb6c@dalkescientific.com>

Hi Omid,

> Could someone please help me with measuring a torsional angle given  
> the PDB coordinates of the 4 atoms involved in it.


See the thread including
   http://www.rcsb.org/pdb/lists/pdb-l/200409/001936.html
and more details in
    
   http://www.math.fsu.edu/~quine/IntroMathBio_04/torsion_pdb/torsion_pdb.pdf

There is Biopython code for computing a torsional angle.

 >>> from Bio.PDB import Vector
 >>> p1 = Vector.Vector( [0.0, 0.0, 1.0] )
 >>> p2 = Vector.Vector( [0.0, 0.0, 0.0] )
 >>> p3 = Vector.Vector( [0.0, 1.0, 0.0] )
 >>> p4 = Vector.Vector( [1.0, 1.0, 0.0] )
 >>> Vector.calc_dihedral(p1, p2, p3, p4)
1.5707963267948966
 >>>

However, I think that's only in CVS.  The code is

def calc_dihedral(v1, v2, v3, v4):
     """
     Calculate the dihedral angle between 4 vectors
     representing 4 connected points. The angle is in
     ]-pi, pi].

     @param v1, v2, v3, v4: the four points that define the dihedral angle
     @type v1, v2, v3, v4: L{Vector}
     """
     ab=v1-v2
     cb=v3-v2
     db=v4-v3
     u=ab**cb
     v=db**cb
     w=u**v
     angle=u.angle(v)
     # Determine sign of angle
     try:
         if cb.angle(w)>0.001:
             angle=-angle
     except ZeroDivisionError:
         # dihedral=pi
         pass
     return angle

This depends on a Bio.PDB-specific Vector class which
implements the "angle" method.

     def angle(self, other):
         "Return angle between two vectors"
         n1=self.norm()
         n2=other.norm()
         c=(self*other)/(n1*n2)
         # Take care of roundoff errors
         c=min(c,1)
         c=max(-1,c)
         return arccos(c)
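
For readers without the CVS version, the same steps can be sketched with
plain tuples and the standard math module. This is an illustrative
re-statement of the code above, not part of Biopython; the helper names
are made up, and it assumes that "**" is the cross product and "*" the
dot product in the Vector class.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _angle(a, b):
    """Angle between two vectors; raises ZeroDivisionError for a zero vector."""
    n = math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b))
    c = _dot(a, b) / n
    c = max(-1.0, min(1.0, c))  # guard against roundoff outside [-1, 1]
    return math.acos(c)


def dihedral(p1, p2, p3, p4):
    """Dihedral angle in radians, in ]-pi, pi], for four (x, y, z) points."""
    ab = _sub(p1, p2)
    cb = _sub(p3, p2)
    db = _sub(p4, p3)
    u = _cross(ab, cb)
    v = _cross(db, cb)
    w = _cross(u, v)
    angle = _angle(u, v)
    try:
        # w is parallel or antiparallel to cb; that decides the sign
        if _angle(cb, w) > 0.001:
            angle = -angle
    except ZeroDivisionError:
        # w is the zero vector: the angle is 0 or pi, no sign needed
        pass
    return angle
```

Use math.degrees() on the result if degrees are wanted.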


					Andrew
					dalke@dalkescientific.com

From dalke at dalkescientific.com  Mon Sep 26 10:39:34 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon Sep 26 10:39:20 2005
Subject: [BioPython] SD/MDL file parser
In-Reply-To: <1127741278.16760.213.camel@sandwi.ch.cam.ac.uk>
References: <1127215451.16760.92.camel@sandwi.ch.cam.ac.uk>
	<9d932c735619301486d2c02155404ff8@dalkescientific.com>
	<1127741278.16760.213.camel@sandwi.ch.cam.ac.uk>
Message-ID: <518860cf0035743064d2cb8b8fed39c4@dalkescientific.com>

Hi Noel,

> The whole frowns system is a bit overkill for simple manipulations of SD
> fields. I am planning to cannibalise the mdl parser and writer of frowns
> to make it more straightforward for myself. Thanks for the heads up
> regarding multiple fields, though.

The question is, what's overkill and what's not?  If you only
want the SD data then you can leave the connection table as an
unprocessed blob of text.  In that case a parser is about 25 lines
of code.
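
A rough sketch of what such a reader could look like, under these
assumptions: records end with a "$$$$" line, the connection table ends at
"M  END", data headers look like "> <Name>", and each data item is
terminated by a blank line. iter_sd_records is a made-up name, and values
are kept in lists so that repeated field names are not lost.

```python
def iter_sd_records(lines):
    """Yield (molblock_text, fields) for each record of an SD file.

    fields maps a field name to a list of values, one per occurrence.
    A final record without a closing "$$$$" line is not yielded.
    """
    mol, fields, tag, in_mol = [], {}, None, True
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.strip() == "$$$$":
            yield "\n".join(mol), fields
            mol, fields, tag, in_mol = [], {}, None, True
        elif in_mol:
            mol.append(line)  # connection table kept as unprocessed text
            in_mol = not line.startswith("M  END")
        elif line.startswith(">"):
            lt, gt = line.find("<"), line.rfind(">")
            tag = line[lt + 1:gt] if 0 < lt < gt else None
            if tag is not None:
                fields.setdefault(tag, []).append("")
        elif tag is not None:
            if not line.strip():
                tag = None  # a blank line ends the current data item
            elif fields[tag][-1]:
                fields[tag][-1] += "\n" + line
            else:
                fields[tag][-1] = line
```

Typical use would be a loop such as
"for molblock, fields in iter_sd_records(open(filename)):".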

> I'll probably see you there so, as I'll be attending.

I'll see you then in a couple of weeks.

> Is it possible/worthwhile to write a parser using Martel? Or would you
> say that there are too many non-standard sd files out there for a single
> parser to have general applicability?

When I wrote Martel I tested against an SD file ... and an
RXN file, and a molfile.  Somewhere is a Martel expression for
those formats.  It's not on this laptop so I need to check my
older one.  That's home in Santa Fe and I can't log into that
machine now since I'm traveling.  If you can dig up an old
Martel distribution (pre-Biopython integration) you might
be able to get ahold of it.

I don't think the diversity of an SD file format is that
large.  After all, many programs have SD file readers and
I don't hear anywhere near as many problems as with supporting,
say, the PDB format.

Still, if you only need the SD data reader then it isn't too
hard to do yourself, and there are few ways to make subtle
mistakes.

					Andrew
					dalke@dalkescientific.com

From bill at barnard-engineering.com  Mon Sep 26 20:04:18 2005
From: bill at barnard-engineering.com (Bill Barnard)
Date: Mon Sep 26 20:29:17 2005
Subject: [BioPython] Patches to enable Doc building for source and rpm
	distributions
Message-ID: <1127779458.16589.60.camel@tioga.barnard-engineering.com>

I've got some time free to work with Biopython again. I wanted to be
able to easily create an rpm including the Doc directory. I found that
the Makefile in the Doc directory didn't work quite correctly, so I set
out to remedy that.

My set of changes is small, and you may not want to commit all of them.
In particular I added two lines to setup.py to run make in the Doc
directory. There may be some better way to accomplish that task, but I
don't know distutils very well so I'm not a good judge. Also these are
only "needed" to enable the addition of the doc files in the rpm
generated by setup.py bdist_rpm.

I found generating the pdf & html doc files to be straightforward
except for two things:

1) the hevea.sty file needs to be included in the distribution to allow
the make to complete; that's simply added in the MANIFEST.in file.

2) Generating html from biopdb_faq.tex has a minor failure (resulting in
a missing image file referenced in the html) due to lack of bounding box
information in the .tex file. Since the .tex file is exported (I assume)
from the .lyx file, I thought the "problem" should be fixed there. I did
not figure it out, and didn't want to learn too much more about LyX in
order to do so. Instead I created a patch file for the .tex file that
permits proper html generation by hevea. The patch is applied in the
Makefile.

The Makefile is actually entirely new. The old Makefile didn't have much
in it, and was not very generic. The new one is a bit cleaner and more
generic.

I'm attaching the two patch files to this email. I'll see when it comes
back to me whether anything in my email chain has stripped the patch
files from the email.

I hope this is useful. If it is, then I may visit the subdirectories of
Doc to structure their Makefiles similarly, and set them up to be called
recursively.

Cheers,

Bill
-- 
Bill Barnard <bill@barnard-engineering.com>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: biopython-Doc_Makefile_fix.patch
Type: text/x-patch
Size: 2943 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20050926/37abfb75/biopython-Doc_Makefile_fix.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: biopdb_faq.tex.hevea-html-fix.patch
Type: text/x-patch
Size: 574 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20050926/37abfb75/biopdb_faq.tex.hevea-html-fix.bin
From bill at barnard-engineering.com  Tue Sep 27 02:31:34 2005
From: bill at barnard-engineering.com (Bill Barnard)
Date: Tue Sep 27 03:54:32 2005
Subject: [BioPython] Patches to enable Doc building for source and rpm
	distributions
In-Reply-To: <1127779458.16589.60.camel@tioga.barnard-engineering.com>
References: <1127779458.16589.60.camel@tioga.barnard-engineering.com>
Message-ID: <1127802694.16589.78.camel@tioga.barnard-engineering.com>

On Mon, 2005-09-26 at 17:04 -0700, Bill Barnard wrote:

I found a couple mistakes in my patches I mailed earlier. These patches
supersede the earlier ones.

> I found generating the pdf & html doc files to be straightforward
> except for two things:
> 
> 1) the hevea.sty file needs to be included in the distribution to allow
> the make to complete; that's simply added in the MANIFEST.in file.

I updated the Makefile to include generating text output, so I updated
MANIFEST.in to exclude Doc/Tutorial.txt (which was out of date relative
to Tutorial.tex.)

> 
> 2) Generating html from biopdb_faq.tex has a minor failure (resulting in
> a missing image file referenced in the html) due to lack of bounding box
> information in the .tex file. Since the .tex file is exported (I assume)
> from the .lyx file, I thought the "problem" should be fixed there. I did
> not figure it out, and didn't want to learn too much more about LyX in
> order to do so. Instead I created a patch file for the .tex file that
> permits proper html generation by hevea. The patch is applied in the
> Makefile.

Additional testing revealed my first patch only worked for generating
html from biopdb_faq.tex, and broke pdf and text generation. I've fixed
that. (It's still a hack though...)

> 
> The Makefile is actually entirely new. The old Makefile didn't have much
> in it, and was not very generic. The new one is a bit cleaner and more
> generic.

I made a couple mistakes in the Makefile. New html code was always
generated, even when unnecessary. Also it now invokes the clean target
when it completes, eliminating the need for one of the make calls in
setup.py.

> 
> I'm attaching the two patch files to this email. 

I've attached the two updated patches to this email. Please discard the
earlier email's patches.

Cheers,

Bill
-- 
Bill Barnard <bill@barnard-engineering.com>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: biopython-Doc_Makefile_fix.patch
Type: text/x-patch
Size: 3260 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20050926/e5b21d3e/biopython-Doc_Makefile_fix.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: biopdb_faq.tex.hevea-html-fix.patch
Type: text/x-patch
Size: 1126 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20050926/e5b21d3e/biopdb_faq.tex.hevea-html-fix.bin
From bill at barnard-engineering.com  Tue Sep 27 17:48:58 2005
From: bill at barnard-engineering.com (Bill Barnard)
Date: Tue Sep 27 17:50:20 2005
Subject: [BioPython] Patches to enable Doc building for source and rpm
	distributions
In-Reply-To: <1127802694.16589.78.camel@tioga.barnard-engineering.com>
References: <1127779458.16589.60.camel@tioga.barnard-engineering.com>
	<1127802694.16589.78.camel@tioga.barnard-engineering.com>
Message-ID: <1127857738.16589.94.camel@tioga.barnard-engineering.com>

I've completed an update of all the Makefiles in the Doc tree. The .tex
file patch previously discussed has not changed, but it is attached so
that this email has the complete set of updates.

The Makefiles have all got some common functionality which has been
placed in a new file, common.mk, that lives at the Doc level in the
tree. Subsidiary Makefiles are very simple; they define their sets of
source files, the relative position of the Doc root, and include the
common.mk file.

Feel free to use any portion of these that seems useful. All three of
these files could be applied to the CVS tree as it is now.

Cheers,

Bill
-- 
Bill Barnard <bill@barnard-engineering.com>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: biopython-Doc_Makefile_fix.patch
Type: text/x-patch
Size: 8082 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20050927/c6aca0b4/biopython-Doc_Makefile_fix-0001.bin
-------------- next part --------------
#
# Define your sources, e.g. sources :=  biopython_test.tex
# then define docroot relative to your directory, e.g. docroot := ../..
# then include this file, e.g. include $(docroot)/common.mk
#

# output from pdflatex
pdfs = $(subst .tex,.pdf,$(sources))
auxs = $(subst .tex,.aux,$(sources))
logs = $(subst .tex,.log,$(sources))
outs = $(subst .tex,.out,$(sources))
tocs = $(subst .tex,.toc,$(sources))
# output from hevea
htmls = $(subst .tex,.html,$(sources))
hauxs = $(subst .tex,.haux,$(sources))
htocs = $(subst .tex,.htoc,$(sources))
txts  = $(subst .tex,.txt,$(sources))
# output from hacha
gifs := *_motif.gif

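# these targets name actions, not files, so keep make from looking for files
.PHONY: all pdf html txt
all:  html pdf txt clean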
pdf:  $(pdfs)
html: $(htmls)
txt:  $(txts)

$(pdfs): %.pdf: %.tex
	export TEXINPUTS=:$(docroot) && pdflatex $<
	export TEXINPUTS=:$(docroot) && pdflatex $<
	export TEXINPUTS=:$(docroot) && pdflatex $<

$(htmls): %.html: %.tex $(patch_target)
	hevea -fix $<
	hevea -fix $<
	hacha -o $(basename $@)-index.html $@
	ln -sf $@ $(basename $@)-one_page.html

$(txts): %.txt: %.tex
	hevea -fix -text -o $(basename $@).txt $<

.PHONY: clean
clean:
	rm -f $(auxs) $(logs) $(outs) $(tocs) $(hauxs) $(htocs)

.PHONY: distclean
distclean: clean
	rm -f $(pdfs) $(htmls) $(txts) *.html $(gifs) $(patch_target) *.rej
-------------- next part --------------
A non-text attachment was scrubbed...
Name: biopdb_faq.tex.hevea-html-fix.patch
Type: text/x-patch
Size: 1126 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20050927/c6aca0b4/biopdb_faq.tex.hevea-html-fix-0001.bin
From jstroud at mbi.ucla.edu  Thu Sep 29 16:34:18 2005
From: jstroud at mbi.ucla.edu (James Stroud)
Date: Thu Sep 29 17:08:03 2005
Subject: [BioPython] Smith Waterman
Message-ID: <200509291334.18053.jstroud@mbi.ucla.edu>

Hello Everyone,

I am brand new to the biopython mailing list but I've been a python programmer
for some time. Anyway, I have what I think is a naive question, but I have
tried to google this and I can't for the life of me find a complete answer
anywhere. Basically, I want to do a Smith-Waterman alignment with a Blosum
matrix. I think I have the latest distro of biopython.

I have found the pairwise2 module and determined that I need:

 pairwise2.align.localdd()

But I'm not sure where the Blosum Matrix is in the biopython API. Any 
suggestions? A code snippet would be a lot of help.

James

-- 
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
From bill at barnard-engineering.com  Fri Sep 30 02:25:54 2005
From: bill at barnard-engineering.com (Bill Barnard)
Date: Fri Sep 30 22:47:15 2005
Subject: [BioPython] Smith Waterman
In-Reply-To: <200509291334.18053.jstroud@mbi.ucla.edu>
References: <200509291334.18053.jstroud@mbi.ucla.edu>
Message-ID: <1128061554.8794.8.camel@lyell.barnard-engineering.com>

On Thu, 2005-09-29 at 13:34 -0700, James Stroud wrote:

> But I'm not sure where the Blosum Matrix is in the biopython API. Any 
> suggestions? A code snippet would be a lot of help.

I'm just getting going on some sequence alignment code myself, exploring
what's in biopython. Anyway you'll find the matrices you're looking for
like so:

from Bio import SubsMat
from Bio.SubsMat import MatrixInfo
from Bio.pairwise2 import dictionary_match
blosum50 = dictionary_match(SubsMat.SeqMat(MatrixInfo.blosum50))

This gives you a callable wrapper around the matrix dictionary, to which
you can pass two characters and get back the Blosum50 score, e.g.

blosum50('A', 'P') returns: -1

Your characters need to be uppercase, and they need to be part of the set
in the substitution matrix or you'll throw an exception.

There are other ways to access the Blosum matrices, but this one seemed
pretty nice to me.
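
For anyone who wants to see the idea without Biopython installed, here is
a rough, self-contained sketch of the lookup behaviour described above,
plus a bare-bones Smith-Waterman score that uses it. It is not
Biopython's implementation: the names SymmetricMatch, TOY_SCORES and
smith_waterman_score are made up for illustration, the score table is a
tiny illustrative one (only the A/P value of -1 is taken from this
message), and the linear gap penalty of -8 is an arbitrary assumption.

```python
# Illustrative scores only; this is NOT a full substitution matrix.
# Each unordered pair is stored once, as in a triangular matrix.
TOY_SCORES = {
    ("A", "A"): 5,
    ("P", "P"): 10,
    ("A", "P"): -1,
}


class SymmetricMatch:
    """Hypothetical stand-in for a dictionary-backed match function."""

    def __init__(self, scores):
        self.scores = scores

    def __call__(self, a, b):
        # Try (a, b) first, then fall back to the swapped pair.
        if (a, b) in self.scores:
            return self.scores[(a, b)]
        # Unknown characters (including lowercase) raise KeyError here.
        return self.scores[(b, a)]


def smith_waterman_score(seq_a, seq_b, match, gap=-8):
    """Return the best local alignment score (linear gap penalty)."""
    prev = [0] * (len(seq_b) + 1)
    best = 0
    for a in seq_a:
        curr = [0]
        for j, b in enumerate(seq_b, start=1):
            cell = max(
                0,                          # start a new local alignment
                prev[j - 1] + match(a, b),  # align a with b
                prev[j] + gap,              # a aligned against a gap
                curr[j - 1] + gap,          # b aligned against a gap
            )
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


score = SymmetricMatch(TOY_SCORES)
print(score("P", "A"))
print(smith_waterman_score("AAP", "AP", score))
```

The sketch only reports the best score; recovering the aligned
subsequences would need a traceback through the full matrix, which is
left out to keep it short.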

Cheers,

Bill
-- 
Bill Barnard <bill@barnard-engineering.com>