From dag at sonsorol.org Fri Sep 2 10:00:09 2005
From: dag at sonsorol.org (Chris Dagdigian)
Date: Fri Sep 2 09:49:33 2005
Subject: [BioPython] Fwd: An interface to the maximum-likelihood programs of PHYLIP
References: <43184DEC.60600@student.cs.york.ac.uk>
Message-ID:

Begin forwarded message:

> From: rjw500
> Date: September 2, 2005 9:04:44 AM EDT
> To: biopython-dev-owner@biopython.org
> Subject: An interface to the maximum-likelihood programs of PHYLIP
>
>
> Dear Biopython-dev-owner,
>
> I am sorry to trouble you, but I am not sure who to contact, since
> my initial e-mail to biopython-dev@biopython.org bounced back.
>
> I am studying for an MSc in Information Processing at York
> University in the UK. As part of the course I am carrying out a
> short research project with Dr. James Cussens. I chose to develop
> an interface in Python to the maximum-likelihood programs of the
> phylogenetic analysis package PHYLIP. The code is based on the
> modules available in Biopython and I was wondering if you would be
> interested in incorporating it into the next release of Biopython.
>
> I have written the following classes and modules:
>
>
> - a class to represent a PHYLIP multiple sequence alignment based
> on Bio.Align.Generic.Alignment
>
> - classes to represent the alphabets required for the sequence data
> in PHYLIP input files based on Bio.Alphabet
>
> - a light class to allow the conversion of multiple sequence
> alignment objects in other formats, such as Clustalw, that is
> derived from Bio.Align.FormatConvert
>
> - a module to parse PHYLIP input files based on Bio.ParserSupport.
> This module includes scanners to read the two PHYLIP input file
> formats, a consumer and several parsers built upon these classes.
>
> - modules for the PHYLIP maximum-likelihood programs dnaml, dnamlk,
> proml, promlk, as well as the PHYLIP programs seqboot, consense
> and treedist that are based upon Bio.Application
>
>
> These classes and modules allow the automation of phylogenetic
> analysis, which particularly in the case of the maximum likelihood
> methods can be a very time consuming process. A short script can be
> written to analyse a multiple sequence alignment with one of the
> maximum-likelihood programs, and then examine the tree produced by
> bootstrapping to test if the relationships identified are supported
> by the data.
>
> I realise Biopython currently provides support for the distance-
> matrix programs of PHYLIP through the EMBOSS package wrappers.
> However, wrappers for the PHYLIP maximum likelihood programs in the
> EMBOSS package are either incomplete (dnaml and dnamlk), lacking
> the facility to use multiple data sets which is critical for
> bootstrapping, or completely absent (proml and promlk). Thus, I
> decided to extend the existing support for PHYLIP by writing an
> interface to the maximum likelihood programs of the standard PHYLIP
> package. I wrote the modules for seqboot and consense, which are
> available via the EMBOSS wrappers, so that people who had not
> installed the EMBOSS package would also be able to carry out
> bootstrap analysis.
>
> I look forward to hearing from you,
>
> Best wishes,
>
> Robert Wilson
>

From dalke at dalkescientific.com Sun Sep 4 19:50:03 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sun Sep 4 19:52:27 2005
Subject: [BioPython] HMM module
Message-ID:

Hi all,

I'm teaching a course here at the NBN. It's a successor to the course
I taught 1.5 years ago. My lecture notes are at
  http://www.dalkescientific.com/writings/NBN/

The class is learning probabilistic modeling this week. I want to
cover Markov models. Should I use Bio.BNN or Bio.MarkovModel and is
there any documentation? Or is some 3rd party package better?
Andrew
dalke@dalkescientific.com

From boehme at mpiib-berlin.mpg.de Mon Sep 5 05:10:39 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Mon Sep 5 05:07:48 2005
Subject: [BioPython] Changes in NCBI BLAST output format
In-Reply-To: <6CA15ADD82E5724F88CB53D50E61C9AE7AC20A@cgcmail.cgc.cpmc.columbia.edu>
References: <6CA15ADD82E5724F88CB53D50E61C9AE7AC20A@cgcmail.cgc.cpmc.columbia.edu>
Message-ID: <431C0B8F.9010807@mpiib-berlin.mpg.de>

Hello Michiel,

I got your fix for NCBIWWW.py from the CVS, but now I get a different
error message:

SyntaxError: Line does not contain 'Database':
. . .
  File "C:\Python24\Lib\site-packages\Bio\Blast\NCBIWWW.py", line 47, in parse
    self._scanner.feed(handle, self._consumer)
  File "C:\Python24\Lib\site-packages\Bio\Blast\NCBIWWW.py", line 100, in feed
    self._scan_header(uhandle, consumer)
  File "C:\Python24\Lib\site-packages\Bio\Blast\NCBIWWW.py", line 172, in _scan_header
    self._scan_database_info(uhandle, consumer)
  File "C:\Python24\Lib\site-packages\Bio\Blast\NCBIWWW.py", line 190, in _scan_database_info
    read_and_call(uhandle, consumer.database_info, contains='Database')
  File "C:\Python24\Lib\site-packages\Bio\ParserSupport.py", line 301, in read_and_call
    raise SyntaxError, errmsg

This is the sequence I blasted: GGAAGATAATGAACACCAAGA

result_handle = NCBIWWW.qblast('blastn', 'nr', f_record, expect = 100,
    word_size = 7,filter = 'L',entrez_query = 'Homo sapiens [ORGN]',
    descriptions=10,alignments=50)

(I added word_size to qblast, but that doesn't make any difference)

I tried to work out what the problem was, but I don't find it easy to
see what exactly the BLAST parser is doing.
I'm wondering why nobody else seems to get this error?

Thanks for any help!
Martina

From boehme at mpiib-berlin.mpg.de Tue Sep 6 11:26:27 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Tue Sep 6 11:17:53 2005
Subject: [BioPython] Changes in NCBI BLAST output format
In-Reply-To: <200509052228.j85MSZ7Y022211@itsa.ucsf.edu>
References: <200509052228.j85MSZ7Y022211@itsa.ucsf.edu>
Message-ID: <431DB523.9020701@mpiib-berlin.mpg.de>

Hello,

shouldn't the suggestion by Alexander in the _Scanner class:
"
change:
attempt_read_and_call(uhandle, consumer.noevent, start='

')
to:
attempt_read_and_call(uhandle, consumer.noevent)
"
be changed to something like:
read_and_call_while(uhandle, consumer.noevent, blank=1)?

At least, that is what is working in my case. This is because of the 2
different types of return files: one with the queries first, the other
with the database info first. If it is database first, then there is an
additional blank line which gave me problems.

Martina

meames@itsa.ucsf.edu wrote:
> Yes, I have also observed a similar problem -
>
> The new BLAST output has an extra empty line between the "RID:" line
> and the "Database:" line, which chokes the parser. A temporary (albeit
> bad programming) solution would be to eliminate the line in the saved
> BLAST output file before passing it on to the parser.
>
> Matt
>
> On 5 Sep 2005 11:10:39 +0200 "Martina" wrote:
>
>
>>Hello Michiel,
>>
>>I got your fix for NCBIWWW.py from the CVS, but now I get a different
>>error message: SyntaxError: Line does not contain 'Database':
>>. . .

From djkojeti at unity.ncsu.edu Tue Sep 6 11:16:41 2005
From: djkojeti at unity.ncsu.edu (Douglas Kojetin)
Date: Tue Sep 6 11:21:39 2005
Subject: [BioPython] BioPDB: get_resname() example?
Message-ID:

Hi All-

I've read in a PDB structure (see below), and would like to get the
residue name for specific sequence positions (e.g. residue number 5).
Can someone suggest to me how to do this? I cannot figure it out
using the Structural Biopython FAQ.

# ---> start

from Bio.PDB import *

pdb='file.pdb'
p=PDBParser();
s=p.get_structure('THE_STRUCTURE', pdb)

# i figured i can get the residue list using this
res_list = Selection.unfold_entities(s, 'R')

# but i'm not sure what to do next, or if there is a better way to get the information
# what i would like to do is a get_resname(5), to get the residue type of residue 5 (e.g.
ASP)

# ---> end

Thanks,
Doug

From thamelry at binf.ku.dk Tue Sep 6 11:23:44 2005
From: thamelry at binf.ku.dk (Thomas Hamelryck)
Date: Tue Sep 6 13:03:03 2005
Subject: [BioPython] BioPDB: get_resname() example?
In-Reply-To:
References:
Message-ID: <200509061723.44414.thamelry@binf.ku.dk>

On Tuesday 06 September 2005 17:16, Douglas Kojetin wrote:
> Hi All-
>
> I've read in a PDB structure (see below), and would like to get the
> residue name for specific sequence positions (e.g. residue number
> 5). Can someone suggest to me how to do this? I cannot figure it
> out using the Structural Biopython FAQ.
>
> # ---> start
>
> from Bio.PDB import *
>
> pdb='file.pdb'
> p=PDBParser();
> s=p.get_structure('THE_STRUCTURE', pdb)

Suppose you have one model (i.e. with model id=0), and a chain that
is called 'A', then you do something like:

r=s[0]['A'][5]
print r.get_resname()

Cheers,

-Thomas

From biopython at maubp.freeserve.co.uk Fri Sep 2 10:21:55 2005
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed Sep 7 05:46:47 2005
Subject: [BioPython] Fwd: An interface to the maximum-likelihood programs of PHYLIP
In-Reply-To:
References: <43184DEC.60600@student.cs.york.ac.uk>
Message-ID: <43186003.90006@maubp.freeserve.co.uk>

Robert Wilson wrote:
> I am studying for an MSc in Information Processing at York University
> in the UK. As part of the course I am carrying out a short research
> project with Dr. James Cussens. I chose to develop an interface in
> Python to the maximum-likelihood programs of the phylogenetic
> analysis package PHYLIP. The code is based on the modules available
> in Biopython and I was wondering if you would be interested in
> incorporating it into the next release of Biopython.

I did have a look at scripting PHYLIP some time ago, but the lack of
command line arguments made this less than straightforward.

Your work sounds very interesting, and should make a useful addition
to BioPython. We will have to see what the maintainers make of it.
Do you have a link to the code? Good documentation/examples would
really be a bonus.

Thanks

Peter

From idoerg at burnham.org Thu Sep 8 01:41:05 2005
From: idoerg at burnham.org (Iddo Friedberg)
Date: Thu Sep 8 01:39:45 2005
Subject: [BioPython] python sidebar for mozilla/firefox
Message-ID:

Slightly offtopic, but some of you folks might like it.

check this out:

http://projects.edgewall.com/python-sidebar/

Thanks to Darek Kedra for bringing this to my attention.

./I

--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037, USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 646 3171
http://ffas.ljcrf.edu/~iddo

From chris.lasher at gmail.com Wed Sep 7 21:21:00 2005
From: chris.lasher at gmail.com (Chris Lasher)
Date: Thu Sep 8 03:07:36 2005
Subject: [BioPython] Fwd: An interface to the maximum-likelihood programs of PHYLIP
In-Reply-To: <43186003.90006@maubp.freeserve.co.uk>
References: <43184DEC.60600@student.cs.york.ac.uk> <43186003.90006@maubp.freeserve.co.uk>
Message-ID: <128a885f0509071821699ed9f5@mail.gmail.com>

BioPython support for PHYLIP files would be most welcome to me! I use
the PHYLIP programs frequently, and sometimes would like to do
automated tasks with some of the infiles/outfiles. Automated
scripting, as Peter pointed out, is made difficult by PHYLIP's
menu-driven operation, but the program is open-source (under which
license, I don't know), so perhaps this could be changed to CLI after
some non-trivial work (and perhaps with Joe Felsenstein's
permission). I'd also like to see the downfall of the 10-character
sequence name limit in PHYLIP, but that's a gripe for another time...

Do keep us posted, Robert.
Chris

From syd.diamond at gmail.com Sun Sep 11 03:57:11 2005
From: syd.diamond at gmail.com (Syd Diamond)
Date: Sun Sep 11 10:18:25 2005
Subject: [BioPython] cpairwise2module.c setup.py fun-ness
Message-ID:

Good Sunday Morning Biopythoners,

I'm not a hardcore bioinformatician per se, but in a project the need
came along to find database matches for strings with typos. I turned
to biopython, found the pairwise alignment function, and
heuristically set the parameters to find good alphanumeric matches.
This is not the point of the post, but I thought it was a cool
application of biopython.

Anyways, the c alignment module is significantly (2-4X) faster than
the native python alignment module, and I was hoping to include this
c module (cpairwise2) in a very low-key GPL distribution. I put in
the biopython license file, and now I am trying to figure out a way
to pare down the setup.py file to allow the user to compile the
module.

I'm in over my head here. My python hack0r sk1llz are intermediate,
but my gcc skills leave much to be desired. I have no idea where to
begin. In the biopython setup.py, I know cpairwise2 is simply one of
the extensions.

** How should I reduce setup.py to only compile the cpairwise2 module?? **

Many thanks, yall.

S

From mdehollander at gmail.com Mon Sep 19 15:22:51 2005
From: mdehollander at gmail.com (Mattias de Hollander)
Date: Mon Sep 19 15:47:04 2005
Subject: [BioPython] problems indexing a FASTA file
Message-ID:

I am trying to index a FASTA file with the following commands (found
in the Cookbook):

>>>from Bio import Fasta
>>>dict_file = "sequences.fasta"
>>>index_file = "sequences.idx"
>>>Fasta.index_file(dict_file, index_file, rec2key=None)

But i get the following error:

Traceback (most recent call last):
  File "fasta_database.py", line 27, in ?
    main()
  File "fasta_database.py", line 21, in main
    Fasta.index_file(dict_file, index_file, rec2key=None)
  File "/usr/lib/python2.3/site-packages/Bio/Fasta/__init__.py", line 243, in index_file
    SimpleSeqRecord.create_flatdb([filename], indexname, indexer)
  File "/usr/lib/python2.3/site-packages/Bio/Mindy/SimpleSeqRecord.py", line 111, in create_flatdb
    creator = FlatDB.create(db_name, unique_name, alias_names)
  File "/usr/lib/python2.3/site-packages/Bio/Mindy/FlatDB.py", line 297, in create
    return open(dbname, "rw")
  File "/usr/lib/python2.3/site-packages/Bio/Mindy/FlatDB.py", line 304, in open
    return MemoryFlatDB(dbname)
  File "/usr/lib/python2.3/site-packages/Bio/Mindy/FlatDB.py", line 130, in __init__
    BaseFlatDB.__init__(self, dbname, INDEX_TYPE)
TypeError: __init__() takes exactly 2 arguments (3 given)

What am I doing wrong?

Thanks,

Mattias de Hollander

From no228 at cam.ac.uk Tue Sep 20 07:24:11 2005
From: no228 at cam.ac.uk (Noel O'Boyle)
Date: Tue Sep 20 07:48:29 2005
Subject: [BioPython] SD/MDL file parser
Message-ID: <1127215451.16760.92.camel@sandwi.ch.cam.ac.uk>

Hello all,

I've just been through the documentation and site-packages on my
computer, and I cannot find a parser for SD (or MDL) files. This is the
most common file format for chemical structures in databases of
chemicals (as used by pharmaceutical companies, for example).

Did I miss this parser? I know that Andrew Dalke (through PyDaylight)
has an interest in chemistry, so I was expecting to find this parser...

Regards,
Noel O'Boyle.

From j.pansanel at pansanel.net Tue Sep 20 08:17:59 2005
From: j.pansanel at pansanel.net (Jerome PANSANEL)
Date: Tue Sep 20 08:37:54 2005
Subject: [BioPython] SD/MDL file parser
In-Reply-To: <1127215451.16760.92.camel@sandwi.ch.cam.ac.uk>
References: <1127215451.16760.92.camel@sandwi.ch.cam.ac.uk>
Message-ID: <200509201418.00066.j.pansanel@pansanel.net>

On Tuesday 20 September 2005 13:24, Noel O'Boyle wrote:
> Hello all,

Hi !
> I've just been through the documentation and site-packages on my
> computer, and I cannot find a parser for SD (or MDL) files. This is the
> most common file format for chemical structures in databases of
> chemicals (as used by pharmaceutical companies, for example).
>
> Did I miss this parser? I know that Andrew Dalke (through PyDaylight)
> has an interest in chemistry, so I was expecting to find this parser...

You can use frowns (http://frowns.sourceforge.net/) in Python or, if
you need C++, I can send you one we have developed. What features do
you need?

Regards,

Jerome Pansanel

> Regards,
> Noel O'Boyle.
>
> _______________________________________________
> BioPython mailing list - BioPython@biopython.org
> http://biopython.org/mailman/listinfo/biopython

From omid9dr18 at hotmail.com Sat Sep 24 20:54:46 2005
From: omid9dr18 at hotmail.com (Omid Khalouei)
Date: Sat Sep 24 21:14:06 2005
Subject: [BioPython] Torsional angle
Message-ID:

Hello,

Could someone please help me with measuring a torsional angle given
the PDB coordinates of the 4 atoms involved in it.

Thanks a lot.

From dalke at dalkescientific.com Mon Sep 26 09:13:27 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon Sep 26 09:25:32 2005
Subject: [BioPython] SD/MDL file parser
In-Reply-To: <1127215451.16760.92.camel@sandwi.ch.cam.ac.uk>
References: <1127215451.16760.92.camel@sandwi.ch.cam.ac.uk>
Message-ID: <9d932c735619301486d2c02155404ff8@dalkescientific.com>

Hi Noel,

> I've just been through the documentation and site-packages on my
> computer, and I cannot find a parser for SD (or MDL) files. This is the
> most common file format for chemical structures in databases of
> chemicals (as used by pharmaceutical companies, for example).
>
> Did I miss this parser? I know that Andrew Dalke (through PyDaylight)
> has an interest in chemistry, so I was expecting to find this parser...

As Jerome mentioned, frowns includes an MDL parser.
>>> from frowns import MDL
>>> filename = "/usr/local/openeye/python/examples/oechem/examples/drugs.sdf"
>>> for mol, error, text in MDL.sdin(open(filename)):
...     print mol.cansmiles(), mol.fields
...
C(c1c(OC(=O)C)cccc1)(=O)O {'Color': 'red', 'Energy': '1'}
c12C(=O)NC(=Nc1[n](cn2)COCCO)N {'Color': 'blue', 'Energy': '2'}
c1(c(cccc1)CC=C)OCC(O)CNC(C)C {'Color': 'green', 'Energy': '3'}
C1(C(N(c2ccccc2)N(C=1C)C)=O)N(C)C {'Energy': '4.5'}
c1(OCC(O)CNC(C)C)ccc(cc1)CC(=O)N {'Color': 'purple', 'Energy': '-3.5'}
C1(c2c(N(C(N1C)=O)C)nc[n]2C)=O {'Color': 'black', 'Energy': '0'}
>>>

It converts the connection table data into a Frowns data structure.
It should keep the chemistry the same as what's in the file because
the frowns.Molecule doesn't do any perception, but doing something
like cansmiles() will likely change things.

If you have SD fields with repeats of the same key then there will be
a problem, because the parser expects that the data can be stored in
a dictionary. OEChem has a dictionary-like data structure which also
allows list-like iteration for this case.

If I had the time (okay, and if someone was willing to pay for me to
do this :) I would probably use something like my MultiDict class
instead.

Your email says you're in Cambridge, eh? I'll be there in a couple of
weeks for the EuroMUG conference, staying there for a week to also
visit EBI and Sanger.

Andrew
dalke@dalkescientific.com

From no228 at cam.ac.uk Mon Sep 26 09:27:58 2005
From: no228 at cam.ac.uk (Noel O'Boyle)
Date: Mon Sep 26 09:28:19 2005
Subject: [BioPython] SD/MDL file parser
In-Reply-To: <9d932c735619301486d2c02155404ff8@dalkescientific.com>
References: <1127215451.16760.92.camel@sandwi.ch.cam.ac.uk> <9d932c735619301486d2c02155404ff8@dalkescientific.com>
Message-ID: <1127741278.16760.213.camel@sandwi.ch.cam.ac.uk>

> If you have SD fields with repeats of the same key then
> there will be a problem, because the parser expects that
> the data can be stored in a dictionary.
> OEChem has a
> dictionary-like data structure which also allows list-like
> iteration for this case.
> If I had the time (okay, and if someone was willing to pay
> for me to do this :) I would probably use something like my
> MultiDict class instead.

The whole frowns system is a bit overkill for simple manipulations of
SD fields. I am planning to cannibalise the mdl parser and writer of
frowns to make it more straightforward for myself. Thanks for the
heads up regarding multiple fields, though.

> Your email says you're in Cambridge, eh? I'll be there
> in a couple of weeks for the EuroMUG conference, staying
> there for a week to also visit EBI and Sanger.

I'll probably see you there so, as I'll be attending.

Is it possible/worthwhile to write a parser using Martel? Or would you
say that there are too many non-standard sd files out there for a
single parser to have general applicability?

Regards,
Noel

--
Dr. Noel M. O'Boyle,
Group of Dr. John Mitchell (http://www-mitchell.ch.cam.ac.uk),
Unilever Centre for Molecular Science Informatics,
Dept. of Chemistry,
University of Cambridge, U.K.

From dalke at dalkescientific.com Mon Sep 26 10:09:06 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon Sep 26 10:23:13 2005
Subject: [BioPython] Torsional angle
In-Reply-To:
References:
Message-ID:

Hi Omid,

> Could someone please help me with measuring a torsional angle given
> the PDB coordinates of the 4 atoms involved in it.

See the thread including
  http://www.rcsb.org/pdb/lists/pdb-l/200409/001936.html
and more details in
  http://www.math.fsu.edu/~quine/IntroMathBio_04/torsion_pdb/torsion_pdb.pdf

There is Biopython code for computing a torsional angle.

>>> from Bio.PDB import Vector
>>> p1 = Vector.Vector( [0.0, 0.0, 1.0] )
>>> p2 = Vector.Vector( [0.0, 0.0, 0.0] )
>>> p3 = Vector.Vector( [0.0, 1.0, 0.0] )
>>> p4 = Vector.Vector( [1.0, 1.0, 0.0] )
>>> Vector.calc_dihedral(p1, p2, p3, p4)
1.5707963267948966
>>>

However, I think that's only in CVS.
The code is

def calc_dihedral(v1, v2, v3, v4):
    """
    Calculate the dihedral angle between 4 vectors
    representing 4 connected points. The angle is in
    ]-pi, pi].

    @param v1, v2, v3, v4: the four points that define the dihedral angle
    @type v1, v2, v3, v4: L{Vector}
    """
    ab=v1-v2
    cb=v3-v2
    db=v4-v3
    u=ab**cb
    v=db**cb
    w=u**v
    angle=u.angle(v)
    # Determine sign of angle
    try:
        if cb.angle(w)>0.001:
            angle=-angle
    except ZeroDivisionError:
        # dihedral=pi
        pass
    return angle

This depends on a Bio.PDB-specific Vector class which implements the
"angle" method.

    def angle(self, other):
        "Return angle between two vectors"
        n1=self.norm()
        n2=other.norm()
        c=(self*other)/(n1*n2)
        # Take care of roundoff errors
        c=min(c,1)
        c=max(-1,c)
        return arccos(c)

Andrew
dalke@dalkescientific.com

From dalke at dalkescientific.com Mon Sep 26 10:39:34 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon Sep 26 10:39:20 2005
Subject: [BioPython] SD/MDL file parser
In-Reply-To: <1127741278.16760.213.camel@sandwi.ch.cam.ac.uk>
References: <1127215451.16760.92.camel@sandwi.ch.cam.ac.uk> <9d932c735619301486d2c02155404ff8@dalkescientific.com> <1127741278.16760.213.camel@sandwi.ch.cam.ac.uk>
Message-ID: <518860cf0035743064d2cb8b8fed39c4@dalkescientific.com>

Hi Noel,

> The whole frowns system is a bit overkill for simple manipulations of
> SD fields. I am planning to cannibalise the mdl parser and writer of
> frowns to make it more straightforward for myself. Thanks for the
> heads up regarding multiple fields, though.

The question is, what's overkill and what's not? If you only want
the SD data then you can leave the connection table as an
unprocessed blob of text. In that case a parser is about 25 lines of
code.

> I'll probably see you there so, as I'll be attending.

I'll see you then in a couple of weeks.

> Is it possible/worthwhile to write a parser using Martel? Or would you
> say that there are too many non-standard sd files out there for a
> single parser to have general applicability?
When I wrote Martel I tested against an SD file ... and an RXN file,
and a molfile. Somewhere is a Martel expression for those formats.
It's not on this laptop so I need to check my older one. That's home
in Santa Fe and I can't log into that machine now since I'm
traveling.

If you can dig up an old Martel distribution (pre-Biopython
integration) you might be able to get ahold of it.

I don't think the diversity of an SD file format is that large. After
all, many programs have SD file readers and I don't hear anywhere
near as many problems as with supporting, say, the PDB format.

Still, if you only need the SD data reader then it isn't too hard to
do yourself, and there are few ways to make subtle mistakes.

Andrew
dalke@dalkescientific.com

From bill at barnard-engineering.com Mon Sep 26 20:04:18 2005
From: bill at barnard-engineering.com (Bill Barnard)
Date: Mon Sep 26 20:29:17 2005
Subject: [BioPython] Patches to enable Doc building for source and rpm distributions
Message-ID: <1127779458.16589.60.camel@tioga.barnard-engineering.com>

I've got some time free to work with Biopython again. I wanted to be
able to easily create an rpm including the Doc directory. I found that
the Makefile in the Doc directory didn't work quite correctly, so I set
out to remedy that.

My set of changes are small, and you may not want to commit all of
them. In particular I added two lines to setup.py to run make in the
Doc directory. There may be some better way to accomplish that task,
but I don't know distutils very well so I'm not a good judge. Also
these are only "needed" to enable the addition of the doc files in the
rpm generated by setup.py bdist_rpm.

I found that generating the pdf & html doc files to be straightforward
except for two things:

1) the hevea.sty file needs to be included in the distribution to allow
the make to complete; that's simply added in the MANIFEST.in file.
2) Generating html from biopdb_faq.tex has a minor failure (resulting in
a missing image file referenced in the html) due to lack of bounding box
information in the .tex file. Since the .tex file is exported (I assume)
from the .lyx file, I thought the "problem" should be fixed there. I did
not figure it out, and didn't want to learn too much more about LyX in
order to do so. Instead I created a patch file for the .tex file that
permits proper html generation by hevea. The patch is applied in the
Makefile.

The Makefile is actually entirely new. The old Makefile didn't have much
in it, and was not very generic. The new one is a bit cleaner and more
generic.

I'm attaching the two patch files to this email. I'll see when it comes
back to me whether anything in my email chain has stripped the patch
files from the email.

I hope this is useful. If it is, then I may visit the subdirectories of
Doc to structure their Makefiles similarly, and set them up to be called
recursively.

Cheers,

Bill

--
Bill Barnard
-------------- next part --------------
A non-text attachment was scrubbed...
Name: biopython-Doc_Makefile_fix.patch
Type: text/x-patch
Size: 2943 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20050926/37abfb75/biopython-Doc_Makefile_fix.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: biopdb_faq.tex.hevea-html-fix.patch
Type: text/x-patch
Size: 574 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20050926/37abfb75/biopdb_faq.tex.hevea-html-fix.bin

From bill at barnard-engineering.com Tue Sep 27 02:31:34 2005
From: bill at barnard-engineering.com (Bill Barnard)
Date: Tue Sep 27 03:54:32 2005
Subject: [BioPython] Patches to enable Doc building for source and rpm distributions
In-Reply-To: <1127779458.16589.60.camel@tioga.barnard-engineering.com>
References: <1127779458.16589.60.camel@tioga.barnard-engineering.com>
Message-ID: <1127802694.16589.78.camel@tioga.barnard-engineering.com>

On Mon, 2005-09-26 at 17:04 -0700, Bill Barnard wrote:

I found a couple mistakes in my patches I mailed earlier. These patches
supersede the earlier ones.

> I found that generating the pdf & html doc files to be straightforward
> except for two things:
>
> 1) the hevea.sty file needs to be included in the distribution to allow
> the make to complete; that's simply added in the MANIFEST.in file.

I updated the Makefile to include generating text output, so I updated
MANIFEST.in to exclude Doc/Tutorial.txt (which was out of date relative
to Tutorial.tex.)

>
> 2) Generating html from biopdb_faq.tex has a minor failure (resulting in
> a missing image file referenced in the html) due to lack of bounding box
> information in the .tex file. Since the .tex file is exported (I assume)
> from the .lyx file, I thought the "problem" should be fixed there. I did
> not figure it out, and didn't want to learn too much more about LyX in
> order to do so. Instead I created a patch file for the .tex file that
> permits proper html generation by hevea. The patch is applied in the
> Makefile.

Additional testing revealed my first patch only worked for generating
html from biopdb_faq.tex, and broke pdf and text generation. I've fixed
that. (It's still a hack though...)

>
> The Makefile is actually entirely new.
> The old Makefile didn't have much
> in it, and was not very generic. The new one is a bit cleaner and more
> generic.

I made a couple mistakes in the Makefile. New html code was always
generated, even when unnecessary. Also it now invokes the clean target
when it completes, eliminating the need for one of the make calls in
setup.py.

>
> I'm attaching the two patch files to this email.

I've attached the two updated patches to this email. Please discard the
earlier email's patches.

Cheers,

Bill

--
Bill Barnard
-------------- next part --------------
A non-text attachment was scrubbed...
Name: biopython-Doc_Makefile_fix.patch
Type: text/x-patch
Size: 3260 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20050926/e5b21d3e/biopython-Doc_Makefile_fix.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: biopdb_faq.tex.hevea-html-fix.patch
Type: text/x-patch
Size: 1126 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20050926/e5b21d3e/biopdb_faq.tex.hevea-html-fix.bin

From bill at barnard-engineering.com Tue Sep 27 17:48:58 2005
From: bill at barnard-engineering.com (Bill Barnard)
Date: Tue Sep 27 17:50:20 2005
Subject: [BioPython] Patches to enable Doc building for source and rpm distributions
In-Reply-To: <1127802694.16589.78.camel@tioga.barnard-engineering.com>
References: <1127779458.16589.60.camel@tioga.barnard-engineering.com> <1127802694.16589.78.camel@tioga.barnard-engineering.com>
Message-ID: <1127857738.16589.94.camel@tioga.barnard-engineering.com>

I've completed an update of all the Makefiles in the Doc tree. The .tex
file patch previously discussed is attached so this email will have the
complete set of updates, but has not changed.

The Makefiles have all got some common functionality which has been
placed in a new file, common.mk, that lives at the Doc level in the
tree.
Subsidiary Makefiles are very simple; they define their sets of source
files, the relative position of the Doc root, and include the common.mk
file.

Feel free to use any portion of these that seem useful. All three of
these files could be applied to the CVS tree as it is now.

Cheers,

Bill

--
Bill Barnard
-------------- next part --------------
A non-text attachment was scrubbed...
Name: biopython-Doc_Makefile_fix.patch
Type: text/x-patch
Size: 8082 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20050927/c6aca0b4/biopython-Doc_Makefile_fix-0001.bin
-------------- next part --------------
#
# Define your sources, e.g. sources := biopython_test.tex
# then define docroot relative to your directory, e.g. docroot := ../..
# then include this file, e.g. include $(docroot)/common.mk
#

# output from pdflatex
pdfs = $(subst .tex,.pdf,$(sources))
auxs = $(subst .tex,.aux,$(sources))
logs = $(subst .tex,.log,$(sources))
outs = $(subst .tex,.out,$(sources))
tocs = $(subst .tex,.toc,$(sources))

# output from hevea
htmls = $(subst .tex,.html,$(sources))
hauxs = $(subst .tex,.haux,$(sources))
htocs = $(subst .tex,.htoc,$(sources))
txts = $(subst .tex,.txt,$(sources))

#output from hacha
gifs := *_motif.gif

all: html pdf txt clean

pdf: $(pdfs)

html: $(htmls)

txt: $(txts)

$(pdfs): %.pdf: %.tex
	export TEXINPUTS=:$(docroot) && pdflatex $<
	export TEXINPUTS=:$(docroot) && pdflatex $<
	export TEXINPUTS=:$(docroot) && pdflatex $<

$(htmls): %.html: %.tex $(patch_target)
	hevea -fix $<
	hevea -fix $<
	hacha -o $(basename $@)-index.html $@
	ln -s $@ $(basename $@)-one_page.html

$(txts): %.txt: %.tex
	hevea -fix -text -o $(basename $@).txt $<

.PHONY: clean
clean:
	rm -f $(auxs) $(logs) $(outs) $(tocs) $(hauxs) $(htocs)

.PHONY: distclean
distclean: clean
	rm -f $(pdfs) $(htmls) $(txts) *.html $(gifs) $(patch_target) *.rej
-------------- next part --------------
A non-text attachment was scrubbed...
Name: biopdb_faq.tex.hevea-html-fix.patch
Type: text/x-patch
Size: 1126 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20050927/c6aca0b4/biopdb_faq.tex.hevea-html-fix-0001.bin

From jstroud at mbi.ucla.edu Thu Sep 29 16:34:18 2005
From: jstroud at mbi.ucla.edu (James Stroud)
Date: Thu Sep 29 17:08:03 2005
Subject: [BioPython] Smith Waterman
Message-ID: <200509291334.18053.jstroud@mbi.ucla.edu>

Hello Everyone,

I am brand new to the biopython mailing list but I've been a python
programmer for some time. Anyway, I have what I think is a naive
question, but I have tried to google this and I can't for the life of
me find a complete answer anywhere.

Basically, I want to do a Smith-Waterman with a Blosum matrix. I think I
have the latest distro of biopython. I have found the pairwise2 module
and determined that I need:

    pairwise2.align.localdd()

But I'm not sure where the Blosum Matrix is in the biopython API. Any
suggestions? A code snippet would be a lot of help.

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/

From bill at barnard-engineering.com Fri Sep 30 02:25:54 2005
From: bill at barnard-engineering.com (Bill Barnard)
Date: Fri Sep 30 22:47:15 2005
Subject: [BioPython] Smith Waterman
In-Reply-To: <200509291334.18053.jstroud@mbi.ucla.edu>
References: <200509291334.18053.jstroud@mbi.ucla.edu>
Message-ID: <1128061554.8794.8.camel@lyell.barnard-engineering.com>

On Thu, 2005-09-29 at 13:34 -0700, James Stroud wrote:

> But I'm not sure where the Blosum Matrix is in the biopython API. Any
> suggestions? A code snippet would be a lot of help.

I'm just getting going on some sequence alignment code myself,
exploring what's in biopython.
Anyway you'll find the matrices you're looking for like so:

from Bio import SubsMat
from Bio.SubsMat import MatrixInfo
from Bio.pairwise2 import dictionary_match

blosum50 = dictionary_match(SubsMat.SeqMat(MatrixInfo.blosum50))

This gives you a dictionary interface to which you can pass two
characters and get back the Blosum50 score, e.g.

blosum50('A', 'P')

returns:

-1

Your characters need to be uppercase, and they need to be part of the
set in the substitution matrix or you'll throw an exception.

There are other ways to access the Blosum matrices, but this one seemed
pretty nice to me.

Cheers,

Bill

--
Bill Barnard
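
The lookup described above, a function that takes two residues and
returns a score, is all a local alignment needs from a substitution
matrix. What follows is a minimal sketch in plain Python, assuming a
linear gap penalty and a toy nucleotide table. The names make_scorer,
smith_waterman_score and TOY_MATRIX are hypothetical, the scores are
illustrative and are not BLOSUM values, and this is not how
Bio.pairwise2 is implemented.

```python
def make_scorer(matrix):
    """Return score(a, b) that looks a pair up in either order.

    `matrix` maps (residue, residue) tuples to integer scores, so a
    half-filled (triangular) table is enough.
    """
    def score(a, b):
        if (a, b) in matrix:
            return matrix[(a, b)]
        return matrix[(b, a)]  # raises KeyError for an unknown pair
    return score


def smith_waterman_score(seq1, seq2, score, gap=-2):
    """Return the best local alignment score (linear gap penalty)."""
    prev = [0] * (len(seq2) + 1)  # previous row of the DP table
    best = 0
    for a in seq1:
        curr = [0]  # first column is always 0 for local alignment
        for j, b in enumerate(seq2, start=1):
            cell = max(
                0,                          # start a new local alignment
                prev[j - 1] + score(a, b),  # align a with b
                prev[j] + gap,              # a aligned against a gap
                curr[j - 1] + gap,          # b aligned against a gap
            )
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# Toy scores for illustration only; these are NOT BLOSUM values.
TOY_MATRIX = {
    ("A", "A"): 2, ("C", "C"): 2, ("G", "G"): 2, ("T", "T"): 2,
    ("A", "C"): -1, ("A", "G"): -1, ("A", "T"): -1,
    ("C", "G"): -1, ("C", "T"): -1, ("G", "T"): -1,
}

toy = make_scorer(TOY_MATRIX)
print(smith_waterman_score("TTACGTT", "GGACGGG", toy))
```

Any callable with the same two-argument shape could be passed in place
of toy, which is the point of wrapping the table in a function. As with
the lookup above, a residue missing from the table raises an exception.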