From eirik.sonneland at student.umb.no Tue Mar 1 03:43:47 2005 From: eirik.sonneland at student.umb.no (Eirik Sønneland) Date: Tue Mar 1 03:38:58 2005 Subject: [BioPython] Human genome Message-ID: <42242B43.4000100@student.umb.no> Hi! Need to Blast my sequence file against the human genome. Have tried to search http://www.ncbi.nlm.nih.gov/BLAST/docs/netblast.html for finding this database to put into the following code (as second argument): NCBIWWW.qblast('blastn', '???', f_record), Does anybody know the name of this parameter? Would be highly appreciated! cheers, Eirik From eirik.sonneland at student.umb.no Tue Mar 1 04:34:38 2005 From: eirik.sonneland at student.umb.no (Eirik Sønneland) Date: Tue Mar 1 04:30:34 2005 Subject: [BioPython] Human genome In-Reply-To: <42242EE4.4080101@mpiib-berlin.mpg.de> References: <42242B43.4000100@student.umb.no> <42242EE4.4080101@mpiib-berlin.mpg.de> Message-ID: <4224372E.1040102@student.umb.no> Hi & thanks for replying! Yes, I'm aware of the cookbook and the links to NCBI, but there doesn't seem to be any database which is only for human, as there is for example "Drosophila genome". Interesting aspect that you've changed to standalone blast and that it isn't faster...hmmm, regards Eirik DroMartina wrote: > Hello Eirik, > > have a look at the cookbook: > http://www.biopython.org/docs/tutorial/Tutorial004.html#toc10 > > I just tried to do the same, but each query took 1-2 min. So I switched > to standalone Blast, but it's not faster yet. > > Martina > > Eirik Sønneland wrote: > >> Hi! >> >> Need to Blast my sequence file against the human genome. Have tried >> to search http://www.ncbi.nlm.nih.gov/BLAST/docs/netblast.html for >> finding this database to put into the following code (as second >> argument): >> >> NCBIWWW.qblast('blastn', '???', f_record), >> >> Does anybody know the name of this parameter? >> >> Would be highly appreciated!
>> >> cheers, >> Eirik >> _______________________________________________ >> BioPython mailing list - BioPython@biopython.org >> http://biopython.org/mailman/listinfo/biopython > From idoerg at burnham.org Tue Mar 1 20:47:28 2005 From: idoerg at burnham.org (Iddo Friedberg) Date: Tue Mar 1 20:42:22 2005 Subject: [BioPython] PDB import thingy? Message-ID: <42251B30.2030306@burnham.org> Hi, I tried this PDB import (it's part of some really old code I have), and got the following: Python 2.3.3 (#2, Feb 17 2004, 11:45:40) [GCC 3.3.2 (Mandrake Linux 10.0 3.3.2-6mdk)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from PDBUtil import PDBUtil Traceback (most recent call last): File "<stdin>", line 1, in ? File "/home/idoerg/soft/PDBUtil/PDBUtil.py", line 7, in ? from Bio.PDB.PDBParser import PDBParser File "/home/idoerg/biopy_cvs/biopython/build/lib.linux-i686-2.3/Bio/PDB/__init__.py", line 13, in ? from MMCIFParser import MMCIFParser File "/home/idoerg/biopy_cvs/biopython/build/lib.linux-i686-2.3/Bio/PDB/MMCIFParser.py", line 6, in ? from MMCIF2Dict import MMCIF2Dict File "/home/idoerg/biopy_cvs/biopython/build/lib.linux-i686-2.3/Bio/PDB/MMCIF2Dict.py", line 2, in ? import Bio.PDB.mmCIF.MMCIFlex ImportError: No module named MMCIFlex Curiously enough, the second import works. >>> from PDBUtil import PDBUtil >>> Huh? ./I -- Iddo Friedberg, Ph.D. The Burnham Institute 10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA Tel: +1 (858) 646 3100 x3516 http://ffas.ljcrf.edu/~iddo ========================== The First Automated Protein Function Prediction SIG Detroit, MI June 24, 2005 http://ffas.burnham.org/AFP From fkauff at duke.edu Wed Mar 2 09:04:09 2005 From: fkauff at duke.edu (Frank Kauff) Date: Wed Mar 2 08:59:00 2005 Subject: [BioPython] Human genome In-Reply-To: <4224372E.1040102@student.umb.no> References: <42242B43.4000100@student.umb.no> <42242EE4.4080101@mpiib-berlin.mpg.de> <4224372E.1040102@student.umb.no> Message-ID: <1109772250.5116.5.camel@osiris.biology.duke.edu> On Tue, 2005-03-01 at 10:34 +0100, Eirik Sønneland wrote: > Hi & thanks for replying! > > Yes, I'm aware of the cookbook and the links to NCBI, but there doesn't > seem to be any database which is only for human, as there is for > example "Drosophila genome". > Interesting aspect that you've changed to standalone blast and that it > isn't faster...hmmm, > Standalone blast needs a lot of resources on your local computer, of which memory is often the bottleneck. Without plenty of memory, blast can't keep the database in RAM and the search takes quite a long time. Searching the nt database on a local machine with less than a few GB of RAM can be rather stressful for your hard disk. On the other hand, NCBI can be very fast, e.g. if you're blasting during the (US) night or early morning. Frank > regards > Eirik > > DroMartina wrote: > > > Hello Eirik, > > > > have a look at the cookbook: > > http://www.biopython.org/docs/tutorial/Tutorial004.html#toc10 > > > > I just tried to do the same, but each query took 1-2 min. So I switched > > to standalone Blast, but it's not faster yet. > > > > Martina > > > > Eirik Sønneland wrote: > > > >> Hi! > >> > >> Need to Blast my sequence file against the human genome.
Have tried > >> to search http://www.ncbi.nlm.nih.gov/BLAST/docs/netblast.html for > >> finding this database to put into the following code (as second > >> argument): > >> > >> NCBIWWW.qblast('blastn', '???', f_record), > >> > >> Does anybody know the name of this parameter? > >> > >> Would be highly appreciated! > >> > >> cheers, > >> Eirik > >> _______________________________________________ > >> BioPython mailing list - BioPython@biopython.org > >> http://biopython.org/mailman/listinfo/biopython > > > > _______________________________________________ > BioPython mailing list - BioPython@biopython.org > http://biopython.org/mailman/listinfo/biopython -- Frank Kauff Dept. of Biology Duke University Box 90338 Durham, NC 27708 USA Phone 919-660-7382 Fax 919-660-7293 Web http://www.lutzonilab.net/member/frankkauff.shtml From thamelry at binf.ku.dk Wed Mar 2 10:41:39 2005 From: thamelry at binf.ku.dk (thamelry@binf.ku.dk) Date: Wed Mar 2 12:35:46 2005 Subject: [BioPython] PDB import thingy? In-Reply-To: <42251B30.2030306@burnham.org> References: <42251B30.2030306@burnham.org> Message-ID: <35879.83.92.3.59.1109778099.squirrel@www.binf.ku.dk> > ImportError: No module named MMCIFlex See if the C/Flex module is compiled in Bio/PDB/mmCIF. Maybe this is another of my modules that should be disabled by default, this time because it depends on Flex.
I should really stop using 'fancy' stuff like C++ or lex, apparently :-) Cheers, -Thomas From scott.rifkin at yale.edu Fri Mar 4 17:15:08 2005 From: scott.rifkin at yale.edu (Scott Rifkin) Date: Fri Mar 4 19:53:40 2005 Subject: [BioPython] kcluster and distances Message-ID: The euclidean distance function in cluster.c is: { double result = 0.; double tweight = 0; int i; if (transpose==0) /* Calculate the distance between two rows */ { for (i = 0; i < n; i++) { if (mask1[index1][i] && mask2[index2][i]) { double term = data1[index1][i] - data2[index2][i]; result = result + weight[i]*term*term; tweight += weight[i]; } } } else { for (i = 0; i < n; i++) { if (mask1[i][index1] && mask2[i][index2]) { double term = data1[i][index1] - data2[i][index2]; result = result + weight[i]*term*term; tweight += weight[i]; } } } if (!tweight) return 0; /* usually due to empty clusters */ result /= tweight; result *= n; return result; } why at the end is the result multiplied by n? and why isn't the square root of result given as the distance? thanks scott rifkin From mdehoon at ims.u-tokyo.ac.jp Sat Mar 5 00:05:49 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sat Mar 5 00:01:02 2005 Subject: [BioPython] kcluster and distances In-Reply-To: References: Message-ID: <42293E2D.7080409@ims.u-tokyo.ac.jp> Scott Rifkin wrote: > The euclidean distance function in cluster.c is: > > { double result = 0.; > ... > result /= tweight; > result *= n; > return result; > } > > why at the end is the result multiplied by n? Typically, all the weights are one. Then tweight is equal to n, and the result is equal to the usual definition of the Euclidean distance. In the latest version of the C Clustering Library (which is not yet uploaded to the Biopython CVS), I removed the multiplication by n. The euclid function then returns the mean square distance, which may be easier to interpret. > and why isn't the square root of result given as the distance?
Taking the square root adds another calculation step, but won't affect the (hierarchical) clustering result. So we may as well leave it out. If desired, users can take the square root of the node distances after the clustering calculation has finished. Within the context of k-means clustering, not taking the square root is actually the right thing, as we want to minimize the sum of square distances. --Michiel. From mdehoon at ims.u-tokyo.ac.jp Mon Mar 7 06:19:28 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Mon Mar 7 06:19:29 2005 Subject: [BioPython] Removing spam from the biopython-dev archives Message-ID: <422C38C0.6060300@ims.u-tokyo.ac.jp> Dear Biopythoneers, I have good news and bad news for you. The good news is that the biopython-dev mailing list archives are now spam-free. See http://www.biopython.org/pipermail/biopython-dev/ If you sent a message to biopython-dev during the past couple of days, it may be missing in the archives. If you find out that a message you sent is missing, please send it again to make sure that it is preserved for future generations of Biopythoneers. The bad news is that when the biopython-dev archive was replaced on the biopython.org webserver, the source file for the biopython archive was accidentally overwritten. The biopython archive is still there, but the source file from which the archives are generated is not. There is a backup file until about March 2004, but for messages between March 2004 and early March 2005 we no longer have the source file. If you happen to have a copy of these messages, please let us know. --Michiel. 
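Michiel's two answers in the kcluster thread can be checked numerically. The sketch below is a plain-Python transcription of the C function Scott quoted, not the actual Bio.Cluster code; the names `euclid_times_n`, `euclid_mean_square` and `pair_order` are hypothetical. It shows that with unit weights and no missing values the factor n turns the weighted mean back into the plain sum of squared differences, that the same factor rescales the result when a column is masked as missing, and that taking the square root leaves the ranking of pairwise distances unchanged.

```python
import math
from itertools import combinations


def _weighted_sums(x, y, mask_x, mask_y, weight):
    """Return (sum of w*d^2, sum of w) over columns present in both rows."""
    result = 0.0
    tweight = 0.0
    for xi, yi, mx, my, w in zip(x, y, mask_x, mask_y, weight):
        if mx and my:  # skip columns with a missing value in either row
            term = xi - yi
            result += w * term * term
            tweight += w
    return result, tweight


def euclid_times_n(x, y, mask_x, mask_y, weight):
    """Transcription of the quoted C code: (sum w*d^2 / sum w) * n."""
    result, tweight = _weighted_sums(x, y, mask_x, mask_y, weight)
    if not tweight:
        return 0.0  # usually due to empty clusters
    return result / tweight * len(weight)


def euclid_mean_square(x, y, mask_x, mask_y, weight):
    """Variant without the factor n: the weighted mean square difference."""
    result, tweight = _weighted_sums(x, y, mask_x, mask_y, weight)
    if not tweight:
        return 0.0
    return result / tweight


def pair_order(points, dist):
    """Index pairs sorted from closest to farthest under `dist`."""
    pairs = combinations(range(len(points)), 2)
    return sorted(pairs, key=lambda p: dist(points[p[0]], points[p[1]]))


# Usage: all weights one; in the second and third calls the middle value of y
# is marked missing (mask 0).
x, y = [1.0, 2.0, 3.0], [2.0, 4.0, 3.0]
full, y_mask, w = [1, 1, 1], [1, 0, 1], [1.0, 1.0, 1.0]
print(euclid_times_n(x, y, full, full, w))        # 5.0, i.e. 1 + 4 + 0
print(euclid_times_n(x, y, full, y_mask, w))      # 1.5, i.e. (1 + 0) / 2 * 3
print(euclid_mean_square(x, y, full, y_mask, w))  # 0.5, i.e. (1 + 0) / 2
```

The ordering check only shows that a monotonic transform such as the square root preserves the order of pairwise distances; it is not by itself a proof that every linkage method yields an identical tree.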
From wong at ebgm.jussieu.fr Mon Mar 7 09:51:36 2005 From: wong at ebgm.jussieu.fr (WONG Hua) Date: Mon Mar 7 09:46:30 2005 Subject: [BioPython] non root having problem to install biopython 1.40 Message-ID: <20050307145136.GA7691@bach.ebgm.jussieu.fr> I am trying to install the new version of biopython, 1.40. I am surprised to see that the "Numeric" tweaks I made to setup.py don't seem to work :( Here is the line: NUMPY_EXTENSIONS = [ Extension('Bio.Cluster.cluster', ['Bio/Cluster/clustermodule.c', 'Bio/Cluster/cluster.c', 'Bio/Cluster/ranlib.c', 'Bio/Cluster/com.c', 'Bio/Cluster/linpack.c'], include_dirs=["Bio/Cluster","users/invites/wong/module/include/python2.3/Numeric"] ), And the error: running build_ext building 'Bio.Cluster.cluster' extension gcc -pthread -fno-strict-aliasing -DNDEBUG -O2 -g -pipe -m32 -march=i386 -mtune=pentium4 -D_GNU_SOURCE -fPIC -fPIC -IBio/Cluster -Iusers/invites/wong/module/include/python2.3 -I/usr/include/python2.3 -c Bio/Cluster/clustermodule.c -o build/temp.linux-i686-2.3/Bio/Cluster/clustermodule.o Bio/Cluster/clustermodule.c:2:33: Numeric/arrayobject.h: No such file or directory ...followed by various Bio/Cluster/clustermodule.c: errors... Although I modified this line, it still can't find Numeric. Worse, I don't even need or use the Cluster and Trees features... I am mostly interested in the Bio.PDB part. Numeric is properly installed (not in root directories, but it is installed). "import Numeric" works fine when running Python. Is there a way I can short-circuit this? Otherwise, what have I forgotten to do to point setup.py to where Numeric is? Thanks Wong Hua ##################################################################### 6 Notes for installing with non-administrator permissions # That's exactly me Building some C modules, such as Bio.Cluster, requires that the Numeric include files (normally installed in your_dir/include/python/Numeric) be available.
If the compiler can't find these directories you'll normally get an error like: Bio/Cluster/clustermodule.c:2: Numeric/arrayobject.h: No such file or directory Followed by a long messy list of syntax errors. ##Exact To fix this, you'll have to edit the setup.py file to let it know where the include directories are located. Look for the line in setup.py that looks like: include_dirs=["Bio/Cluster"] and adjust it so that it includes the include directory where the numeric libraries were installed: include_dirs=["Bio/Cluster", "your_dir/include/python"] Then you should be able to install everything happily. # No happy endings :( From jtk at cmp.uea.ac.uk Mon Mar 7 12:02:45 2005 From: jtk at cmp.uea.ac.uk (Jan T. Kim) Date: Mon Mar 7 11:00:47 2005 Subject: [BioPython] Minor GenBank Parsing Problems Message-ID: <20050307170245.GE32158@jtkpc.cmp.uea.ac.uk> Dear All, I have noticed the following problems with the Bio.GenBank.FeatureParser: * The parser appears to depend on additional information in the LOCUS line, it works with LOCUS U00096 4639675 bp DNA circular BCT 24-JUN-2004 while the undecorated line LOCUS U00096 results in a Martel.Parser.ParserPositionException. * The parser also doesn't like some accession types, the line ACCESSION U00096 AE000111-AE000510 while trimming that to ACCESSION U00096 results in a file that now parses ok. Are these known problems? I would assume that this could be fixed by modifying some regular expressions somewhere in the parser. Should I try to look into this? Best regards, Jan -- +- Jan T. 
Kim -------------------------------------------------------+ | *NEW* email: jtk@cmp.uea.ac.uk | | *NEW* WWW: http://www.cmp.uea.ac.uk/people/jtk | *-----=< hierarchical systems are for files, not for humans >=-----* From thamelry at binf.ku.dk Mon Mar 7 09:57:51 2005 From: thamelry at binf.ku.dk (thamelry@binf.ku.dk) Date: Mon Mar 7 11:59:47 2005 Subject: [BioPython] non root having problem to install biopython 1.40 In-Reply-To: <20050307145136.GA7691@bach.ebgm.jussieu.fr> References: <20050307145136.GA7691@bach.ebgm.jussieu.fr> Message-ID: <33815.83.92.3.59.1110207471.squirrel@www.binf.ku.dk> > "users/invites/wong/module/include/python2.3/Numeric" Shouldn't there be a "/" in the beginning? Cheers, -Thomas From mdehoon at ims.u-tokyo.ac.jp Mon Mar 7 23:20:45 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Mon Mar 7 23:16:00 2005 Subject: [BioPython] non root having problem to install biopython 1.40 In-Reply-To: <20050307145136.GA7691@bach.ebgm.jussieu.fr> References: <20050307145136.GA7691@bach.ebgm.jussieu.fr> Message-ID: <422D281D.7020707@ims.u-tokyo.ac.jp> WONG Hua wrote: > NUMPY_EXTENSIONS = [ > Extension('Bio.Cluster.cluster', > ['Bio/Cluster/clustermodule.c', > 'Bio/Cluster/cluster.c', > 'Bio/Cluster/ranlib.c', > 'Bio/Cluster/com.c', > 'Bio/Cluster/linpack.c'], > include_dirs=["Bio/Cluster","users/invites/wong/module/include/python2.3/Numeric"] > ), > > And the error: > > Bio/Cluster/clustermodule.c:2:33: Numeric/arrayobject.h: No such file or directory > clustermodule.c tries to include Numeric/arrayobject.h, not arrayobject.h. So you should remove the Numeric in users/invites/wong/module/include/python2.3/Numeric, otherwise you'll end up with one Numeric too many. --Michiel.
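Thomas's and Michiel's replies point at two separate problems with the include path: it is relative (no leading "/"), and it ends in one Numeric too many. A small helper in setup.py could guard against both before the compiler runs. This is a sketch only: `numeric_include_dir` is a hypothetical name, the `<prefix>/include/python<version>/Numeric/arrayobject.h` layout is an assumption based on the paths in this thread, and the prefix `/users/invites/wong/module` is just a guess prompted by Thomas's question.

```python
import os


def numeric_include_dir(prefix, python_version="2.3"):
    """Return the directory to add to include_dirs for Numeric's headers.

    Assumes (hypothetical layout) that Numeric installed its headers as
    <prefix>/include/python<version>/Numeric/arrayobject.h. The returned
    path is the *parent* of the Numeric directory, because clustermodule.c
    already includes Numeric/arrayobject.h by that relative name.
    """
    # A path without a leading "/" would be resolved relative to the
    # directory setup.py is run from, so make it absolute first.
    prefix = os.path.abspath(os.path.expanduser(prefix))
    include_dir = os.path.join(prefix, "include", "python" + python_version)
    header = os.path.join(include_dir, "Numeric", "arrayobject.h")
    if not os.path.isfile(header):
        raise FileNotFoundError("Numeric header not found: " + header)
    return include_dir


# In setup.py this might be used as (the prefix is an assumption):
#   include_dirs=["Bio/Cluster", numeric_include_dir("/users/invites/wong/module")]
```

Failing early with the full header path makes both mistakes visible at once, instead of surfacing later as a missing-header error from gcc.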
From mdehoon at ims.u-tokyo.ac.jp Tue Mar 8 23:53:02 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Tue Mar 8 23:47:48 2005 Subject: [BioPython] Removing spam from the biopython-dev archives In-Reply-To: <422C38C0.6060300@ims.u-tokyo.ac.jp> References: <422C38C0.6060300@ims.u-tokyo.ac.jp> Message-ID: <422E812E.1010903@ims.u-tokyo.ac.jp> > The bad news is that when the biopython-dev archive was replaced on the > biopython.org webserver, the source file for the biopython archive was > accidentally overwritten. The biopython archive is still there, but the > source file from which the archives are generated is not. There is a > backup file until about March 2004, but for messages between March 2004 > and early March 2005 we no longer have the source file. If you happen to > have a copy of these messages, please let us know. Bartek Wilczynski sent me his copy of the Biopython messages since mid August 2004, which allowed me to recover all messages since then except for one message in September. From March 2004 to mid August 2004, there is still a gap. So if you have saved some of those messages, please let me know. Note that Netscape and Mozilla don't actually delete a message from the source file until you compact the mail folders. So if you delete biopython messages but don't compact the mail folders, you probably still have those messages even if they don't show up in Netscape. --Michiel From mdehoon at ims.u-tokyo.ac.jp Thu Mar 10 01:56:32 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Thu Mar 10 01:51:21 2005 Subject: [BioPython] Removing spam from the biopython-dev archives In-Reply-To: <422E812E.1010903@ims.u-tokyo.ac.jp> References: <422C38C0.6060300@ims.u-tokyo.ac.jp> <422E812E.1010903@ims.u-tokyo.ac.jp> Message-ID: <422FEFA0.60500@ims.u-tokyo.ac.jp> I have created a new source file for the biopython mailing list archive from the contributions of various people. 
As far as I can tell, the archive is complete now. I have asked the biopython.org mailing list administrator to upload and install the new archive, which by the way is spam-free. Thanks everybody! --Michiel. Michiel Jan Laurens de Hoon wrote: >> The bad news is that when the biopython-dev archive was replaced on >> the biopython.org webserver, the source file for the biopython archive >> was accidentally overwritten. The biopython archive is still there, >> but the source file from which the archives are generated is not. >> There is a backup file until about March 2004, but for messages >> between March 2004 and early March 2005 we no longer have the source >> file. If you happen to have a copy of these messages, please let us know. > > > Bartek Wilczynski sent me his copy of the Biopython messages since mid > August 2004, which allowed me to recover all messages since then except > for one message in September. From March 2004 to mid August 2004, there > is still a gap. So if you have saved some of those messages, please let > me know. > Note that Netscape and Mozilla don't actually delete a message from the > source file until you compact the mail folders. So if you delete > biopython messages but don't compact the mail folders, you probably > still have those messages even if they don't show up in Netscape. > > --Michiel > _______________________________________________ > BioPython mailing list - BioPython@biopython.org > http://biopython.org/mailman/listinfo/biopython > > From jtk at cmp.uea.ac.uk Thu Mar 10 07:20:00 2005 From: jtk at cmp.uea.ac.uk (Jan T. Kim) Date: Thu Mar 10 06:27:23 2005 Subject: [BioPython] Patch: Minor GenBank Parsing Problems In-Reply-To: <20050307170245.GE32158@jtkpc.cmp.uea.ac.uk> References: <20050307170245.GE32158@jtkpc.cmp.uea.ac.uk> Message-ID: <20050310122000.GA25350@jtkpc.cmp.uea.ac.uk> Dear Biopython Maintainers, I've patched Bio/expressions/genbank.py to solve the parsing problems I recently reported. 
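The two parsing problems from Jan's earlier report (a bare LOCUS line, and a hyphenated token on the ACCESSION line) can be mirrored with Python's standard re module as a rough, self-contained check of what the relaxed rules accept. This is a sketch, not the Martel grammar in Bio/expressions/genbank.py: the LOCUS pattern here is deliberately looser, since it does not validate the division or the date. `expand_accession_range` applies only if "AE000111-AE000510" really denotes a range, which Jan says he is not certain of. All names are hypothetical.

```python
import re

# Looser stand-in for the relaxed locus_line rule: after the locus name,
# the size/unit/topology/division/date tail is optional as a whole.
LOCUS_LINE = re.compile(
    r"^LOCUS\s+(?P<locus>\S+)"
    r"(?:\s+(?P<size>\d+)\s+(?:bp|aa)\b.*)?\s*$"
)

# Relaxed accession token: word characters plus "-".
ACCESSION = re.compile(r"[\w\-]+")
PREFIX_NUMBER = re.compile(r"([A-Z]+)(\d+)")


def parse_locus(line):
    """Return (locus name, size or None) from a LOCUS line."""
    m = LOCUS_LINE.match(line)
    if m is None:
        raise ValueError("not a LOCUS line: %r" % line)
    return m.group("locus"), m.group("size")


def parse_accessions(line):
    """Return the accession tokens on an ACCESSION line."""
    if not line.startswith("ACCESSION"):
        raise ValueError("not an ACCESSION line: %r" % line)
    return ACCESSION.findall(line[len("ACCESSION"):])


def expand_accession_range(token):
    """Hypothetical helper: expand 'AE000111-AE000510' IF it denotes a range."""
    start, sep, end = token.partition("-")
    if not sep:
        return [token]
    m1, m2 = PREFIX_NUMBER.fullmatch(start), PREFIX_NUMBER.fullmatch(end)
    if not (m1 and m2) or m1.group(1) != m2.group(1):
        raise ValueError("unrecognised accession range: %r" % token)
    prefix, width = m1.group(1), len(m1.group(2))
    first, last = int(m1.group(2)), int(m2.group(2))
    # "%0*d" zero-pads each number to the original width.
    return ["%s%0*d" % (prefix, width, i) for i in range(first, last + 1)]


print(parse_locus("LOCUS       U00096"))
print(parse_accessions("ACCESSION   U00096 AE000111-AE000510"))
```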
The patch is marginal, so I just attach it to this message; I hope that's ok. The patch doesn't introduce new failures into the regression tests (I currently get a failure on Restriction, which is somewhat strange, but at any rate unrelated to the subject here). I haven't added any tests myself. On Mon, Mar 07, 2005 at 05:02:45PM +0000, Jan T. Kim wrote: > I have noticed the following problems with the Bio.GenBank.FeatureParser: > > * The parser appears to depend on additional information in the > LOCUS line, it works with > > LOCUS U00096 4639675 bp DNA circular BCT 24-JUN-2004 > while the undecorated line > > LOCUS U00096 > > results in a Martel.Parser.ParserPositionException. I solved this by making everything following the locus value optional. It appears that Biopython doesn't need the additional information; at least for me, this fixes the problem without introducing any new ones. Note: You probably won't get records with the additional info missing from NCBI, but if you use EMBOSS to write GenBank formatted files, the additional info is lost, and having to make that up manually just to satisfy BioPython is a bit annoying. > * The parser also doesn't like some accession types, the line > > ACCESSION U00096 AE000111-AE000510 > > while trimming that to > > ACCESSION U00096 > > results in a file that now parses ok. This was solved by allowing a "-" in accession numbers. While this fixes the problem for now, this may not be entirely ideal, as the accession number "AE000111-AE000510" may be invalid: it seems to denote a range of accession numbers AE000111, AE000112, ..., AE000510, but I'm not certain about this. Best regards, Jan -- +- Jan T.
Kim -------------------------------------------------------+ | *NEW* email: jtk@cmp.uea.ac.uk | | *NEW* WWW: http://www.cmp.uea.ac.uk/people/jtk | *-----=< hierarchical systems are for files, not for humans >=-----* -------------- next part -------------- diff -Naur biopython-1.40b/Bio/expressions/genbank.py biopython-1.40b-hacked/Bio/expressions/genbank.py --- biopython-1.40b/Bio/expressions/genbank.py 2004-03-18 00:53:24.000000000 +0000 +++ biopython-1.40b-hacked/Bio/expressions/genbank.py 2005-03-10 11:34:37.134325592 +0000 @@ -116,20 +116,22 @@ data_file_division = Martel.Group("data_file_division", Martel.Alt(*divisions)) +# JTK: made everything following "LOCUS XXXXXX" optional -- seqret of EMBOSS +# doesn't supply all this base count, division, linear/circular etc. stuff locus_line = Martel.Group("locus_line", Martel.Str("LOCUS") + blank_space + locus + - blank_space + - size + - blank_space + - Martel.Re("bp|aa") + - blank_space + - Martel.Opt(residue_type + - blank_space) + - data_file_division + - blank_space + - date + + Martel.Opt(blank_space + + size + + blank_space + + Martel.Re("bp|aa") + + blank_space + + Martel.Opt(residue_type + + blank_space) + + data_file_division + + blank_space + + date) + Martel.AnyEol()) # definition line @@ -141,8 +143,11 @@ # accession line # ACCESSION AC007323 +# JTK: allowed also a "-" in accession, to allow for +# ACCESSION U00096 AE000111-AE000510 +# as found in E. coli K12 GenBank record accession = Martel.Group("accession", - Martel.Re("[\w]+")) + Martel.Re("[\w\-]+")) accession_block = Martel.Group("accession_block", Martel.Str("ACCESSION") + From idoerg at burnham.org Thu Mar 10 12:40:44 2005 From: idoerg at burnham.org (Iddo Friedberg) Date: Thu Mar 10 12:35:20 2005 Subject: [BioPython] Call for Papers -- Science Track at EuroPython 2005 Message-ID: <4230869C.5050100@burnham.org> Dear former participants, Dear list subscribers, EuroPython 2005 will be held in Goteborg, Sweden from June 27 to July 1.
For the fourth consecutive year, EuroPython will host a Science Track dedicated to talks about the use of Python in any kind of scientific project. To quote the website: """The Science track will focus on the use of Python in science and industry, where tasks imply modelling complex systems (thermics, fluid dynamics, mechanics, aeronautics, biology, chemistry, etc.), processing very large data sets and achieving very CPU-intensive and long calculations. Speakers will present tool sets, frameworks and examples of successful applications based on Python and integrated with the other usual tools and applications used in the field. """ For archives of previous years, please see the website at http://www.europython.org/ If you would like to submit a talk proposal, please write to me directly, as another two weeks will be needed before the website gets upgraded and updated for this year's conference. The deadline for submitting proposals is expected to be set to mid-April. -- Nicolas Chauvat logilab.fr - advanced computing services and knowledge management -- Iddo Friedberg, Ph.D. The Burnham Institute 10901 N. Torrey Pines Rd. La Jolla, CA 92037 USA Tel: +1 (858) 646 3100 x3516 http://ffas.ljcrf.edu/~iddo ========================== The First Automated Protein Function Prediction SIG Detroit, MI June 24, 2005 http://ffas.burnham.org/AFP From krebsj at cip.ifi.lmu.de Fri Mar 11 09:45:09 2005 From: krebsj at cip.ifi.lmu.de (Joerg Krebs) Date: Fri Mar 11 09:40:00 2005 Subject: [BioPython] Parsing Swissprot-Files Message-ID: Hi, does anyone know how to handle Swissprot files containing RX and RG entries in the annotation part? I get error messages when I try to read some Swissprot files, looking like this: ---------------- Traceback (most recent call last): File "SwPr2Fasta.py", line 19, in ?
record = sp.next() File "/usr/lib/python2.3/site-packages/Bio/SwissProt/SProt.py", line 166, in next return self._parser.parse(File.StringHandle(data)) File "/usr/lib/python2.3/site-packages/Bio/SwissProt/SProt.py", line 290, in parse self._scanner.feed(handle, self._consumer) File "/usr/lib/python2.3/site-packages/Bio/SwissProt/SProt.py", line 333, in feed self._scan_record(uhandle, consumer) File "/usr/lib/python2.3/site-packages/Bio/SwissProt/SProt.py", line 338, in _scan_record fn(self, uhandle, consumer) File "/usr/lib/python2.3/site-packages/Bio/SwissProt/SProt.py", line 414, in _scan_reference self._scan_ra(uhandle, consumer) File "/usr/lib/python2.3/site-packages/Bio/SwissProt/SProt.py", line 436, in _scan_ra one_or_more=1) File "/usr/lib/python2.3/site-packages/Bio/SwissProt/SProt.py", line 360, in _scan_line read_and_call(uhandle, event_fn, start=line_type) File "/usr/lib/python2.3/site-packages/Bio/ParserSupport.py", line 300, in read_and_call raise SyntaxError, errmsg SyntaxError: Line does not start with 'RA': RG The German cDNA consortium; -------- It would be great if someone could help me with my problem. Thanks Joerg From eirik.sonneland at student.umb.no Mon Mar 14 07:51:32 2005 From: eirik.sonneland at student.umb.no (Eirik Sønneland) Date: Mon Mar 14 07:46:02 2005 Subject: [BioPython] Re: [Biopython-dev] [Bug 1761] New: BLAST not returning final results In-Reply-To: <200503141204.j2EC4ANE021948@portal.open-bio.org> References: <200503141204.j2EC4ANE021948@portal.open-bio.org> Message-ID: <423588D4.3080504@student.umb.no> Kate, I believe it should be NCBIWWW.qblast(..........
regards Eirik bugzilla-daemon@portal.open-bio.org wrote: >http://bugzilla.open-bio.org/show_bug.cgi?id=1761 > > Summary: BLAST not returning final results > Product: Biopython > Version: Not Applicable > Platform: PC > OS/Version: Windows XP > Status: NEW > Severity: normal > Priority: P2 > Component: Other > AssignedTo: biopython-dev@biopython.org > ReportedBy: genesniffer@hotmail.com > > >Hi, >I'm just trying to run BLAST at NCBI using NCBIWWW using the script below. >The request seems to be sent fine, but the returned page ends with "This page >will automatically be updated...." i.e. the final results are not being >returned. The module doesn't seem to be polling for the final result. Is >this a bug in the program or am I doing something wrong? I look forward to >hearing from you... >Kate > >from Bio.Blast import NCBIWWW >blast_results = NCBIWWW.blast('blastp', 'nr', protein_seq, entrez_query="Homo >sapiens [ORGN]") >blast_results = blast_results.readlines() >print blast_results > > > >------- You are receiving this mail because: ------- >You are the assignee for the bug, or are watching the assignee. >_______________________________________________ >Biopython-dev mailing list >Biopython-dev@biopython.org >http://biopython.org/mailman/listinfo/biopython-dev > > From Stephan.Herschel at biovertis.com Tue Mar 15 12:23:10 2005 From: Stephan.Herschel at biovertis.com (Stephan Herschel) Date: Tue Mar 15 12:41:43 2005 Subject: [BioPython] GenBank Parser and large files? Message-ID: Hi, I'm parsing some GenBank files. When parsing large files it appears the parser eats up all of my memory until nothing is left, i.e. it's not possible to parse large files (>25 MB). That's the way I do it: >>> from Bio import GenBank >>> fh=open(fname,'r') >>> feature_parser=GenBank.FeatureParser() >>> gb_iterator = GenBank.Iterator(fh, feature_parser) >>> cur_record = gb_iterator.next() Is there a way to circumvent this problem?
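One of the workarounds Peter offers in his reply, trimming most of the sequence from the file before parsing, can be done with a simple line filter instead of editing by hand. A minimal sketch, with `truncate_origin` as a hypothetical name: it assumes the parser will accept an ORIGIN block shorter than the length stated on the LOCUS line, which is not verified here, and it is not the bug 1747 patch.

```python
def truncate_origin(lines, keep=1):
    """Yield GenBank lines, keeping at most `keep` sequence lines per ORIGIN block.

    Header and feature lines, the ORIGIN line itself and the '//' record
    terminator are passed through unchanged.
    """
    in_origin = False
    kept = 0
    for line in lines:
        if line.startswith("ORIGIN"):
            in_origin, kept = True, 0
            yield line
        elif line.startswith("//"):
            in_origin = False
            yield line
        elif in_origin:
            if kept < keep:  # drop every sequence line after the first `keep`
                kept += 1
                yield line
        else:
            yield line


# Hypothetical usage: write a slimmed copy, then parse that file instead.
#   with open("big.gb") as src, open("big_trimmed.gb", "w") as dst:
#       dst.writelines(truncate_origin(src))
```

Because it is a generator, the filter itself reads and writes one line at a time, so it does not need the whole file in memory.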
Thanks, Stephan From biopython at maubp.freeserve.co.uk Tue Mar 15 16:27:08 2005 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue Mar 15 16:16:01 2005 Subject: [BioPython] GenBank Parser and large files? In-Reply-To: References: Message-ID: <4237532C.2080809@maubp.freeserve.co.uk> Stephan Herschel wrote: > Hi, > I'm parsing some GenBank files. When parsing large files it > appears the parser eats up all of my memory until nothing is > left, i.e. it's not possible to parse large files (>25 MB). You are not the only one to find this... > That's the way I do it: > >>> from Bio import GenBank > >>> fh=open(fname,'r') > >>> feature_parser=GenBank.FeatureParser() > >>> gb_iterator = GenBank.Iterator(fh, feature_parser) > >>> cur_record = gb_iterator.next() The code looks fine. > Is there a way to circumvent this problem? > Thanks, > Stephan Yes - please try out my patch available on bug 1747, http://bugzilla.open-bio.org/show_bug.cgi?id=1747 If you don't know how to use the diff file, tell me which version of BioPython you are using (1.30 or 1.40b I would assume) and I can email you a replacement Python file instead: Bio/GenBank/__init__.py As a short-term measure, depending on which bits of the GenBank file you care about, you can try editing the GenBank file by hand before parsing to remove most of the features (leave at least one), or most of the sequence. Peter From biopython at maubp.freeserve.co.uk Wed Mar 16 07:27:02 2005 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed Mar 16 07:25:12 2005 Subject: [BioPython] GenBank Parser and large files? In-Reply-To: <29408998.1110965792461.JavaMail.root@bvmail> References: <29408998.1110965792461.JavaMail.root@bvmail> Message-ID: <42382616.6070609@maubp.freeserve.co.uk> Stephan Herschel wrote: > Hi Peter, > > Thanks for the fast solution! (I guessed that I could not be the > only one having this problem ...)
> > Apparently I applied the patch successfully - only the 'Reversed > patch'- message was confusing - as you can see here: Yeah - that was the first time I ever created a patch file, and in the absence of any instructions on the BioPython developers list, I just improvised: diff my_version.py vcs_version.py > patch.txt Would the reversed patch messages go away if I had done this? diff vcs_version.py my_version.py > patch.txt > (Stripping trailing CRs from patch.) Those are because I did this on Windows. > can't find file to patch at input line 1 > > Perhaps you should have used the -p or --strip option? The --strip option doesn't exist on my version of diff, and -p is an alias for --show-c-function (Show which C function each change is in) which doesn't seem relevant:- diff (GNU diffutils) 2.8.7 (The cygwin version running on Windows) > Tested the parser - works just fine! Great. Please let me know (or post on the bug) if you have any trouble with the patch - or even if it works fine for you. http://bugzilla.open-bio.org/show_bug.cgi?id=1747 What sort of GenBank files are you using, and where are they from? I have mainly been using bacteria from the NCBI - watch out for bug 1758 and bug 1762 as my patch will probably also fail on those. Peter -- PhD Student MOAC Doctoral Training Centre University of Warwick, UK From jtk at cmp.uea.ac.uk Wed Mar 16 09:26:13 2005 From: jtk at cmp.uea.ac.uk (Jan T. Kim) Date: Wed Mar 16 08:24:30 2005 Subject: [BioPython] Patches (was: GenBank Parser and large files?) In-Reply-To: <42382616.6070609@maubp.freeserve.co.uk> References: <29408998.1110965792461.JavaMail.root@bvmail> <42382616.6070609@maubp.freeserve.co.uk> Message-ID: <20050316142613.GA1033@jtkpc.cmp.uea.ac.uk> On Wed, Mar 16, 2005 at 12:27:02PM +0000, Peter wrote: > Stephan Herschel wrote: > >Hi Peter, > > > >Thanks for the fast solution! (I guessed that I could not be the > >only one having this problem ...)
> > > >Apparently I applied the patch successfully - only the 'Reversed > >patch'- message was confusing - as you can see here: > > Yeah - that was the first time I ever created a patch file, and in the > absence of any instructions on the BioPython developers list, I just > improvised: > > diff my_version.py vcs_version.py > patch.txt > > Would the reversed patch messages go away if I had done this? > > diff vcs_version.py my_version.py > patch.txt

Yes. The direction of patches is old -> new, by convention. If the patch program detects that the order was reversed, it prints the 'Reversed patch' message. With a plain diff, as you sent it, patch can reliably tell whether the file being patched is the "old" or the "new" version, so there should not be any problem. > >(Stripping trailing CRs from patch.) > > Those are because I did this on Windows. Most Windows editors can switch the extraneous CRs on or off these days. But again, this should not cause any serious problem. > >can't find file to patch at input line 1 > > > >Perhaps you should have used the -p or --strip option? > > The --strip option doesn't exist on my version of diff, and -p is an > alias for --show-c-function (Show which C function each change is in) > which doesn't seem relevant: These options pertain to patch, not diff: -pnum or --strip=num: Strip the smallest prefix containing num leading slashes from each file name found in the patch file. Plain diffs tend to trigger some informational messages or requests for interactive input (a request for the file to be patched, etc.) when fed into patch, because this information is not contained in a plain diff.
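To illustrate the direction convention and the file-name information Jan mentions, here is a minimal sketch using Python's standard difflib module (the file names and contents are made up for illustration; this is not how Peter produced his patch). Passing the old version first and the new version second gives the conventional old -> new direction, and the `---`/`+++` header lines carry the file names that a plain diff leaves out:

```python
import difflib


def make_patch(old_lines, new_lines, old_name, new_name):
    """Return a unified diff that turns old_lines into new_lines.

    By convention the *old* version comes first and the *new* version
    second, so `patch` applies it forwards without a 'Reversed patch'
    warning.
    """
    return "".join(
        difflib.unified_diff(old_lines, new_lines,
                             fromfile=old_name, tofile=new_name)
    )


# Hypothetical contents standing in for vcs_version.py (old) and my_version.py (new)
old = ["def f():\n", "    return 1\n"]
new = ["def f():\n", "    return 2\n"]

patch_text = make_patch(old, new, "a/example.py", "b/example.py")
print(patch_text)
```

Saved as patch.txt and run from the directory containing example.py, `patch -p1 < patch.txt` would strip the leading `a/` and `b/` components, which is the -pnum behaviour described above.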
Unified diffs contain this information, as seen in the (very minor) patch I recently submitted:

diff -Naur biopython-1.40b/Bio/expressions/genbank.py biopython-1.40b-hacked/Bio/expressions/genbank.py
--- biopython-1.40b/Bio/expressions/genbank.py 2004-03-18 00:53:24.000000000 +0000
+++ biopython-1.40b-hacked/Bio/expressions/genbank.py 2005-03-10 11:34:37.134325592 +0000

> diff (GNU diffutils) 2.8.7 > (The cygwin version running on Windows) This should accept the options -Naur (-N: treat absent files as empty, -a: treat all files as text, -u: write unified format, -r: recursive). This allows changes that have affected multiple files to be rolled into one patch (which wasn't the case here), but once you pick up this habit, you'll use it by default (like I did). > >Tested the parser - works just fine! > > Great. I'll give it a try as soon as I get around to doing so too -- in fact, I must confess that I've started using ad hoc hacks to work with eukaryotic pseudochromosome files since an attempt to parse one with Biopython had not terminated after more than 24 hours... Best regards, Jan -- +- Jan T. Kim -------------------------------------------------------+ | *NEW* email: jtk@cmp.uea.ac.uk | | *NEW* WWW: http://www.cmp.uea.ac.uk/people/jtk | *-----=< hierarchical systems are for files, not for humans >=-----* From karbak at gmail.com Wed Mar 23 23:51:37 2005 From: karbak at gmail.com (K. Arun) Date: Wed Mar 23 23:46:11 2005 Subject: [BioPython] AAIndex parser ? Message-ID: <162452a1050323205165ab1617@mail.gmail.com> Hello, Back in April 2004, there were a few messages about an AAIndex [1] parser that was to be committed to the Biopython tree. I can't seem to find the module/file in the current release. Does anyone know if the code is available somewhere?
Thanks, -arun [1] http://www.genome.jp/aaindex/ From kal03 at hampshire.edu Sat Mar 26 11:50:30 2005 From: kal03 at hampshire.edu (Kari Linder) Date: Sat Mar 26 10:39:46 2005 Subject: [BioPython] Interacting with GenBank Message-ID: <2dad2c1ed63c377bcdaad2ed5688d57e@hampshire.edu> Here is the script I am working with:

from Bio.WWW import NCBI

search_command = 'Search'
search_database = 'Nucleotide'
return_format = 'Fasta'
search_term = 'Ciliphora'

result_handle = NCBI.query(search_command, search_database, term = search_term, doptcmdl = return_format)

import os

result_file_name = os.path.join(os.getcwd(), 'results.html')
result_file = open(result_file_name, 'w')
result_file.write(result_handle.read())
result_file.close()

It is basically from section "2.5 Connecting with biological databases" in the Biopython tutorial, and it works in that it gives me an HTML file; however, when I open the file it says that the results cannot be displayed. Has anyone had this problem? Thanks Kari From jtk at cmp.uea.ac.uk Sat Mar 26 15:52:32 2005 From: jtk at cmp.uea.ac.uk (Jan T.
Kim) Date: Sat Mar 26 14:50:53 2005 Subject: [BioPython] Interacting with GenBank In-Reply-To: <2dad2c1ed63c377bcdaad2ed5688d57e@hampshire.edu> References: <2dad2c1ed63c377bcdaad2ed5688d57e@hampshire.edu> Message-ID: <20050326205232.GB2271@jtkpc.cmp.uea.ac.uk> On Sat, Mar 26, 2005 at 10:50:30AM -0600, Kari Linder wrote:
> Here is the script I am working with :
>
> from Bio.WWW import NCBI
>
>
> search_command = 'Search'
> search_database = 'Nucleotide'
> return_format = 'Fasta'
> search_term = 'Ciliphora'
>
>
> result_handle = NCBI.query(search_command, search_database, term =
> search_term, doptcmdl = return_format)
>
> import os
>
> result_file_name = os.path.join(os.getcwd(), 'results.html')
> result_file = open(result_file_name, 'w')
> result_file.write(result_handle.read())
> result_file.close()
>
> It is basically from section "2.5 Connecting with biological
> databases" in the biopython tutorial, and it works being that it gives
> me an html file, however when I open the file it says that the results
> cannot be displayed. Has anyone had this problem?

Not me... Can you be more specific about what "it" is that "says that the results cannot be displayed"? If "it" is your operating system, then there's little that Biopython or GenBank can do about that. If, however, "it" is a Biopython routine, then perhaps some HTML has changed such that the parser cannot understand it anymore, or something like that. Please provide a transcript of the error. Best regards (and happy Easter), Jan -- +- Jan T.
Kim -------------------------------------------------------+ | *NEW* email: jtk@cmp.uea.ac.uk | | *NEW* WWW: http://www.cmp.uea.ac.uk/people/jtk | *-----=< hierarchical systems are for files, not for humans >=-----* From syd.diamond at gmail.com Wed Mar 30 17:07:34 2005 From: syd.diamond at gmail.com (Syd Diamond) Date: Wed Mar 30 17:01:42 2005 Subject: [BioPython] native python sequence alignment Message-ID: Hi y'all, I need to do pairwise sequence alignment of short alphanumeric strings, and I turned to Biopython to look for this tool. I was hoping to do it natively and not need to repeatedly use a wrapper to ClustalW or something like that. In scriptcentral, I found: http://starship.python.net/crew/gherman/potpurri/align/ 60% of the time, it works every time. However, it's not quite perfect, and before I went diving into the heart of the alignment function, I wanted to see if anyone knew of a better approach or alternate code. I've googled extensively without luck. Many thanks, Joel From fkauff at duke.edu Wed Mar 30 17:14:54 2005 From: fkauff at duke.edu (Frank Kauff) Date: Wed Mar 30 17:09:12 2005 Subject: [BioPython] native python sequence alignment In-Reply-To: References: Message-ID: <1112220894.5142.76.camel@osiris.biology.duke.edu> Hi, there's the pairwise2 module in Biopython, which is pretty straightforward and does a nice job. Not excitingly fast, but if you have short seqs anyway, it shouldn't matter. Frank On Wed, 2005-03-30 at 17:07 -0500, Syd Diamond wrote: > Hi y'all, > > I need to do pairwise sequence alignment of short alphanumeric strings, > and I turned to Biopython to look for this tool. I was hoping to do it > natively and not need to repeatedly use a wrapper to ClustalW or something > like that. In scriptcentral, I found: > > http://starship.python.net/crew/gherman/potpurri/align/ > 60% of the time, it works every time.
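For anyone curious what a pure-Python pairwise aligner involves, here is a minimal sketch of Needleman-Wunsch global alignment. It illustrates the standard dynamic-programming technique and is not the code inside pairwise2; the function name `global_align` is hypothetical. With the default scores (match 1, mismatch 0, gap 0), the score counts identical aligned positions, which is the same scheme as Biopython's `pairwise2.align.globalxx`:

```python
def global_align(seq1, seq2, match=1, mismatch=0, gap=0):
    """Needleman-Wunsch global alignment with a linear gap score.

    Returns (score, aligned1, aligned2). Mismatch and gap values are
    *added* to the score, so pass negative numbers for penalties.
    """
    n, m = len(seq1), len(seq2)
    # score[i][j] = best score for aligning seq1[:i] with seq2[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            diag = score[i - 1][j - 1] + s
            up = score[i - 1][j] + gap    # gap inserted in seq2
            left = score[i][j - 1] + gap  # gap inserted in seq1
            score[i][j] = max(diag, up, left)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    a1, a2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                a1.append(seq1[i - 1])
                a2.append(seq2[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            a1.append(seq1[i - 1])
            a2.append("-")
            i -= 1
        else:
            a1.append("-")
            a2.append(seq2[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(a1)), "".join(reversed(a2))


best, top, bottom = global_align("ACCGT", "ACG")
print(best)
print(top)
print(bottom)
```

The score table is len(seq1) x len(seq2), so time and memory grow with the product of the lengths; that is fine for short strings but explains why such code is "not excitingly fast" on long sequences. Only one optimal alignment is returned even when several tie; pass negative values, e.g. `global_align(a, b, 2, -1, -2)`, for a penalised alignment.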
However, it's not quite perfect, > and before I went diving into the heart of the alignment function, I > wanted to see if anyone knew of a better approach or alternate code. I've > googled extensively without luck. > > Many thanks, > Joel > _______________________________________________ > BioPython mailing list - BioPython@biopython.org > http://biopython.org/mailman/listinfo/biopython -- Frank Kauff Dept. of Biology Duke University Box 90338 Durham, NC 27708 USA Phone 919-660-7382 Fax 919-660-7293 Web http://www.lutzonilab.net/member/frankkauff.shtml From pap501 at york.ac.uk Thu Mar 31 05:21:34 2005 From: pap501 at york.ac.uk (pap501@york.ac.uk) Date: Thu Mar 31 05:18:40 2005 Subject: [BioPython] Problem with python and mysql Message-ID: Hi I am a master's student at the University of York working on a project to create a database of DNA sequences. I am trying to grab information from text files using a Python script to insert data into a MySQL table. The problem is that although I can query MySQL through Python I cannot write into MySQL through Python. I am using the Windows platform with Python version 2.4 with Biopython installed and MySQL Server 4.1. I have attached my Python script to this email. The script takes each text file in turn (although only one is listed in the script at the moment) and inserts the library code, GenBank code (primary key), TIGR code (if there is one, null otherwise) and the DNA sequence (string of characters). Each file is a library of ~2000 DNA sequences. The Python script runs but does not write the information into my MySQL table. When I query the table in MySQL it says that the table is empty. Can anyone advise me what to do? Is it something to do with the setup of MySQL and/or Python or a problem with the Python script? Any advice would be greatly appreciated.
Many thanks Phil -------------- next part --------------

import cStringIO
import os
import MySQLdb
conn= MySQLdb.connect (host="localhost", user ="root", passwd ="p1geon1", db="phil")
allfilenames=('F50',)
for a in allfilenames:
    print a
    input8=open ('SsGI_library_#'+a+'.txt','r')
    outfile=open('outtest.txt', 'w')
    i8=input8.read()
    # print i8
    listofentries=i8.split('>')
    for li in listofentries:
        print li
        numbers=li.split('\n')[0]
        print numbers
        seq=li.split('\n')[1]
        # print numbers
        bothnumbers=numbers.split('\t')
        print bothnumbers
        gb=bothnumbers[0]
        if len(bothnumbers)==2:
            tnum= bothnumbers [1]
        else:
            tnum='NULL'
        qry="insert into sequences2 values('%s','%s','%s','%s');"% (a, gb, tnum, seq)
        print qry
        conn.query(qry)

From l.heisler at utoronto.ca Thu Mar 31 11:00:00 2005 From: l.heisler at utoronto.ca (Larry) Date: Thu Mar 31 10:52:11 2005 Subject: [BioPython] Problem with python and mysql References: Message-ID: <00a501c5360a$b3ee7810$1651968e@Larry> Hi Phil, I don't believe .query can be used with a MySQLdb connection. I am not an expert on this, but the way I have done this in the past is:

1. set up the connection
   conn= MySQLdb.connect (host="localhost", user ="root", passwd ="p1geon1", db="phil")
2. set up a cursor
   mycursor=conn.cursor()
3. create your query
   qry="insert into sequences2 values('%s','%s','%s','%s');"% (a, gb, tnum, seq)
4. execute your query
   mycursor.execute(qry)

Others may be able to give you more details on this, e.g. using .query, which I believe works with _mysql, i.e. use _mysql:

conn= _mysql.connect (host="localhost", user ="root", passwd ="p1geon1",db="phil")
conn.query(qry)

Larry ----- Original Message ----- From: To: Sent: Thursday, March 31, 2005 5:21 AM Subject: [BioPython] Problem with python and mysql > Hi > > I am a masters student at the University of York working on a project to > create a database of DNA sequences. > > I am trying to grab information from text files using a python script to > insert data into a MySQL table.
The problem is that although I can query > MySQL through Python I cannot write into MySQL through Python. I am using > the Windows platform with Python version 2.4 with Biopython installed and > MySQL Server 4.1. > > I have attached my python script to this email. > > The script takes each text file in turn (although only one is listed in the > script at the mo) and inserts the library code, Genbank code (primary key), > TiGR code (if there is one, null otherwise) and the DNA sequence (string of > characters). Each file is a library of ~2000 DNA sequences. The python > script runs but does not write the information into my MySQL table. When I > query the table in MySQL it says that the table is empty. > > Can anyone advise me what to do? Is it soemthing to do with the setup of > MySQL and/or Python or a problem with the Python script? > > Any advice would be greatly appreciated. > > Many thanks > > Phil -------------------------------------------------------------------------------- > _______________________________________________ > BioPython mailing list - BioPython@biopython.org > http://biopython.org/mailman/listinfo/biopython > From pap501 at york.ac.uk Thu Mar 31 05:30:30 2005 From: pap501 at york.ac.uk (pap501@york.ac.uk) Date: Tue Apr 5 22:06:18 2005 Subject: [BioPython] mysql trouble Message-ID: Hi I am a master's student at the University of York working on a project to create a database of DNA sequences. I am trying to grab information from text files using a Python script to insert data into a MySQL table. The problem is that although I can query MySQL through Python I cannot write into MySQL through Python. I am using the Windows platform with Python version 2.4 with Biopython installed and MySQL Server 4.1. I have attached my Python script to this email.
The script takes each text file in turn (although only one is listed in the script at the moment) and inserts the library code, GenBank code (primary key), TIGR code (if there is one, null otherwise) and the DNA sequence (string of characters). Each file is a library of ~2000 DNA sequences. The Python script runs but does not write the information into my MySQL table. When I query the table in MySQL it says that the table is empty. Can anyone advise me what to do? Is it something to do with the setup of MySQL and/or Python or a problem with the Python script? Any advice would be greatly appreciated. Many thanks Phil -------------- next part --------------

import cStringIO
import os
import MySQLdb
conn= MySQLdb.connect (host="localhost", user ="root", passwd ="p1geon1", db="phil")
allfilenames=('F50',)
for a in allfilenames:
    print a
    input8=open ('SsGI_library_#'+a+'.txt','r')
    outfile=open('outtest.txt', 'w')
    i8=input8.read()
    # print i8
    listofentries=i8.split('>')
    for li in listofentries:
        print li
        numbers=li.split('\n')[0]
        print numbers
        seq=li.split('\n')[1]
        # print numbers
        bothnumbers=numbers.split('\t')
        print bothnumbers
        gb=bothnumbers[0]
        if len(bothnumbers)==2:
            tnum= bothnumbers [1]
        else:
            tnum='NULL'
        qry="insert into sequences2 values('%s','%s','%s','%s');"% (a, gb, tnum, seq)
        print qry
        conn.query(qry)
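A minimal sketch of the cursor-based approach Larry describes, assuming the input layout implied by Phil's script (a '>' line holding a tab-separated GenBank ID and optional TIGR ID, followed by the sequence). It uses the standard-library sqlite3 module as a stand-in so it runs without a MySQL server; both sqlite3 and MySQLdb follow the Python DB-API 2.0, but MySQLdb uses %s placeholders where sqlite3 uses ?. Two things plausibly explain the empty table: DB-API connections (MySQLdb included) do not autocommit by default, so with a transactional table engine such as InnoDB the inserts are discarded unless conn.commit() is called; and passing values as query parameters avoids quoting problems and stores a real SQL NULL (Python None) instead of the string 'NULL'. The helper names parse_library and load_library, the column names and the sample IDs are all hypothetical.

```python
import sqlite3


def parse_library(text):
    """Yield (genbank_id, tigr_id_or_None, sequence) from '>'-separated text.

    Assumed layout (inferred from Phil's script, not a documented format):
        >GENBANK_ID<TAB>TIGR_ID
        SEQUENCE
    The TIGR ID is optional; sequence lines are joined if there are several.
    """
    for entry in text.split(">"):
        lines = entry.strip().splitlines()
        if len(lines) < 2:
            continue  # skips the empty chunk before the first '>'
        ids = lines[0].split("\t")
        gb = ids[0].strip()
        tigr = ids[1].strip() if len(ids) > 1 else ""
        seq = "".join(line.strip() for line in lines[1:])
        yield gb, tigr or None, seq  # None is stored as SQL NULL


def load_library(conn, library_code, text):
    """Insert every entry of one library file, then commit once."""
    cur = conn.cursor()
    rows = [(library_code, gb, tigr, seq) for gb, tigr, seq in parse_library(text)]
    # sqlite3 uses '?' placeholders; with MySQLdb write '%s' instead.
    cur.executemany("INSERT INTO sequences2 VALUES (?, ?, ?, ?)", rows)
    conn.commit()  # without this, a transactional table can stay empty
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences2 "
    "(library TEXT, genbank TEXT PRIMARY KEY, tigr TEXT, seq TEXT)"
)
sample = ">AB000001\tTC100\nACGTACGT\n>AB000002\nGGCCTTAA\n"
n = load_library(conn, "F50", sample)
print(n)
print(conn.execute("SELECT * FROM sequences2 ORDER BY genbank").fetchall())
```

With MySQLdb, the same load_library logic would take the connection from MySQLdb.connect(...) as in Phil's script, with the placeholders changed to %s.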