From bugzilla-daemon at portal.open-bio.org Thu Apr 1 18:23:04 2004 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Sat Mar 5 14:43:31 2005 Subject: [Biopython-dev] [Bug 1613] New: pubmed example doesn't work. corrected example included Message-ID: <200404012323.i31NN4Jg017348@portal.open-bio.org> http://bugzilla.bioperl.org/show_bug.cgi?id=1613 Summary: pubmed example doesn't work. corrected example included Product: Biopython Version: Not Applicable Platform: PC OS/Version: Windows XP Status: NEW Severity: normal Priority: P2 Component: Documentation AssignedTo: biopython-dev@biopython.org ReportedBy: cariaso@yahoo.com While the bug is small, it's enough to scare away new users. In the cookbook, section 3.3.1, the example is
from Bio.Medline import PubMed
search_term = 'orchid'
orchid_ids = PubMed.search_for(search_term)
but it doesn't work (for me) that way. However, this does work:
from Bio import PubMed
search_term = 'orchid'
orchid_ids = PubMed.search_for(search_term)
Just delete the '.Medline'. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From chapmanb at uga.edu Fri Apr 2 11:07:52 2004 From: chapmanb at uga.edu (Brad Chapman) Date: Sat Mar 5 14:43:31 2005 Subject: [Biopython-dev] Martel NCBI blastn format In-Reply-To: <1080751311.406af4cfe2d54@webmail.ipk-gatersleben.de> References: <1080751311.406af4cfe2d54@webmail.ipk-gatersleben.de> Message-ID: <20040402160752.GA45713@evostick.agtec.uga.edu> Hi Heiko; > The parser expression for blastn defined in Bio.expressions.blast.ncbiblast.py > is broken for version BLASTN 2.2.8 [Jan-05-2004] (and even for the older > BLASTN 2.2.6 [Apr-09-2003]). > In the output is an additional 'hsp_info' section, which was not defined in the > blastn expression. This could be patched in one line. Thanks. Patch applied to CVS.
I appreciate you looking at the Martel formats for blast -- these are not heavily integrated into Biopython yet (the Bio.Blast parsers do not use them) so they don't get as much checking as they need. If you find any other problems or have any suggestions, please do let us know. Thanks again. Brad From idoerg at burnham.org Fri Apr 2 13:32:05 2004 From: idoerg at burnham.org (Iddo Friedberg) Date: Sat Mar 5 14:43:31 2005 Subject: [Biopython-dev] aaindex1 Message-ID: <406DB1A5.3000304@burnham.org> Hi all, I wrote a module for parsing the aaindex1 database. From the aaindex1 README file: 'An amino acid index is a set of 20 numerical values representing any of the different physicochemical and biological properties of amino acids. The AAindex1 section of the Amino Acid Index Database is a collection of published indices together with the result of cluster analysis using the correlation coefficient as the distance between two indices. This section currently contains 494 indices.' See http://www.genome.ad.jp/dbget/aaindex.html for details. The aaindex1 file may be downloaded from ftp://ftp.genome.ad.jp/pub/db/genomenet/aaindex/aaindex1 This module contains the following classes: AAIndex1Record: holds the information from a single aaindex1 entry AAIndex1Reader: parses the aaindex1 file and one utility which reads the aaindex1 file and returns a dictionary of entries. Should I open a new directory under Bio/AAIndex1, or is there a more appropriate place for this? ./I -- Iddo Friedberg, Ph.D. The Burnham Institute 10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA Tel: +1 (858) 646 3100 x3516 Fax: +1 (858) 713 9930 http://ffas.ljcrf.edu/~iddo From iliketobicycle at yahoo.ca Sat Apr 3 15:09:45 2004 From: iliketobicycle at yahoo.ca (Harry Zuzan) Date: Sat Mar 5 14:43:31 2005 Subject: [Biopython-dev] OligoPython for Affymetrix data Message-ID: <20040403200945.66503.qmail@web21410.mail.yahoo.com> Hello, A short while back I asked if there was an interest in code for handling Affymetrix data. The answer was yes, so I put together a simple module for reading in an Affymetrix cel file. The data parsed from the cel file are then available in the form of Numeric arrays. There is a complementary module that encapsulates these data and makes them much more useful but it is more complex so I thought I'd find my BioPython legs with this simple module. I need to work on an install script, better documentation, better error handling and an alpha version number. I also need to make Makefile for the C++ module friendly to more users. The code is at www.oligopython.org. I'm calling it OligoPython until it is in the BioPython cvs tree. The OligoPython license is the BioPython license verbatim except for the heading. Since I am unfamiliar with cvs and writing Python install scripts and writing Makefiles that compile under many versions of unix, I would appreciate any help integrating this module with BioPython so that I can add another important module soon. I'm hoping that there will be interest in this so any feedback is appreciated. Best, Harry From idoerg at burnham.org Sat Apr 3 21:57:16 2004 From: idoerg at burnham.org (Iddo Friedberg) Date: Sat Mar 5 14:43:32 2005 Subject: [Biopython-dev] OligoPython for Affymetrix data In-Reply-To: <20040403200945.66503.qmail@web21410.mail.yahoo.com> References: <20040403200945.66503.qmail@web21410.mail.yahoo.com> Message-ID: <406F798C.5040408@burnham.org> Hi Harry, Thanks for your effort. This sounds like a welcome contribution. 
I am getting "connection refused" when I am trying get the URL you listed. Can you check that out please? Thanks, Iddo Harry Zuzan wrote: > Hello, > > A short while back I asked if there was an interest in code for > handling Affymetrix data. The answer was yes, so I put together a > simple module for reading in an Affymetrix cel file. The data parsed > from the cel file are then available in the form of Numeric arrays. > > There is a complementary module that encapsulates these data and makes > them much more useful but it is more complex so I thought I'd find my > BioPython legs with this simple module. I need to work on an install > script, better documentation, better error handling and an alpha > version number. I also need to make Makefile for the C++ module > friendly to more users. > > The code is at www.oligopython.org. I'm calling it OligoPython until > it is in the BioPython cvs tree. The OligoPython license is the > BioPython license verbatim except for the heading. > > Since I am unfamiliar with cvs and writing Python install scripts and > writing Makefiles that compile under many versions of unix, I would > appreciate any help integrating this module with BioPython so that I > can add another important module soon. > > I'm hoping that there will be interest in this so any feedback is > appreciated. > > Best, > > Harry > _______________________________________________ > Biopython-dev mailing list > Biopython-dev@biopython.org > http://biopython.org/mailman/listinfo/biopython-dev > > -- Iddo Friedberg, Ph.D. The Burnham Institute 10901 N. Torrey Pines Rd. 
La Jolla, CA 92037 USA Tel: +1 (858) 646 3100 x3516 Fax: +1 (858) 713 9930 http://ffas.ljcrf.edu/~iddo From bugzilla-daemon at portal.open-bio.org Sun Apr 4 10:13:09 2004 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Sat Mar 5 14:43:32 2005 Subject: [Biopython-dev] [Bug 1605] kMeans.py should be deprecated Message-ID: <200404041413.i34ED9e7016155@portal.open-bio.org> http://bugzilla.bioperl.org/show_bug.cgi?id=1605 mdehoon@ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Additional Comments From mdehoon@ims.u-tokyo.ac.jp 2004-04-04 10:13 ------- I added a DeprecationWarning to kMeans.py and xkMeans.py (which relies on kMeans.py). ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From iliketobicycle at yahoo.ca Mon Apr 5 11:04:31 2004 From: iliketobicycle at yahoo.ca (Harry Zuzan) Date: Sat Mar 5 14:43:32 2005 Subject: [Biopython-dev] Oligopython page Message-ID: <20040405150431.67425.qmail@web21408.mail.yahoo.com> Hi, I can't get dyndns to update www.oligopython.org. For the time being, anyone interested can get the same pages from oligopython.dyndns.org. Sorry about any confusion. Also, there is not a lot of code at this point so I just attached it. Harry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: OligoPython.tgz Type: application/x-gzip-compressed Size: 3325 bytes Desc: OligoPython.tgz Url : http://portal.open-bio.org/pipermail/biopython-dev/attachments/20040405/b13f5a60/OligoPython.bin From mdehoon at ims.u-tokyo.ac.jp Tue Apr 6 01:10:08 2004 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sat Mar 5 14:43:32 2005 Subject: [Biopython-dev] Oligopython page In-Reply-To: <20040405150431.67425.qmail@web21408.mail.yahoo.com> References: <20040405150431.67425.qmail@web21408.mail.yahoo.com> Message-ID: <40723BB0.5040809@ims.u-tokyo.ac.jp> Hi, Thanks for writing oligopython! I had a look at your package to see if I could write a setup.py for it, and I noticed that the file parser makes use of C++ rather than C. If I'm not mistaken, the only C++ code currently in Biopython is Bio.KDTree, which is not installed by default because of problems building it on some platforms. Is there some Biopython policy on C++ code? It may also be possible to use Martel for the file parsing. I am not familiar with Martel myself, so I cannot give much guidance there, but maybe somebody can. --Michiel. Harry Zuzan wrote: > Hi, > > I can't get dyndns to update www.oligopython.org. > > For the time being, anyone interested can get the same pages from > oligopython.dyndns.org. > > Sorry about any confusion. > > Also, there is not a lot of code at this point so I just attached it. 
> > Harry > > > ------------------------------------------------------------------------ > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev@biopython.org > http://biopython.org/mailman/listinfo/biopython-dev -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From iliketobicycle at yahoo.ca Tue Apr 6 09:26:24 2004 From: iliketobicycle at yahoo.ca (Harry Zuzan) Date: Sat Mar 5 14:43:32 2005 Subject: [Biopython-dev] Oligopython page In-Reply-To: <40723BB0.5040809@ims.u-tokyo.ac.jp> Message-ID: <20040406132624.52327.qmail@web21407.mail.yahoo.com> Hi, Originally the parser was in Python but it was too slow. I could have written it in C instead of C++ but I have a lot of other code related to this that is written in C++. If I write a class in Python that is too slow because of the amount of data it has to handle I often solve the problem by converting the Python class to a C++ class. Then I write a thin wrapper in Python around the C++ class. So in my Python scripts nothing changes. This is where the C++ code comes from. I'll read up on install scripts myself. In the meantime I'm grateful for any help. Harry > Hi, > > Thanks for writing oligopython! I had a look at your package to see > if I could > write a setup.py for it, and I noticed that the file parser makes use > of C++ > rather than C. If I'm not mistaken, the only C++ code currently in > Biopython is > Bio.KDTree, which is not installed by default because of problems > building it on > some platforms. Is there some Biopython policy on C++ code? It may > also be > possible to use Martel for the file parsing. I am not familiar with > Martel > myself, so I cannot give much guidance there, but maybe somebody can. > > --Michiel. > > Harry Zuzan wrote: > > > Hi, > > > > I can't get dyndns to update www.oligopython.org. 
> > > > For the time being, anyone interested can get the same pages from > > oligopython.dyndns.org. > > > > Sorry about any confusion. > > > > Also, there is not a lot of code at this point so I just attached > it. > > > > Harry > > > > > > > ------------------------------------------------------------------------ > > > > _______________________________________________ > > Biopython-dev mailing list > > Biopython-dev@biopython.org > > http://biopython.org/mailman/listinfo/biopython-dev > > -- > Michiel de Hoon, Assistant Professor > University of Tokyo, Institute of Medical Science > Human Genome Center > 4-6-1 Shirokane-dai, Minato-ku > Tokyo 108-8639 > Japan > http://bonsai.ims.u-tokyo.ac.jp/~mdehoon > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev@biopython.org > http://biopython.org/mailman/listinfo/biopython-dev From dalke at dalkescientific.com Tue Apr 6 16:11:42 2004 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sat Mar 5 14:43:32 2005 Subject: [Biopython-dev] EUtils Message-ID: <9F378DC6-8806-11D8-B94E-000393C92466@dalkescientific.com> I was at the PyCon conference a couple weeks ago and talked with Mark Johnson from NCBI about several things, including the EUtils client package in Biopython. That reminded me that I needed to clean up the package and update it to support the latest NCBI interface (they added a couple new features in the last two years). I've got some free time this week so I'm working on it. The changes are: - don't require the response the DTD. I now believe that DTDs for this task are the wrong approach. I'm using elementtree. 
- get rid of the silly Problem class hierarchy I was using - simplify the regression tests so it's easier to read - add more tests (found some bugs in a few corners already) - figure out a better way to do live tests against the server - include support for a request throttle - pull the documentation out of the docstrings and into an actual document - resubmit my bug reports in the hopes that they'll fix the server - rearrange the source tree For the last, what I want to do is move the documentation and tests out of Bio/EUtils/ and into the proper places for Biopython. As stands EUtils is distributable independent of Biopython. I don't think that's worthwhile so I'm thinking of no longer doing that. If I do make it independently distributable then I'll have some little script to assemble the bits and pieces from the Biopython tree. Is anyone here using the EUtils client for anything and has code that I can use? I want to examine common use cases and see if there's a way to simplify the interface. (Eg, have some top-level functions for doing retrieval rather than making a client object.) I also want to make sure I don't change the API to break existing code. Andrew dalke@dalkescientific.com From chapmanb at uga.edu Tue Apr 6 18:00:24 2004 From: chapmanb at uga.edu (Brad Chapman) Date: Sat Mar 5 14:43:32 2005 Subject: [Biopython-dev] Oligopython page In-Reply-To: <40723BB0.5040809@ims.u-tokyo.ac.jp> References: <20040405150431.67425.qmail@web21408.mail.yahoo.com> <40723BB0.5040809@ims.u-tokyo.ac.jp> Message-ID: <20040406220024.GB25784@evostick.agtec.uga.edu> Hey Harry and Michiel; [Harry announces OligoPython] > >For the time being, anyone interested can get the same pages from > >oligopython.dyndns.org. Thanks for this -- very nice to see people working on microarray code and we don't have anything like this in Biopython so it is very welcome. Michiel: > Thanks for writing oligopython! 
I had a look at your package to see if I > could write a setup.py for it, and I noticed that the file parser makes use > of C++ rather than C. If I'm not mistaken, the only C++ code currently in > Biopython is Bio.KDTree, which is not installed by default because of > problems building it on some platforms. Is there some Biopython policy on > C++ code? If I remember properly, the reason KDTree is not installed by default isn't because it's C++ but rather because it used the C++ standard library (stdc++) which caused building problems on some systems without development libraries. I think either C++ or C is fine -- basically our only requirement is that it builds on multiple platforms. Sadly, sometimes figuring that out requires including it and seeing how many people complain :-). A simple setup.py that will work for this code is:
from distutils.core import setup
from distutils.extension import Extension

setup(
    name = "OligoPython",
    version = "0.1",
    packages = ["Affymetrix"],
    ext_modules = [Extension("Affymetrix._cel", ["Affymetrix/celmodule.cc"])]
)
This assumes that you put the code into a module directory called Affymetrix. The other change that is necessary is that the includes do not need to be relative to the python directory, so celmodule.cc just needs to do:
#include "Python.h"
#include "Numeric/arrayobject.h"
After this it seems to build and install fine. I'd be happy to include this in Biopython if you are willing, but do have a few suggestions for the code: 1. I'd prefer it to be named something like Bio.Affymetrix rather than something more generic like Bio.Oligo -- since that would reflect its purpose and use a little better. I'm not sure exactly what your development goals are with this, but if the main goal now is parsing cel files and manipulating them this makes some sense. Michiel may also have some input here (which is probably more useful than mine). 2.
The current Cel class integrates both parsing and storing the resulting data in the same class. To be more consistent with Biopython, I think it would be nice to separate out the work into two classes something like: CelParser -- has the parse function and returns a CelRecord object CelRecord -- contains the parsed data (all of the _pixels, _stdev, _npix, _nrows, _ncols attributes) and the functions which return them. Other than this, things look good -- let me know how you want to proceed forward on this and maybe coordinate with Michiel if he has plans for dealing with microarray data and integrating this with Cluster code and would like to be involved. Thanks again for the work! Brad From chapmanb at uga.edu Tue Apr 6 18:12:23 2004 From: chapmanb at uga.edu (Brad Chapman) Date: Sat Mar 5 14:43:32 2005 Subject: [Biopython-dev] aaindex1 In-Reply-To: <406DB1A5.3000304@burnham.org> References: <406DB1A5.3000304@burnham.org> Message-ID: <20040406221223.GC25784@evostick.agtec.uga.edu> Hey Iddo; > I wrote a module for parsing the aaindex1 database. Sweet. Good stuff. Thanks much. > This module contains the following classes: > AAIndex1Record: holds the information from a single aaindex1 entry > AAIndex1Reader: parses the aaindex1 file > > and one utility which reads the aaindex1 file and returns a dictionary > of entries. > > Should I open a new directory under Bio/AAIndex1, or is there a more > appropriate place for this? Can we use Bio/AAIndex instead of AAIndex1? I'm not sure I really understand what the 1 signifies but it just seems a bit weird to me for no really good reason. Other than that, checking it in under there sounds great. If there's a good reason for using AAIndex1 (like there is an AAIndex2 or something else) just tell me and do what you think is best. Oh yeah, and where is my documentation for Alphabet/Reduced.py? I'm gonna have to send my "documentation collection" agency out to see you pretty soon :-). Thanks again!
Brad From idoerg at burnham.org Tue Apr 6 19:05:13 2004 From: idoerg at burnham.org (Iddo Friedberg) Date: Sat Mar 5 14:43:32 2005 Subject: [Biopython-dev] aaindex1 Message-ID: <407337A9.1060409@burnham.org> Brad Chapman wrote: > Can we use Bio/AAIndex instead of AAIndex1? I'm not sure I really > understand what the 1 signifies but it just seems a bit weird to me > for no really good reason. Other than that, checking it in under > there sounds great. If there's a good reason for using AAIndex1 > (like there is an AAIndex2 or something else) just tell me and do > what you think is best. AAIndex1 is a database (flat file) of 494 numeric values assigned to amino acids. AAIndex2 are substitution matrices. I can create an AAIndex directory, under which I will place the AAIndex1.py file. AAIndex2 will have to wait. > Oh yeah, and where is my documentation for Alphabet/Reduced.py? I'm > gonna have to send my "documentation collection" agency out to see > you pretty soon :-). Egads! Not the dread DCA!!! Just committed to CVS new versions of Bio.utils and Alphabet.Reduced it... ./I -- Iddo Friedberg, Ph.D. The Burnham Institute 10901 N. Torrey Pines Rd. La Jolla, CA 92037 USA Tel: +1 (858) 646 3100 x3516 Fax: +1 (858) 713 9930 http://ffas.ljcrf.edu/~iddo From chapmanb at uga.edu Tue Apr 6 19:07:45 2004 From: chapmanb at uga.edu (Brad Chapman) Date: Sat Mar 5 14:43:32 2005 Subject: [Biopython-dev] aaindex1 In-Reply-To: <407337A9.1060409@burnham.org> References: <407337A9.1060409@burnham.org> Message-ID: <20040406230745.GG25784@evostick.agtec.uga.edu> Hi Iddo; > AAIndex1 is a database (flat file) of 494 numeric values assigned to > amino acids. AAIndex2 are substitution matrices. I can create an AAIndex > directory, under which I will place the AAIndex1.py file. AAIndex2 will > have to wait.
That sounds great -- no worries about the AAIndex2 -- it'll happen when it happens. In the meantime, as long as it's documented (Biopython DCA, watch out) what AAIndex1 does and what files it deal with that sounds great. > Just commited to CVS new versions of Bio.utils and Alphabet.Reduced it... Sweet. Threats always work best -- I never imagined marrying into the mob would do so much to advance my career. Brad From chapmanb at uga.edu Tue Apr 6 18:31:14 2004 From: chapmanb at uga.edu (Brad Chapman) Date: Sat Mar 5 14:43:32 2005 Subject: [Biopython-dev] EUtils In-Reply-To: <9F378DC6-8806-11D8-B94E-000393C92466@dalkescientific.com> References: <9F378DC6-8806-11D8-B94E-000393C92466@dalkescientific.com> Message-ID: <20040406223114.GD25784@evostick.agtec.uga.edu> Hey Andrew; Great to hear from you, as always! Hope you're doing well. > I was at the PyCon conference a couple weeks ago and talked with > Mark Johnson from NCBI about several things, including the EUtils > client package in Biopython. > > That reminded me that I needed to clean up the package and update > it to support the latest NCBI interface (they added a couple new > features in the last two years). I've got some free time this > week so I'm working on it. Sweet -- that sounds great. Always glad to have work on it. I know a few small fixes have gone into the CVS so you may want to work directly from it, but other than that it's all yours. > - don't require the response the DTD. I now believe that DTDs > for this task are the wrong approach. I'm using elementtree. Cool. This will also clean up a lot of the install messiness in setup.py necessary to get the DTDs installed. > - figure out a better way to do live tests against the server If you get somewhere with this please do let me know -- I added some tests for Registry code recently but have been fairly frustrated with getting errors because the server is dying. 
One thing I have been thinking about is using timeoutsocket: http://www.timo-tasi.org/python/timeoutsocket.py to just die neatly on tests if the server is timing out. > For the last, what I want to do is move the documentation > and tests out of Bio/EUtils/ and into the proper places for > Biopython. As stands EUtils is distributable independent of > Biopython. I don't think that's worthwhile so I'm thinking > of no longer doing that. Sounds great -- I've been adding independent documentation into Docs/cookbook. You'd just need to make a directory for EUtils. > Is anyone here using the EUtils client for anything and > has code that I can use? I've recently added it within the registry system in config/DBRegistry.py (class EUtilsDB), so that we can do retrieval from NCBI in the current right way. Other than that I've used it in a couple of talks I've given: http://www.biopython.org/docs/presentations/biopython_exelixis.pdf and: http://www.biopython.org/docs/presentations/bosc_biopython.pdf and I attached a file which I use semi-regularly for retrieving databases by Taxonomy ID from NCBI (which actually used the timeoutsocket module I mention above, but from my own personal code base I use at work). Hope those help some. Glad to see you around. Brad -------------- next part --------------
#!/usr/bin/env python
"""Fetch a number of FASTA files from NCBI based on taxonomy queries.

This script is meant to be used for scheduled batch downloads of a number
of databases based on taxonomy ids. It reads an input file specified on
the commandline that looks like:

file_name,tax_id,not_tax_id,not_tax_id

Where:
    file_name -- the name of the output file (ie. arabidopsis.fasta)
    tax_id -- the primary taxonomy id we are fetching (ie. 3328)
    not_tax_id -- a list of taxonomy ids to not include. This list can be
    empty or as long as you want.

Usage:
    python get_taxonomy_list.py

Where:
    -- the location of the file to read the taxonomy information from,
    formatted as above

The files are output to a directory called "taxa-" where is the date the
program was run on (ie. 20020904).
"""
import sys
import os
import sgmllib
import time
import urlparse
import time

# biopython
from Bio import Fasta
from Bio.EUtils import HistoryClient
from Bio.PGML.Utils import timeoutsocket
timeoutsocket.setDefaultSocketTimeout(30)

VERBOSE = 1

def main(tax_file):
    assert os.path.exists(tax_file), "Cannot find taxonomy file"

    # create the output directory
    start_dir = os.path.dirname(tax_file)
    time_info = time.localtime(time.time())
    time_name = "%i%02i%02i" % (time_info[0], time_info[1], time_info[2])
    out_name = "taxa-%s" % time_name
    output_dir = os.path.join(start_dir, out_name)
    if not(os.path.exists(output_dir)):
        os.makedirs(output_dir)

    all_tax_info = process_tax_file(tax_file)
    for file_name, tax_id, not_ids in all_tax_info:
        output_file = os.path.join(output_dir, file_name)
        process_taxonomy_id(tax_id, not_ids, output_file)

def process_tax_file(tax_file):
    """Split a taxonomy information file up into relevant info from commas.
    """
    tax_info = []
    handle = open(tax_file, "r")
    for line in handle.xreadlines():
        line = line.strip()
        if line and line.find("#") == -1:
            parts = line.split(",")
            if len(parts) < 2:
                raise ValueError("Line is badly formatted: %s" % line)
            filename = parts[0]
            tax_id = parts[1]
            not_ids = parts[2:]
            tax_info.append((filename, tax_id, not_ids))
    handle.close()
    return tax_info

def process_taxonomy_id(taxonomy_id, not_ids, output_file):
    """Deal with the process of retrieving the FASTA file for a query.

    This uses the new EUtils interface at NCBI, which is so much faster
    and less annoying than the old way.
    """
    if VERBOSE:
        print "Saving taxonomy id %s to %s" % (taxonomy_id, output_file)
    # build the query
    query = "txid%s[Organism]" % taxonomy_id
    for not_id in not_ids:
        query += " NOT txid%s[Organism]" % not_id

    client = HistoryClient.HistoryClient()
    # keep retrying the search until NCBI answers
    while 1:
        try:
            results = client.search(query, db = "nucleotide")
            break
        except (timeoutsocket.Timeout, timeoutsocket.Error, ValueError):
            print "Timed out talking to NCBI, trying again"
            time.sleep(10)
    if VERBOSE:
        print "\tSearch got %s results" % (len(results))
    # keep retrying the fetch until NCBI answers
    while 1:
        try:
            fasta = results.efetch(retmode = "text", rettype = "fasta")
            break
        except (timeoutsocket.Timeout, timeoutsocket.Error, ValueError):
            print "Timed out talking to NCBI, trying again"
            time.sleep(10)

    output_handle = open(output_file, "w")
    while 1:
        line = fasta.readline()
        if not(line):
            break
        output_handle.write(line)
    output_handle.close()

    if VERBOSE:
        output_handle = open(output_file)
        num_seqs = check_seqs(output_handle)
        output_handle.close()
        print "\tWrote %s sequences" % (num_seqs)

def check_seqs(handle):
    """Count the number of FASTA sequences in the passed handle.
    """
    num_seqs = 0
    iterator = Fasta.Iterator(handle)
    while 1:
        rec = iterator.next()
        if not rec:
            break
        num_seqs += 1
    return num_seqs

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print "Invalid number of arguments supplied"
        print __doc__
        sys.exit()
    sys.exit(main(sys.argv[1]))
From thamelry at binf.ku.dk Wed Apr 7 07:23:55 2004 From: thamelry at binf.ku.dk (Thomas Hamelryck) Date: Sat Mar 5 14:43:32 2005 Subject: [Biopython-dev] Oligopython page In-Reply-To: <20040406220024.GB25784@evostick.agtec.uga.edu> References: <20040405150431.67425.qmail@web21408.mail.yahoo.com> <40723BB0.5040809@ims.u-tokyo.ac.jp> <20040406220024.GB25784@evostick.agtec.uga.edu> Message-ID: <200404071323.55612.thamelry@binf.ku.dk> On Wednesday 07 April 2004 00:00, Brad Chapman wrote: > If I remember properly, the reason KDTree is not installed by > default isn't because it's C++ but rather because it used the C++ > standard library (stdc++) which caused building problems on some > systems without development libraries. To quote from an older post: KDTree works fine. But: it needs a working C++ compiler, and a complete installation of Numpy (including header files) to compile. It seems that on Solaris it does not compile due to a bug in Distutils, which is not really coping well with C++ on some platforms (ie. missing flags, compiling with gcc instead of g++ etc.). --- Thomas Hamelryck Bioinformatik centret Universitetsparken 15 Bygning 10 DK-2100 København Ø
Denmark http://www.binf.ku.dk/users/thamelry/ From mdehoon at ims.u-tokyo.ac.jp Wed Apr 7 22:54:32 2004 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sat Mar 5 14:43:32 2005 Subject: [Biopython-dev] Oligopython page In-Reply-To: <200404071323.55612.thamelry@binf.ku.dk> References: <20040405150431.67425.qmail@web21408.mail.yahoo.com> <40723BB0.5040809@ims.u-tokyo.ac.jp> <20040406220024.GB25784@evostick.agtec.uga.edu> <200404071323.55612.thamelry@binf.ku.dk> Message-ID: <4074BEE8.3070704@ims.u-tokyo.ac.jp> I tried compiling KDTree on a Unix machine running SunOS 5.8. It has two Python versions, one compiled with gcc and the other one with the native cc compiler. Distutils uses the same compiler as the one used to compile Python itself. The gcc-Python compiled KDTree without problems, but the cc-Python did not, as cc doesn't handle C++. The same problem may occur on other Unix platforms if the native compiler rather than gcc was used to compile Python. --Michiel. Thomas Hamelryck wrote: > On Wednesday 07 April 2004 00:00, Brad Chapman wrote: > > >>If I remember properly, the reason KDTree is not installed by >>default isn't because it's C++ but rather because it used the C++ >>standard library (stdc++) which caused building problems on some >>systems without development libraries. > > > To quote from an older post: > > KDTree works fine. But: it needs a working C++ compiler, and > a complete installation of Numpy (including header files) to compile. > It seems that on Solaris it does not compile due to a bug in Distutils, which > is not really coping well with C++ on some platforms (ie. missing flags, > compiling with gcc instead of g++ etc.). > > --- > Thomas Hamelryck > Bioinformatik centret > Universitetsparken 15 > Bygning 10 > DK-2100 København Ø
> Denmark > http://www.binf.ku.dk/users/thamelry/ > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev@biopython.org > http://biopython.org/mailman/listinfo/biopython-dev > > -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From chapmanb at uga.edu Mon Apr 19 07:56:16 2004 From: chapmanb at uga.edu (Brad Chapman) Date: Sat Mar 5 14:43:32 2005 Subject: [Biopython-dev] Work towards getting KDTree compiling Message-ID: <20040419115616.GA12006@misterbd.agtec.uga.edu> Hello Thomas and all; Since Michiel wrote about the problems that were causing the KDTree C++ extension to fail, I've been taking a look into seeing how we might fix that problem and get KDTree compiled by default in Biopython. To quote from Michiel's mail: > I tried compiling KDTree on a Unix machine running SunOS 5.8. It has two > Python versions, one compiled with gcc and the other one with the native cc > compiler. Distutils uses the same compiler as the one used to compile > Python itself. The gcc-Python compiled KDTree without problems, but the > cc-Python did not, as cc doesn't handle C++. The same problem may occur on > other Unix platforms if the native compiler rather than gcc was used to > compile Python. I was a bit surprised that distutils didn't recognize the C++ code as such and use a C++ compiler, so I dug around a bit in the distutils code and looked at how it figured out what is C++ and what is C. Turns out, it uses the filename extensions to do this via the following dictionary: language_map = {".c" : "c", ".cc" : "c++", ".cpp" : "c++", ".cxx" : "c++", ".m" : "objc", } I guess if it doesn't find it here, it defaults to using the C compiler, which is what it seemed like it was doing. 
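The extension lookup described above can be sketched roughly as follows. This is a simplified illustration, not the actual distutils code: the helper name guess_language, the priority list and the "c" fallback are assumptions made for this example, and only the extension table is taken from the mail.

```python
import os

# Extension table as quoted above from distutils; the lookup is case-sensitive.
LANGUAGE_MAP = {
    ".c": "c",
    ".cc": "c++",
    ".cpp": "c++",
    ".cxx": "c++",
    ".m": "objc",
}

# Assumed priority for this sketch: one C++ source makes the whole
# extension count as C++.
LANGUAGE_ORDER = ["c++", "objc", "c"]


def guess_language(sources, default="c"):
    """Guess the language of an extension from its source file names.

    Hypothetical helper for illustration only; unknown extensions fall
    back to `default`, matching the behaviour guessed at in the mail.
    """
    found = set()
    for source in sources:
        ext = os.path.splitext(source)[1]
        if ext in LANGUAGE_MAP:
            found.add(LANGUAGE_MAP[ext])
    for language in LANGUAGE_ORDER:
        if language in found:
            return language
    return default


print(guess_language(["Affymetrix/celmodule.cc"]))  # prints: c++
print(guess_language(["Bio/KDTree/KDTree.C", "Bio/KDTree/KDTree.swig.C"]))  # prints: c
```

Under this sketch the upper-case .C suffix of the KDTree sources is simply not in the table, which would be consistent with the plain C compiler being picked for them.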
As I dug further, I discovered you can set the language when specifying the extension, and then this'll be used instead of the filename extension detection. So, I checked in a modified setup.py that compiles KDTree by default and sets the language to c++: Extension('Bio.KDTree._CKDTree', ["Bio/KDTree/KDTree.C", "Bio/KDTree/KDTree.swig.C"], libraries=["stdc++"], language="c++" ), I hope maybe this will fix the problem on non-gcc systems and get KDTree compiled by default. I was wondering if people on potential problem systems (Solaris, Windows, Mac OSX) would mind testing this out to see if it compiles happily. If so, I'll leave it in for the next release -- otherwise, we'll have to keep working. Thanks much for any reports (hopefully of success :-). Brad From jeffrey_chang at stanfordalumni.org Mon Apr 19 14:32:38 2004 From: jeffrey_chang at stanfordalumni.org (Jeffrey Chang) Date: Sat Mar 5 14:43:32 2005 Subject: [Biopython-dev] Work towards getting KDTree compiling In-Reply-To: <20040419115616.GA12006@misterbd.agtec.uga.edu> References: <20040419115616.GA12006@misterbd.agtec.uga.edu> Message-ID: On Apr 19, 2004, at 7:56 AM, Brad Chapman wrote: > So, I checked in a modified setup.py that compiles KDTree by default > and sets the language to c++: > I hope maybe this will fix the problem on non-gcc systems and get > KDTree compiled by default. > > I was wondering if people on potential problem systems (Solaris, > Windows, Mac OSX) would mind testing this out to see if it compiles > happily. If so, I'll leave it in for the next release -- otherwise, > we'll have to keep working. It built successfully for me, on Mac OS X 10.3.3. The native compiler for Macs is gcc, so it also worked before the modification. The modification did, however, change the compiler for the module from gcc to c++. However, this is fine, because c++ is symlinked (by default? I didn't do it) to g++. Relevant build output is attached. 
Jeff building 'Bio.KDTree._CKDTree' extension creating build/temp.darwin-7.3.0-Power_Macintosh-2.3/Bio/KDTree gcc -fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -I/opt/local/include/python2.3 -c Bio/KDTree/KDTree.C -o build/temp.darwin-7.3.0-Power_Macintosh-2.3/Bio/KDTree/KDTree.o Bio/KDTree/KDTree.C: In member function `void KDTree::neighbor_simple_search(float)': Bio/KDTree/KDTree.C:914: warning: comparison between signed and unsigned integer expressions Bio/KDTree/KDTree.C:923: warning: comparison between signed and unsigned integer expressions gcc -fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -I/opt/local/include/python2.3 -c Bio/KDTree/KDTree.swig.C -o build/temp.darwin-7.3.0-Power_Macintosh-2.3/Bio/KDTree/KDTree.swig.o c++ -L/opt/local/lib -bundle -bundle_loader /opt/local/bin/python2.3 build/temp.darwin-7.3.0-Power_Macintosh-2.3/Bio/KDTree/KDTree.o build/temp.darwin-7.3.0-Power_Macintosh-2.3/Bio/KDTree/KDTree.swig.o -lstdc++ -o build/lib.darwin-7.3.0-Power_Macintosh-2.3/Bio/KDTree/_CKDTree.so From thamelry at binf.ku.dk Mon Apr 19 15:12:30 2004 From: thamelry at binf.ku.dk (thamelry@binf.ku.dk) Date: Sat Mar 5 14:43:32 2005 Subject: [Biopython-dev] Re: Work towards getting KDTree compiling In-Reply-To: <20040419115616.GA12006@misterbd.agtec.uga.edu> References: <20040419115616.GA12006@misterbd.agtec.uga.edu> Message-ID: <32794.80.63.229.120.1082401950.squirrel@www.binf.ku.dk> Hi Brad, > Since Michiel wrote about the problems that were causing the KDTree C++ > extension to fail, I've been taking a look into seeing how we > might fix that problem and get KDTree compiled by default in > Biopython. It works for me on Mandrake 9.2 (it also worked before, of course). But I've noticed that in the first two steps gcc is still used instead of g++. 
Output: gcc -pthread -fno-strict-aliasing -DNDEBUG -O2 -fomit-frame-pointer -pipe -march=i586 -mcpu=pentiumpro -g -fPIC -I/usr/include/python2.3 -c Bio/KDTree/KDTree.swig.C -o build/temp.linux-i686-2.3/Bio/KDTree/KDTree.swig.o gcc -pthread -fno-strict-aliasing -DNDEBUG -O2 -fomit-frame-pointer -pipe -march=i586 -mcpu=pentiumpro -g -fPIC -I/usr/include/python2.3 -c Bio/KDTree/KDTree.C -o build/temp.linux-i686-2.3/Bio/KDTree/KDTree.o g++ -pthread -shared build/temp.linux-i686-2.3/Bio/KDTree/KDTree.o build/temp.linux-i686-2.3/Bio/KDTree/KDTree.swig.o -lstdc++ -o build/lib.linux-i686-2.3/Bio/KDTree/_CKDTree.so -Thomas From hoffman at ebi.ac.uk Mon Apr 19 16:01:24 2004 From: hoffman at ebi.ac.uk (Michael Hoffman) Date: Sat Mar 5 14:43:32 2005 Subject: [Biopython-dev] Re: Work towards getting KDTree compiling In-Reply-To: <32794.80.63.229.120.1082401950.squirrel@www.binf.ku.dk> References: <20040419115616.GA12006@misterbd.agtec.uga.edu> <32794.80.63.229.120.1082401950.squirrel@www.binf.ku.dk> Message-ID: On Mon, 19 Apr 2004 thamelry@binf.ku.dk wrote: > It works for me on Mandrake 9.2 (it also worked before, of course). > But I've noticed that in the first two steps gcc is still used instead of > g++. On OSF1 V5.1 it also used cc instead of cxx which means it doesn't work. The reason for this is that the compiler_so attribute of the compiler object is always set to the C compiler. There is no compiler_cxx_so. I think this is a bug in distutils. If you need additional options set for shared object compilation you can always do it through the CXX environment variable. I hope no one minds that I just checked in a "customization" (translation: slightly hacky workaround) to the build_ext_biopython class which fixed this on OSF1. Things still work for me on Redhat Linux 9 with the new setup.py. 
-- Michael Hoffman European Bioinformatics Institute From chapmanb at uga.edu Mon Apr 19 12:23:05 2004 From: chapmanb at uga.edu (Brad Chapman) Date: Sat Mar 5 14:43:32 2005 Subject: [Biopython-dev] Re: Work towards getting KDTree compiling In-Reply-To: References: <20040419115616.GA12006@misterbd.agtec.uga.edu> <32794.80.63.229.120.1082401950.squirrel@www.binf.ku.dk> Message-ID: <20040419162305.GB596@misterbd.agtec.uga.edu> Hey all; > > It works for me on Mandrake 9.2 (it also worked before, of course). > > But I've noticed that in the first two steps gcc is still used instead of > > g++. Ah ha -- well spotted. > On OSF1 V5.1 it also used cc instead of cxx which means it doesn't work. > > The reason for this is that the compiler_so attribute of the compiler > object is always set to the C compiler. There is no compiler_cxx_so. I > think this is a bug in distutils. Yeah, I'd say so as well -- it seems pretty pointless to only be able to set the language for part of the compilation. I just went back and looked again at what the behavior was when the language is autodetected, and it is the same. > I hope no one minds that I just checked in a "customization" > (translation: slightly hacky workaround) to the build_ext_biopython > class which fixed this on OSF1. Things still work for me on Redhat > Linux 9 with the new setup.py. Brilliant. It seems simple enough and only requires us to specify the language as c++ for our included C++ code. +1 from me for keeping this in the setup.py. If people on disparate platforms could give this another test and let me know if anything breaks that would be great. Especially like to hear from the ol' Windows folks. For the record, it all works fine for me on a FreeBSD machine with gcc (but we probably already knew that :-). Thanks for all the testing and thanks again for the patch Michael. 
Brad From mdehoon at ims.u-tokyo.ac.jp Sun Apr 25 08:00:24 2004 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sat Mar 5 14:43:32 2005 Subject: [Biopython-dev] Documentation for Bio.LogisticRegression Message-ID: <408BA858.2030207@ims.u-tokyo.ac.jp> Dear all, Recently I have been using the logistic regression model in Bio.LogisticRegression to predict transcription factors in bacteria (thanks Jeff! Great work). Over the weekend, I wrote some documentation for this module and submitted it to CVS. Jeff (or other interested people), can you have a look at it and check if you agree with my description? Feel free to add yourself as one of the authors. By the way, the function train in Bio.LogisticRegression has a keyword typecode, whose usage I didn't understand, so it is not included in the documentation. The documentation is under biopython/Doc/cookbook/LogisticRegression: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/cookbook/LogisticRegression/?cvsroot=biopython --Michiel. -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From thamelry at binf.ku.dk Sun Apr 25 11:50:04 2004 From: thamelry at binf.ku.dk (thamelry@binf.ku.dk) Date: Sat Mar 5 14:43:32 2005 Subject: [Biopython-dev] 3D Vector class In-Reply-To: <408BA858.2030207@ims.u-tokyo.ac.jp> References: <408BA858.2030207@ims.u-tokyo.ac.jp> Message-ID: <33255.80.63.229.248.1082908204.squirrel@www.binf.ku.dk> A Vector class representing a 3D vector was added to Bio.PDB. Operations include dot and cross product, addition, subtraction, division by a scalar, and calculation of norm, angles and dihedral angles. 
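For readers unfamiliar with these operations, here is a rough, self-contained sketch of what such a class might look like. It is not the Bio.PDB code: the class name Vec3, its method names, and the dihedral helper are all hypothetical, and the sign convention of the dihedral is simply whatever the atan2 formula below produces.

```python
import math


class Vec3:
    """Hypothetical minimal 3D vector (not the Bio.PDB Vector class)."""

    def __init__(self, x, y, z):
        self.x, self.y, self.z = float(x), float(y), float(z)

    def __add__(self, other):
        return Vec3(self.x + other.x, self.y + other.y, self.z + other.z)

    def __sub__(self, other):
        return Vec3(self.x - other.x, self.y - other.y, self.z - other.z)

    def __truediv__(self, scalar):
        return Vec3(self.x / scalar, self.y / scalar, self.z / scalar)

    def dot(self, other):
        return self.x * other.x + self.y * other.y + self.z * other.z

    def cross(self, other):
        return Vec3(self.y * other.z - self.z * other.y,
                    self.z * other.x - self.x * other.z,
                    self.x * other.y - self.y * other.x)

    def norm(self):
        return math.sqrt(self.dot(self))

    def angle(self, other):
        """Angle between two vectors, in radians."""
        c = self.dot(other) / (self.norm() * other.norm())
        # Clamp to guard against rounding slightly outside [-1, 1].
        return math.acos(max(-1.0, min(1.0, c)))


def dihedral(p1, p2, p3, p4):
    """Dihedral angle (radians) defined by four points given as Vec3."""
    b1, b2, b3 = p2 - p1, p3 - p2, p4 - p3
    n1 = b1.cross(b2)              # normal of the plane through p1, p2, p3
    n2 = b2.cross(b3)              # normal of the plane through p2, p3, p4
    m1 = n1.cross(b2 / b2.norm())  # in-plane axis perpendicular to n1 and b2
    return math.atan2(m1.dot(n2), n1.dot(n2))


if __name__ == "__main__":
    print(Vec3(1, 0, 0).angle(Vec3(0, 1, 0)))
    print(dihedral(Vec3(1, 0, 0), Vec3(0, 0, 0), Vec3(0, 0, 1), Vec3(0, 1, 1)))
```

Using atan2 for the dihedral rather than acos keeps the result signed and avoids loss of precision near 0 and 180 degrees.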
Cheers, -Thomas From jeffrey_chang at stanfordalumni.org Sun Apr 25 21:45:23 2004 From: jeffrey_chang at stanfordalumni.org (Jeffrey Chang) Date: Sat Mar 5 14:43:32 2005 Subject: [Biopython-dev] Documentation for Bio.LogisticRegression In-Reply-To: <408BA858.2030207@ims.u-tokyo.ac.jp> References: <408BA858.2030207@ims.u-tokyo.ac.jp> Message-ID: <61FF19EA-9723-11D8-8A46-000A956845CE@stanfordalumni.org> Hi Michiel, Wow, this is a really nice document! About the only comment that I have, is that in the first sentence "distinguish K classes from each other" should be "distinguish 2 classes." While there are multinomial models for logistic regression, this code does not handle them. The "typecode" parameter allows the user to choose the type of Numeric matrix to use. Since Newton-Raphson can eat up a lot of memory, on large problems, sometimes it may be beneficial to use single-precision floats rather than double, which is used by default. Thus, "typecode" accepts Numeric typecodes, which are defined in the Numeric library like Numeric.Float16, Numeric.Float32, etc. I left that parameter undocumented because 1) it goes deeper into the internals than I normally like to expose, and 2) I wasn't sure how useful it is. Jeff On Apr 25, 2004, at 8:00 AM, Michiel Jan Laurens de Hoon wrote: > Dear all, > > Recently I have been using the logistic regression model in > Bio.LogisticRegression to predict transcription factors in bacteria > (thanks Jeff! Great work). Over the weekend, I wrote some > documentation for this module and submitted it to CVS. Jeff (or other > interested people), can you have a look at it and check if you agree > with my description? Feel free to add yourself as one of the authors. > By the way, the function train in Bio.LogisticRegression has a keyword > typecode, whose usage I didn't understand, so it is not included in > the documentation. 
> > The documentation is under biopython/Doc/cookbook/LogisticRegression: > > http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/ > cookbook/LogisticRegression/?cvsroot=biopython > > --Michiel. > > -- > Michiel de Hoon, Assistant Professor > University of Tokyo, Institute of Medical Science > Human Genome Center > 4-6-1 Shirokane-dai, Minato-ku > Tokyo 108-8639 > Japan > http://bonsai.ims.u-tokyo.ac.jp/~mdehoon > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev@biopython.org > http://biopython.org/mailman/listinfo/biopython-dev From Marijn.vanderGaag at wur.nl Mon Apr 26 03:45:23 2004 From: Marijn.vanderGaag at wur.nl (Gaag, Marijn van der) Date: Sat Mar 5 14:43:32 2005 Subject: [Biopython-dev] new biopython modules Message-ID: <5F9035D8A446C84C903301CCD5FC8DB4190653@salte0010.wurnet.nl> Hi, I have made a parser and a record module for Fasta/ssearch similarity search results, similar to those made by Jeff Chang for parsing blast/recording the results. Are you interested in it for use in biopython? If so, let me know and I will send you the scripts. We (Plant Research International at the Wageningen University and Research Centre, The Netherlands) are using these two modules for storing results of batch fasta searches. 
However, I will not be available to maintain these modules, since I will move to another job soon and will no longer be involved in bioinformatics. Greetings, Marijn van der Gaag Plant Research International Wageningen University and Research Centre The Netherlands From j.a.casbon at qmul.ac.uk Tue Apr 27 07:45:20 2004 From: j.a.casbon at qmul.ac.uk (James Casbon) Date: Sat Mar 5 14:43:32 2005 Subject: [Biopython-dev] COMPASS parsing code Message-ID: <200404271245.20940.j.a.casbon@qmul.ac.uk> Hi, I have written some code for parsing COMPASS results. COMPASS implements profile/profile alignment and is available by ftp. See: http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12547212 http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=14500884 for more details. I have attached the code, which you might like to include in the biopython distribution. There are a few issues with the code that could be improved: * the unit tests use some sample input, the files comtest1 and comtest2. These are just read using open. I have seen someone use test.locate or something like that, but I'm not sure how that works. If you want to enlighten me, I'll change it. * I have used regular expressions inefficiently, as I'm not sure how you're supposed to cache them using the _Scanner/_Consumer framework. At the moment each subroutine compiles an re when called, which can't be good. Again, please point me to a better way and I will change it. regards, James -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Compass.py Type: application/x-python Size: 12778 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/biopython-dev/attachments/20040427/96a80e10/Compass.bin -------------- next part -------------- Ali1: 60456.blo.gz.aln Ali2: allscop//14982.blo.gz.aln Threshold of effective gap content in columns: 0.5 length1=388 filtered_length1=386 length2=116 filtered_length2=115 Nseqs1=399 Neff1=12.972 Nseqs2=1 Neff2=11.313 Smith-Waterman score = 35 Evalue = 1.01e+03 QUERY 178 KKDLEEIAD ++ ++++++ QUERY 9 QAAVQAVTA Ali1: 60456.blo.gz.aln Ali2: allscop//14983.blo.gz.aln Threshold of effective gap content in columns: 0.5 length1=388 filtered_length1=386 length2=121 filtered_length2=119 Nseqs1=399 Neff1=12.972 Nseqs2=1 Neff2=11.168 Smith-Waterman score = 35 Evalue = 1.01e+03 QUERY 178 KKDLEEIAD ++ ++++++ QUERY 9 REAVEAAVD Ali1: 60456.blo.gz.aln Ali2: allscop//14984.blo.gz.aln Threshold of effective gap content in columns: 0.5 length1=388 filtered_length1=386 length2=145 filtered_length2=137 Nseqs1=399 Neff1=12.972 Nseqs2=1 Neff2=5.869 Smith-Waterman score = 37 Evalue = 5.75e+02 QUERY 371 LEEAMDRMER~~~V + ++++ + + + QUERY 76 LQNFIDQLDNpddL Ali1: 60456.blo.gz.aln Ali2: allscop//15010.blo.gz.aln Threshold of effective gap content in columns: 0.5 length1=388 filtered_length1=386 length2=141 filtered_length2=141 Nseqs1=399 Neff1=12.972 Nseqs2=1 Neff2=6.099 Smith-Waterman score = 37 Evalue = 5.75e+02 QUERY 163 LIINSP ++++++ QUERY 32 LFDAHD -------------- next part -------------- ....Ali1: 60456.blo.gz.aln Ali2: 60456.blo.gz.aln Threshold of effective gap content in columns: 0.5 length1=388 filtered_length1=386 length2=388 filtered_length2=386 Nseqs1=399 Neff1=12.972 Nseqs2=399 Neff2=12.972 Smith-Waterman score = 2759 Evalue = 0.00e+00 QUERY 2 LSDRLELVSASEIRKLFDIAAGMKDVISLGIGEPDFDTPQHIKEYAKEALDKGLTHYGPN ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ QUERY 2 LSDRLELVSASEIRKLFDIAAGMKDVISLGIGEPDFDTPQHIKEYAKEALDKGLTHYGPN QUERY 
IGLLELREAIAEKLKKQNGIEADPKTEIMVLLGANQAFLMGLSAFLKDGEEVLIPTPAFV ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ QUERY IGLLELREAIAEKLKKQNGIEADPKTEIMVLLGANQAFLMGLSAFLKDGEEVLIPTPAFV QUERY SYAPAVILAGGKPVEVPTYEEDEFRLNVDELKKYVTDKTRALIINSPCNPTGAVLTKKDL ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ QUERY SYAPAVILAGGKPVEVPTYEEDEFRLNVDELKKYVTDKTRALIINSPCNPTGAVLTKKDL QUERY EEIADFVVEHDLIVISDEVYEHFIYDDARHYSIASLDGMFERTITVNGFSKTFAMTGWRL ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ QUERY EEIADFVVEHDLIVISDEVYEHFIYDDARHYSIASLDGMFERTITVNGFSKTFAMTGWRL QUERY GFVAAPSWIIERMVKFQMYNATCPVTFIQYAAAKALKDERSWKAVEEMRKEYDRRRKLVW ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ QUERY GFVAAPSWIIERMVKFQMYNATCPVTFIQYAAAKALKDERSWKAVEEMRKEYDRRRKLVW QUERY KRLNEMGLPTVKPKGAFYIFPRIRDTGLTSKKFSELMLKEARVAVVPGSAFGKAGEGYVR ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ QUERY KRLNEMGLPTVKPKGAFYIFPRIRDTGLTSKKFSELMLKEARVAVVPGSAFGKAGEGYVR QUERY ISYATAYEKLEEAMDRMERVLKERKL ++++++++++++++++++++++++++ QUERY ISYATAYEKLEEAMDRMERVLKERKL From bugzilla-daemon at portal.open-bio.org Tue Apr 27 19:31:34 2004 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon@portal.open-bio.org) Date: Sat Mar 5 14:43:32 2005 Subject: [Biopython-dev] [Bug 1627] New: "Unexpected end of stream" when parsing Blast results Message-ID: <200404272331.i3RNVY73001338@portal.open-bio.org> http://bugzilla.bioperl.org/show_bug.cgi?id=1627 Summary: "Unexpected end of stream" when parsing Blast results Product: Biopython Version: 1.24 Platform: PC OS/Version: Windows XP Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev@biopython.org ReportedBy: rayka@mit.edu The parsing of Blast results seems to have difficulty in recognizing the end of a Blast file. I've gotten an "Unexpected end of stream" error with several different genes. 
The code: blast_results = NCBIWWW.blast('blastn', 'nr', Seq.tostring(), entrez_query=org, filter='off', expect='1000', word_size='7') blast_parser = NCBIWWW.BlastParser() blast_record = blast_parser.parse(blast_results) Results in error: blast_record = blast_parser.parse(blast_results) File "C:\Python23\lib\site-packages\Bio\Blast\NCBIWWW.py", line 48, in parse self._scanner.feed(handle, self._consumer) File "C:\Python23\lib\site-packages\Bio\Blast\NCBIWWW.py", line 97, in feed has_re=re.compile(r'.?BLAST')) File "C:\Python23\lib\site-packages\Bio\ParserSupport.py", line 335, in read_and_call_until line = safe_readline(uhandle) File "C:\Python23\lib\site-packages\Bio\ParserSupport.py", line 411, in safe_readline raise SyntaxError, "Unexpected end of stream." SyntaxError: Unexpected end of stream. When the line in safe_readline is printed out before the error is thrown the output looks like: effective search space used: 26390847234 T: 0 A: 0 X1: 6 (11.9 bits) X2: 15 (29.7 bits) S1: 12 (24.3 bits) S2: 13 (26.3 bits) I suspect this is actually the end of the Blast file but the program does not recognize as such. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From idoerg at burnham.org Thu Apr 29 14:52:42 2004 From: idoerg at burnham.org (Iddo Friedberg) Date: Sat Mar 5 14:43:32 2005 Subject: [Biopython-dev] How many people on biopython lists? Message-ID: <40914EFA.4010809@burnham.org> Hi, Can someone tell me how many subscribers are there on the biopython and biopython-dev lists? It's for a book chapter.. good PR. Thanks, Iddo -- Iddo Friedberg, Ph.D. The Burnham Institute 10901 N. Torrey Pines Rd. La Jolla, CA 92037 USA Tel: +1 (858) 646 3100 x3516 Fax: +1 (858) 713 9930 http://ffas.ljcrf.edu/~iddo