From nicolas.chauvat at logilab.fr  Mon Apr  2 11:58:49 2007
From: nicolas.chauvat at logilab.fr (Nicolas Chauvat)
Date: Mon, 2 Apr 2007 17:58:49 +0200
Subject: [BioPython] [ANN] EuroPython 2007: Call for Proposals
Message-ID: <20070402155849.GF24884@crater.logilab.fr>

Book Monday 9th July to Wednesday 11th July 2007 in your calendar!
EuroPython 2007, the European Python and Zope Conference, will be held in
Vilnius, Lithuania. Last year's conference was a great success, featuring
a variety of tracks, amazing lightning talks and inspiring keynotes.

With your participation, we want to make EuroPython 2007, the sixth
EuroPython, even more successful than the previous five.

Talks, Papers and Themes
------------------------

This year we have decided to borrow a few good ideas from PyCon, one of
which is to move away from the 'track' structure. Instead, speakers are
invited to submit presentations about anything they have done that they
think would be of interest to the Python community. We will then arrange
them into related groups and schedule them in the space available. In the
past, EuroPython participants have found the following themes to be of
interest:

  * Science
  * Python Language and Libraries
  * Web Related Technologies
  * Education
  * Games
  * Agile Methodologies and Testing
  * Social Skills

In addition to talks, we will also accept full paper submissions about
any of the above themes. The Call for Refereed Papers will be posted
shortly.

The deadline for talk proposals is Friday 18th May at midnight (24:00
CEST, Central European Summer Time, UTC+2).

Other ways to participate
-------------------------

Apart from giving talks, there are plenty of other ways to participate in
the conference. Just attending and talking to people you find here can be
satisfying enough, but there are three other kinds of activity you may
wish to plan for: Lightning Talks, Open Space and Sprints.
Lightning Talks are very short talks that give you just enough time to
introduce a topic or project, Open Space is an area reserved for informal
discussions, and Sprints are focused gatherings for developers interested
in particular projects. For more information please see the following
pages:

  * Lightning Talks: http://www.europython.org/sections/events/lightning_talks
  * Open Space: http://www.europython.org/sections/events/open_space
  * Sprints: http://www.europython.org/sections/sprints_and_wiki

Your Contribution
-----------------

To propose a talk or a paper, go to...

  * http://www.europython.org/submit

For more general information on the conference, please visit...

  * http://www.europython.org/

Looking forward to seeing what you fine folk have been up to,

The EuroPython Team

-- 
Nicolas Chauvat

logilab.fr - services en informatique avancée et gestion de connaissances

From alexl at users.sourceforge.net  Sun Apr  8 05:27:27 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Sun, 08 Apr 2007 02:27:27 -0700
Subject: [BioPython] Biopython package for Fedora
Message-ID: <274pnrcj28.fsf@delpy.biol.berkeley.edu>

(Apologies if you receive multiple copies, this is a repost, my original
bounced)

Hello Biopythonistas,

I have created a preliminary RPM package of the latest release of
Biopython (1.43) for Fedora as part of the "Fedora Package Collection"
(formerly "Fedora Extras" since Fedora Core+Fedora Extras are merging).
(I am also packaging Bioperl; you can see some of my progress, including
links to the reviews, here: http://fedoraproject.org/wiki/AlexLancaster)

I am almost ready to submit my package for review, but several issues
have arisen during the packaging that I hope the biopython list can help
clarify before I do so:

1) Will Biopython work OK with Python 2.5? I ask because the next
   release of Fedora (Fedora 7) will only ship with Python 2.5 and
   packages first need to build in the development branch (which will
   eventually become Fedora 7).
2) The "python setup.py install" step appears to install a lot of
   scripts with the "#!/usr/bin/env python" at the top into the main
   /usr/lib/python2.4/site-packages/Bio/ namespace, e.g.:

   /usr/lib/python2.4/site-packages/Bio/GFF/GenericTools.py

   Should these scripts be installed somewhere more appropriate such
   as /usr/bin/GenericTools.py, or do they function as classes as
   well as executables in their own right? The "rpmlint" tool, which is
   part of the packaging process, scans a package built for Fedora and
   flags aspects of the package that do not follow the packaging and/or
   file system hierarchy (FHS) guidelines. [1]

3) The setup.py install also installs some architecture-independent
   non-code data files (such as DTDs) which I would normally expect to
   live in /usr/share/python-biopython/DTDs (or somesuch), for example:

   /usr/lib/python2.4/site-packages/Bio/EUtils/DTDs/eSearch_020511.dtd

   Is this the normal location for these DTDs, and does the rest of the
   biopython framework expect to find these files in this location?

4) If possible, Fedora packages should run all unit tests provided in
   the upstream package at package time, just before creating the RPM.
   I would like to do this for biopython as well, but there doesn't
   seem to be an easy way to disable the PyUnit GUI that pops up and
   run in batch-only non-GUI mode. I looked at the code in
   Tests/run_tests.py and it does have a "--no-gui" option, but there
   does not appear to be any way to run this from the top-level
   setup.py file, e.g.:

   python setup.py test --no-gui

   doesn't work.

5) My initial package depends on the required software: python, mx,
   python-numeric, as well as the optional python-reportlab,
   MySQL-python and flex, which are all also included in Fedora, but I
   won't have Wise2 available since it is not yet in Fedora, at least
   not until I (or somebody else) package Wise2.

6) Is Biopython-corba still active, and if so, should it also be
   packaged? Are there any interdependencies with the base biopython
   package?
   (No promises, though!)

Thanks,
Alex

[1] I attempted to attach the list at the end of the e-mail for the
developers to identify and tell me if these files are OK where the
setup.py currently puts them, but my original e-mail bounced probably
because of the attachment.

-- 
Alex Lancaster, Ph.D. | Ecology & Evolutionary Biology, University of Arizona

From sbassi at gmail.com  Sun Apr  8 15:12:23 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Sun, 8 Apr 2007 16:12:23 -0300
Subject: [BioPython] Biopython package for Fedora
In-Reply-To: <274pnrcj28.fsf@delpy.biol.berkeley.edu>
References: <274pnrcj28.fsf@delpy.biol.berkeley.edu>
Message-ID:

On 4/8/07, Alex Lancaster wrote:
> 1) Will Biopython work OK with Python 2.5? I ask because the next
> release of Fedora (Fedora 7) will only ship with Python 2.5 and
> packages first need to build in the development branch (which will
> eventually become Fedora 7) first.

This is the only question I am able to answer. Yes, it does work with
Python 2.5.

From chris.lasher at gmail.com  Sun Apr  8 16:14:54 2007
From: chris.lasher at gmail.com (Chris Lasher)
Date: Sun, 8 Apr 2007 16:14:54 -0400
Subject: [BioPython] Biopython package for Fedora
In-Reply-To: <274pnrcj28.fsf@delpy.biol.berkeley.edu>
References: <274pnrcj28.fsf@delpy.biol.berkeley.edu>
Message-ID: <128a885f0704081314r490b7fbdj71d8b16612e8b54c@mail.gmail.com>

On 4/8/07, Alex Lancaster wrote:
> 2) The "python setup.py install" step appears to install a lot of
> scripts with the "#!/usr/bin/env python" at the top into the main
> /usr/lib/python2.4/site-packages/Bio/ namespace, e.g.:
>
> /usr/lib/python2.4/site-packages/Bio/GFF/GenericTools.py
>
> should these scripts be installed somewhere more appropriate such
> as /usr/bin/GenericTools.py or do they also function as classes as
> well as executables in their own right?

The line
#!/usr/bin/env python
retrieves the appropriate Python installation as specified by the user's
defined environment.
This is preferable to hard-coding #!/usr/bin/python, which
will always use the Python installation pointed to by /usr/bin/python.
For most users, this doesn't matter, but if the user desires to use a
local or custom installation of Python, then with a hard-coded path they
must change all these scripts by hand to point to their preferred Python
install.

Say my distribution's Python is version 2.3 but I have installed a local
copy of version 2.5 which is symlinked at /usr/local/bin/python. I can
put /usr/local/bin ahead of /usr/bin in my PATH and the scripts with
"#!/usr/bin/env python" will then execute with my preferred version (2.5)
of Python rather than the system version (2.3), but the scripts with
"#!/usr/bin/python" will execute with the system version (2.3) rather
than my preferred version (2.5).

A web search will turn up more details.

Chris

From alexl at users.sourceforge.net  Sun Apr  8 18:28:06 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Sun, 08 Apr 2007 15:28:06 -0700
Subject: [BioPython] Biopython package for Fedora
Message-ID:

>>>>> "CL" == Chris Lasher writes:

CL> On 4/8/07, Alex Lancaster wrote:
>> 2) The "python setup.py install" step appears to install a lot of
>> scripts with the "#!/usr/bin/env python" at the top into the main
>> /usr/lib/python2.4/site-packages/Bio/ namespace, e.g.:
>>
>> /usr/lib/python2.4/site-packages/Bio/GFF/GenericTools.py
>>
>> should these scripts be installed somewhere more appropriate such
>> as /usr/bin/GenericTools.py or do they also function as classes as
>> well as executables in their own right?

CL> The line #!/usr/bin/env python
CL> retrieves the appropriate Python installation as specified by the
CL> user's defined environment.

[...]

I'm aware of the function of the "/usr/bin/env python" vs.
"/usr/bin/python"; that isn't the problem. My question was about the
*location* of the script files when installed in
/usr/lib/python2.4/site-packages/Bio/* vs. being installed as executables
in /usr/bin/.
It seems that there are a number of files which contain both
classes and scripts and rpmlint identifies all files containing scripts
which aren't installed in a location like /usr/bin/ to make sure that
scripts aren't unintentionally installed in a non-executable location.

Alex

From alexl at users.sourceforge.net  Sun Apr  8 00:51:04 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Sat, 07 Apr 2007 21:51:04 -0700
Subject: [BioPython] Biopython package(s) for Fedora
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: biopython-rpmlint.txt
Url: http://lists.open-bio.org/pipermail/biopython/attachments/20070407/e9f04488/attachment.txt

From chris.lasher at gmail.com  Wed Apr 11 00:43:14 2007
From: chris.lasher at gmail.com (Chris Lasher)
Date: Wed, 11 Apr 2007 00:43:14 -0400
Subject: [BioPython] Biopython package for Fedora
In-Reply-To: <274pnrcj28.fsf@delpy.biol.berkeley.edu>
References: <274pnrcj28.fsf@delpy.biol.berkeley.edu>
Message-ID: <128a885f0704102143j3697fe93keb6eb557da63e4fc@mail.gmail.com>

On 4/8/07, Alex Lancaster wrote:
> 4) If possible, Fedora packages should run all unit tests provided in
> the upstream package at package time, just before creating the RPM.
> I would like to do this for biopython as well, but there doesn't
> seem to be an easy way to disable the PyUnit GUI that pops up and
> run in batch-only non-GUI mode. I looked at the code in
> Tests/run_tests.py and it does have a "--no-gui" option, but there
> does not appear to be any way to run this from the top-level
> setup.py file, e.g.:
>
> python setup.py test --no-gui
>
> doesn't work.

Alex, thanks for pointing this out. I sat down tonight and resolved this
issue.

The patch on there should be the fix needed. Save it as setup_test.patch
(or whatever, but that's convenient), place it in the same directory as
setup.py, and patch with the command

patch -p0 < setup_test.patch

Alternatively, I can send you the patched files (setup.py and
Tests/run_tests.py).
Thanks again for pointing this out.

Chris

From timmcilveen at talktalk.net  Wed Apr 11 10:15:52 2007
From: timmcilveen at talktalk.net (tim)
Date: Wed, 11 Apr 2007 15:15:52 +0100
Subject: [BioPython] installing on Mandriva Linux
Message-ID: <1176300953.3621.13.camel@localhost>

Hi,

I am getting lots of errors when installing Biopython with
python setup.py install. I am running Python 2.4.3 on Linux and have
mxtextools, numeric and headers etc. installed.

The installation is definitely not working, as I get a traceback error
when I type some of the test code such as:

from Bio.Seq import Seq

Can anyone help? I'm new to biopython and Linux. I have everything
working fine under Windows.

I get problems from this point onwards in the install, with lots of
Bio/Cluster/clustermodule errors:

Do you want to continue this installation? (Y/n) Y
*** Bio.KDTree *** NOT built by default
The Bio.PDB.NeighborSearch module depends on the Bio.KDTree module,
which in turn, depends on C++ code that does not compile cleanly on
all platforms. Hence, Bio.KDTree is not built by default.

Would you like to build Bio.KDTree ? (y/N) y
creating build/temp.linux-i686-2.4/Bio/Cluster
gcc -pthread -fno-strict-aliasing -DNDEBUG -O2 -g -pipe -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fomit-frame-pointer -march=i586 -mtune=pentiumpro -fasynchronous-unwind-tables -g -fPIC -IBio/Cluster -I/usr/include/python2.4 -c Bio/Cluster/clustermodule.c -o build/temp.linux-i686-2.4/Bio/Cluster/clustermodule.o
Bio/Cluster/clustermodule.c:2:33: error: Numeric/arrayobject.h: No such file or directory
Bio/Cluster/clustermodule.c:20: error: expected declaration specifiers or ‘...’ before ‘PyArrayObject’
Bio/Cluster/clustermodule.c: In function ‘parse_data’:
Bio/Cluster/clustermodule.c:27: error: ‘array’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:27: error: (Each undeclared identifier is reported only once
Bio/Cluster/clustermodule.c:27: error: for each function it appears in.)
Bio/Cluster/clustermodule.c:27: error: ‘PyArrayObject’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:27: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c:35: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c:44: error: ‘PyArray_DOUBLE’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:45: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c: At top level:
Bio/Cluster/clustermodule.c:84: error: expected ‘)’ before ‘*’ token
Bio/Cluster/clustermodule.c:98: error: expected declaration specifiers or ‘...’ before ‘PyArrayObject’
Bio/Cluster/clustermodule.c: In function ‘parse_mask’:
Bio/Cluster/clustermodule.c:109: error: ‘array’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:113: error: ‘PyArrayObject’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:113: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c:121: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c:128: error: ‘PyArray_INT’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:130: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c: At top level:
Bio/Cluster/clustermodule.c:178: error: expected ‘)’ before ‘*’ token
Bio/Cluster/clustermodule.c:191: error: expected declaration specifiers or ‘...’ before ‘PyArrayObject’
Bio/Cluster/clustermodule.c: In function ‘parse_weight’:
Bio/Cluster/clustermodule.c:197: error: ‘array’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:201: error: ‘PyArrayObject’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:201: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c:209: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c:210: error: ‘PyArray_DOUBLE’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:212: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c: At top level:
Bio/Cluster/clustermodule.c:255: error: expected ‘)’ before ‘*’ token
Bio/Cluster/clustermodule.c:265: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘*’ token
Bio/Cluster/clustermodule.c:372: error: expected declaration specifiers or ‘...’ before ‘PyArrayObject’
Bio/Cluster/clustermodule.c: In function ‘parse_clusterid’:
Bio/Cluster/clustermodule.c:383: error: ‘array’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:389: error: ‘PyArrayObject’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:389: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c:397: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c:399: error: ‘PyArray_INT’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:401: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c: At top level:
Bio/Cluster/clustermodule.c:471: error: expected ‘)’ before ‘*’ token
Bio/Cluster/clustermodule.c:482: error: expected declaration specifiers or ‘...’ before ‘PyArrayObject’
Bio/Cluster/clustermodule.c: In function ‘free_distances’:
Bio/Cluster/clustermodule.c:485: error: ‘array’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:489: error: ‘PyArrayObject’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:489: error: ‘a’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:489: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c: At top level:
Bio/Cluster/clustermodule.c:515: error: expected declaration specifiers or ‘...’ before ‘PyArrayObject’
Bio/Cluster/clustermodule.c: In function ‘parse_distance’:
Bio/Cluster/clustermodule.c:522: error: ‘array’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:522: error: ‘PyArrayObject’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:522: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:545: error: ‘a’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:545: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c:557: error: ‘PyArray_DOUBLE’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:576: warning: assignment makes pointer from integer without a cast
Bio/Cluster/clustermodule.c:584: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c:601: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c:628: warning: passing argument 3 of ‘free_distances’ makes integer from pointer without a cast
Bio/Cluster/clustermodule.c:628: error: too many arguments to function ‘free_distances’
Bio/Cluster/clustermodule.c:637: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c:640: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c: At top level:
Bio/Cluster/clustermodule.c:716: error: expected declaration specifiers or ‘...’ before ‘PyArrayObject’
Bio/Cluster/clustermodule.c: In function ‘create_celldata’:
Bio/Cluster/clustermodule.c:725: error: ‘array’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:725: error: ‘PyArrayObject’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:725: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c: At top level:
Bio/Cluster/clustermodule.c:753: error: expected declaration specifiers or ‘...’ before ‘PyArrayObject’
Bio/Cluster/clustermodule.c: In function ‘parse_index’:
Bio/Cluster/clustermodule.c:757: error: ‘array’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:766: error: ‘PyArrayObject’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:766: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c:776: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c:778: error: ‘PyArray_INT’
undeclared (first use in this function)
Bio/Cluster/clustermodule.c:780: warning: assignment makes pointer from integer without a cast
Bio/Cluster/clustermodule.c:787: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c:803: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c: At top level:
Bio/Cluster/clustermodule.c:818: error: expected ‘)’ before ‘*’ token
Bio/Cluster/clustermodule.c: In function ‘PyTree_cut’:
Bio/Cluster/clustermodule.c:1165: error: ‘PyArrayObject’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:1165: error: ‘aCLUSTERID’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:1165: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c:1181: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c:1187: error: ‘clusterid’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:1197: warning: return makes pointer from integer without a cast
Bio/Cluster/clustermodule.c: In function ‘py_kcluster’:
Bio/Cluster/clustermodule.c:1312: error: ‘PyArrayObject’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:1312: error: ‘aDATA’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:1315: error: ‘aMASK’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:1318: error: ‘aWEIGHT’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:1325: error: ‘aCLUSTERID’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:1379: error: too many arguments to function ‘parse_data’
Bio/Cluster/clustermodule.c:1384: error: too many arguments to function ‘parse_mask’
Bio/Cluster/clustermodule.c:1416: error: too many arguments to function ‘parse_weight’
Bio/Cluster/clustermodule.c: In function ‘py_kmedoids’:
Bio/Cluster/clustermodule.c:1501: error: ‘PyArrayObject’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:1501: error: ‘aDISTANCES’
undeclared (first use in this function)
Bio/Cluster/clustermodule.c:1504: error: ‘aCLUSTERID’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:1533: error: too many arguments to function ‘parse_distance’
Bio/Cluster/clustermodule.c:1538: warning: passing argument 3 of ‘free_distances’ makes integer from pointer without a cast
Bio/Cluster/clustermodule.c:1538: error: too many arguments to function ‘free_distances’
Bio/Cluster/clustermodule.c:1545: warning: passing argument 3 of ‘free_distances’ makes integer from pointer without a cast
Bio/Cluster/clustermodule.c:1545: error: too many arguments to function ‘free_distances’
Bio/Cluster/clustermodule.c:1552: warning: passing argument 3 of ‘free_distances’ makes integer from pointer without a cast
Bio/Cluster/clustermodule.c:1552: error: too many arguments to function ‘free_distances’
Bio/Cluster/clustermodule.c:1565: warning: passing argument 3 of ‘free_distances’ makes integer from pointer without a cast
Bio/Cluster/clustermodule.c:1565: error: too many arguments to function ‘free_distances’
Bio/Cluster/clustermodule.c: In function ‘py_treecluster’:
Bio/Cluster/clustermodule.c:1706: error: ‘PyArrayObject’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:1706: error: ‘aDATA’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:1707: error: ‘aMASK’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:1708: error: ‘aWEIGHT’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:1726: error: too many arguments to function ‘parse_data’
Bio/Cluster/clustermodule.c:1733: error: too many arguments to function ‘parse_mask’
Bio/Cluster/clustermodule.c:1739: error: too many arguments to function ‘parse_weight’
Bio/Cluster/clustermodule.c:1762: error: ‘aDISTANCEMATRIX’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:1770: error: too many arguments to function ‘parse_distance’
Bio/Cluster/clustermodule.c:1783: warning: passing argument 3 of ‘free_distances’ makes integer from pointer without a cast
Bio/Cluster/clustermodule.c:1783: error: too many arguments to function ‘free_distances’
Bio/Cluster/clustermodule.c: In function ‘py_somcluster’:
Bio/Cluster/clustermodule.c:1849: error: ‘PyArrayObject’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:1849: error: ‘aDATA’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:1852: error: ‘aMASK’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:1855: error: ‘aWEIGHT’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:1863: error: ‘aCELLDATA’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:1865: error: ‘aCLUSTERID’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:1922: error: too many arguments to function ‘parse_data’
Bio/Cluster/clustermodule.c:1929: error: too many arguments to function ‘parse_mask’
Bio/Cluster/clustermodule.c:1935: error: too many arguments to function ‘parse_weight’
Bio/Cluster/clustermodule.c:1944: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c:1954: error: too many arguments to function ‘create_celldata’
Bio/Cluster/clustermodule.c: In function ‘py_median’:
Bio/Cluster/clustermodule.c:1996: error: ‘PyArrayObject’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:1996: error: ‘aDATA’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:2007: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c:2015: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c:2018: error: ‘PyArray_DOUBLE’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:2019: warning: initialization makes pointer from integer without a cast
Bio/Cluster/clustermodule.c:2021: error: expected expression before ‘)’
token
Bio/Cluster/clustermodule.c:2037: warning: initialization makes pointer from integer without a cast
Bio/Cluster/clustermodule.c:2043: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c: In function ‘py_mean’:
Bio/Cluster/clustermodule.c:2062: error: ‘PyArrayObject’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:2062: error: ‘aDATA’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:2073: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c:2081: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c:2084: error: ‘PyArray_DOUBLE’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:2085: warning: initialization makes pointer from integer without a cast
Bio/Cluster/clustermodule.c:2087: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c:2103: warning: initialization makes pointer from integer without a cast
Bio/Cluster/clustermodule.c:2109: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c: In function ‘py_clusterdistance’:
Bio/Cluster/clustermodule.c:2167: error: ‘PyArrayObject’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:2167: error: ‘aDATA’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:2170: error: ‘aMASK’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:2173: error: ‘aWEIGHT’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:2181: error: ‘aINDEX1’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:2184: error: ‘aINDEX2’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:2216: error: too many arguments to function ‘parse_data’
Bio/Cluster/clustermodule.c:2222: error: too many arguments to function ‘parse_mask’
Bio/Cluster/clustermodule.c:2228: error: too many arguments to function ‘parse_weight’
Bio/Cluster/clustermodule.c:2235: error: too many arguments to function ‘parse_index’
Bio/Cluster/clustermodule.c:2242: error: too many arguments to function ‘parse_index’
Bio/Cluster/clustermodule.c: In function ‘py_clustercentroids’:
Bio/Cluster/clustermodule.c:2312: error: ‘PyArrayObject’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:2312: error: ‘aDATA’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:2315: error: ‘aMASK’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:2318: error: ‘aCLUSTERID’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:2322: error: ‘aCDATA’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:2324: error: ‘aCMASK’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:2350: error: too many arguments to function ‘parse_data’
Bio/Cluster/clustermodule.c:2356: error: too many arguments to function ‘parse_mask’
Bio/Cluster/clustermodule.c:2362: warning: passing argument 3 of ‘parse_clusterid’ makes pointer from integer without a cast
Bio/Cluster/clustermodule.c:2362: error: too many arguments to function ‘parse_clusterid’
Bio/Cluster/clustermodule.c:2371: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c:2384: error: expected expression before ‘)’ token
Bio/Cluster/clustermodule.c: In function ‘py_distancematrix’:
Bio/Cluster/clustermodule.c:2466: error: ‘PyArrayObject’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:2466: error: ‘aDATA’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:2469: error: ‘aMASK’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:2472: error: ‘aWEIGHT’ undeclared (first use in this function)
Bio/Cluster/clustermodule.c:2507: error: too many arguments to function ‘parse_data’
Bio/Cluster/clustermodule.c:2514: error: too many arguments to function ‘parse_mask’
Bio/Cluster/clustermodule.c:2520: error: too many arguments to function ‘parse_weight’
Bio/Cluster/clustermodule.c:2542: error: ‘PyArray_DOUBLE’
undeclared (first use in this function)
Bio/Cluster/clustermodule.c:2542: warning: initialization makes pointer from integer without a cast
Bio/Cluster/clustermodule.c:2548: error: expected expression before ‘)’ token
error: command 'gcc' failed with exit status 1
[tim at localhost biopython-1.43]$
[tim at localhost biopython-1.43]$

Thanks,
Tim

From alexl at users.sourceforge.net  Wed Apr 11 10:44:29 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Wed, 11 Apr 2007 07:44:29 -0700
Subject: [BioPython] Biopython package for Fedora
In-Reply-To: <128a885f0704102143j3697fe93keb6eb557da63e4fc@mail.gmail.com> (Chris Lasher's message of "Wed\, 11 Apr 2007 00\:43\:14 -0400")
References: <274pnrcj28.fsf@delpy.biol.berkeley.edu> <128a885f0704102143j3697fe93keb6eb557da63e4fc@mail.gmail.com>
Message-ID:

>>>>> "CL" == Chris Lasher writes:

[...]

CL> Alex, thanks for pointing this out. I sat down tonight and
CL> resolved this issue.
CL>
CL>
CL> The patch on there should be the fix needed. Save it as
CL> setup_test.patch (or whatever, but that's convenient), place it in
CL> the same directory as setup.py, and patch with the command

CL> patch -p0 < setup_test.patch

CL> Alternatively, I can send you the patched files (setup.py and
CL> Tests/run_tests.py).

CL> Thanks again for pointing this out.

Hi Chris,

Thanks, the patch works fine for me. I've added the patch to the package
and I can now run the tests in command-line only mode fine.

By the way, I've filed my package review for Fedora:

https://bugzilla.redhat.com/235989

if anybody wants to keep track of its progress.

I am currently still disabling the tests because they hang for some
reason on test_Cluster. I get:

$ python setup.py test --no-gui
running test
test_Ace ... ok
test_BioSQL ... Skipping test because of import error: Skipping BioSQL tests -- enable tests in Tests/test_BioSQL.py
ok
test_CAPS ... ok
test_Cluster ...

then the CPU spins indefinitely.
Also, I need to make sure that all tests that require network access are
skipped cleanly, because the package build environment for Fedora
requires that all packages build without network access.

On another packaging note: I now remove all #!/usr/bin/ etc. lines from
the top of files found in the /usr/lib/python2.4/site-packages/Bio/* area
to keep rpmlint happy. These can still be run using python directly,
e.g.:

python /usr/lib/python2.4/site-packages/Bio/biblio.py

Note that there's a lot of inconsistency here: some are
"/usr/bin/env python", others are /usr/bin/python or even
/usr/bin/python2.3; others don't have a main program contained within,
and so the #!/usr/bin line should be removed completely.

Somebody should go through and clean up/rationalise the installation
process: check that the files installed when "python setup.py install" is
run are appropriate .py package files. For example, EUtils installs its
own "setup.py" file in a subdirectory, which isn't very clean.

Alex

From mdehoon at c2b2.columbia.edu  Wed Apr 11 11:44:30 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Wed, 11 Apr 2007 17:44:30 +0200
Subject: [BioPython] installing on Mandriva Linux
In-Reply-To: <1176300953.3621.13.camel@localhost>
References: <1176300953.3621.13.camel@localhost>
Message-ID: <461D025E.9070107@c2b2.columbia.edu>

tim wrote:
>I get problems from this point onwards in the install, with lots of
>Bio/Cluster/clustermodule errors:
>...
>creating build/temp.linux-i686-2.4/Bio/Cluster
>gcc -pthread -fno-strict-aliasing -DNDEBUG -O2 -g -pipe
>-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fomit-frame-pointer -march=i586
>-mtune=pentiumpro -fasynchronous-unwind-tables -g -fPIC -IBio/Cluster
>-I/usr/include/python2.4 -c Bio/Cluster/clustermodule.c -o
>build/temp.linux-i686-2.4/Bio/Cluster/clustermodule.o
>Bio/Cluster/clustermodule.c:2:33: error: Numeric/arrayobject.h: No such
>file or directory
>
>
This is the first error message that you get.
Did you check that you have the header file arrayobject.h? And is it in the correct location?

--Michiel

From jhortia1 at jhu.edu Fri Apr 13 15:21:54 2007
From: jhortia1 at jhu.edu (JASON HORTIATIS)
Date: Fri, 13 Apr 2007 15:21:54 -0400
Subject: [BioPython] Local Blast Output
Message-ID:

I'm an undergraduate using biopython to run local blast searches and I'm trying to find out how to save the entire sequence of each protein hit directly to a file. I have only managed to be able to print the portion of the sequence that matches the query using hsp.sbjct[0:]. My goal is to use the search results from one blast run as a database to search against for a subsequent run, so a fasta file is needed for each hit of the first run.

Thanks for the help!
Jason

From sbassi at gmail.com Sat Apr 14 00:14:20 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Sat, 14 Apr 2007 01:14:20 -0300
Subject: [BioPython] Local Blast Output
In-Reply-To:
References:
Message-ID:

On 4/13/07, JASON HORTIATIS wrote:
> I'm an undergraduate using biopython to run local blast searches and I'm trying to find out how to save the entire sequence of each protein hit directly to a file. I have only managed to be able to print the portion of the sequence that matches the query using hsp.sbjct[0:]. My goal is to use the search results from one blast run as a database to search against for a subsequent run so a fasta file is needed for each hit of the first run.
> Thanks for the help!

You can only parse from the BLAST result what is inside the BLAST output. The whole sequence is not in that output, only the portion you've retrieved. You may need to parse the GID of the protein and then look for it in your BLAST DB (using fastacmd). Or you may use PSI-BLAST as an alternative.
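
The lookup step suggested above (take the identifier of each hit, then pull the full-length record from the database) could be sketched roughly as follows. This is only an illustration in plain Python, not the fastacmd tool and not a Biopython API: it assumes the original FASTA file used to build the BLAST database is still available, that the hit identifiers have already been parsed from the BLAST output, and that an identifier is the first whitespace-delimited token of the FASTA header. The function names read_fasta, fetch_full_sequences and to_fasta are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fetch_full_sequences(hit_ids, fasta_lines):
    """Return {hit_id: full sequence} for every hit found in the FASTA source.

    Assumption: a record matches when the first whitespace-delimited
    token of its header equals the hit id.
    """
    wanted = set(hit_ids)
    found = {}
    for header, seq in read_fasta(fasta_lines):
        tokens = header.split()
        record_id = tokens[0] if tokens else ""
        if record_id in wanted:
            found[record_id] = seq
    return found


def to_fasta(found, width=60):
    """Format the {id: sequence} mapping as FASTA text, wrapped at width."""
    out = []
    for record_id, seq in found.items():
        out.append(">" + record_id)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)


if __name__ == "__main__":
    # Tiny in-memory stand-in for the database FASTA file.
    database = [">p1 first protein", "AAAA", ">p2 second protein", "CCCC", "GGGG"]
    hits = fetch_full_sequences(["p2"], database)
    print(to_fasta(hits))
```

The resulting text can be written to a file and formatted as a new BLAST database for the second search. For a large database, fastacmd against the indexed database will likely be faster than scanning the FASTA file like this.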
From elventear at gmail.com Tue Apr 17 13:52:40 2007
From: elventear at gmail.com (Pepe Barbe)
Date: Tue, 17 Apr 2007 12:52:40 -0500
Subject: [BioPython] Martel Help
Message-ID: <3e73596b0704171052g7ba3abb0uc04cbce3952d2bd2@mail.gmail.com>

Hello,

I am interested in using Martel for parsing some Biology formats (So far nothing new). While the module seems really good, I've been struggling to find some meaningful documentation. So far I feel I am walking in the dark. Still I've made some progress. If there is some tutorial or complete documentation out there I would appreciate it if someone would point me to it.

My current question is the following. I have the impression that every single line that the Martel parser is going to parse must be recognized, and otherwise it will raise an Exception. Is this correct? If it's true, how can I ignore anything that doesn't match a RegEx and just process what matches?

Thanks,
Pepe

From elventear at gmail.com Wed Apr 18 12:54:30 2007
From: elventear at gmail.com (Pepe Barbe)
Date: Wed, 18 Apr 2007 11:54:30 -0500
Subject: [BioPython] Martel Help
In-Reply-To: <3e73596b0704171052g7ba3abb0uc04cbce3952d2bd2@mail.gmail.com>
References: <3e73596b0704171052g7ba3abb0uc04cbce3952d2bd2@mail.gmail.com>
Message-ID: <3e73596b0704180954k752f9be9n6a4f4f46ea2c0435@mail.gmail.com>

Hello,

I've been reading the meager information available for Martel and I have made good progress, I think. I am basically following the example in the Exelixis presentation.

In the example, there are some things whose purpose is obvious but the implementation details (Or all the possible options) aren't. Currently I am curious about how Martel.HeaderFooter and Std.record affect the parsing.

Later in that example they use: blat.format.make_iterator("record"). Where does the "record" come from? Because of using Std.record?

Any help would be deeply appreciated.
Pepe From dalke at dalkescientific.com Wed Apr 18 17:45:00 2007 From: dalke at dalkescientific.com (Andrew Dalke) Date: Wed, 18 Apr 2007 23:45:00 +0200 Subject: [BioPython] Martel Help In-Reply-To: <3e73596b0704180954k752f9be9n6a4f4f46ea2c0435@mail.gmail.com> References: <3e73596b0704171052g7ba3abb0uc04cbce3952d2bd2@mail.gmail.com> <3e73596b0704180954k752f9be9n6a4f4f46ea2c0435@mail.gmail.com> Message-ID: On Apr 18, 2007, at 6:54 PM, Pepe Barbe wrote: > In the example, there are some things whose purpose is obvious but the > implementation details (Or all the possible options) aren't. Currently > I am curious on how does Martel.HeaderFooter and Std.record affect the > parsing. I'm having to think back several years now. A limitation with Martel is parsing large data files. It has a memory overhead of several times the data file being processed. Eg, a 1 MB file might take 7 or so MB to process. Most bioinformatics formats are composed of records. Eg, a GenBank file contains many GenBank records. The idea of the Header / Footer / HeaderFooter classes is to break the large file down into small records, and only have the overhead for parsing a record. (But it doesn't help processing large records, like the entire chromosome as a single FASTA record.) In FASTA files there is no header or footer. It can be read and split up using a RecordReader. Specifically with a StartsWith record reader told to look for a ">" which marks the start of a new record. Compare to SwissProt where the record ends with a "//" line. Some formats are more complicated. GenBank is one. Real genbank files start with a header, something like GBGSS1.SEQ Genetic Sequence Data Bank February 15 2003 NCBI-GenBank Flat File Release 134.0 GSS Sequences (Part 1) 88066 loci, 66600405 bases, from 88066 reported sequences There needs to be a way to process a single, unique header, followed by 0-or-more repeats of a record, followed by an optional footer. Use the HeaderFooter expression for this case. 
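
The record-splitting idea described above can be illustrated with a small sketch. This is plain Python written for illustration only; it is not Martel's RecordReader API, and the function names split_starts_with and split_ends_with are hypothetical. The point it tries to show is that only one record's worth of lines is held in memory at a time, whichever marker convention the format uses.

```python
def split_starts_with(lines, marker):
    """Group lines into records that each begin at a line starting with marker.

    Any lines before the first marker come out as a leading group,
    which is roughly the role a file header would play.
    """
    record = []
    for line in lines:
        if line.startswith(marker) and record:
            yield record
            record = []
        record.append(line)
    if record:
        yield record


def split_ends_with(lines, marker):
    """Group lines into records that each end at a line starting with marker.

    Any lines after the last marker come out as a trailing group,
    which is roughly the role a file footer would play.
    """
    record = []
    for line in lines:
        record.append(line)
        if line.startswith(marker):
            yield record
            record = []
    if record:
        yield record


if __name__ == "__main__":
    fasta = [">a", "AC", ">b", "GT"]
    for rec in split_starts_with(fasta, ">"):
        print(rec)

    swissprot_like = ["ID X", "SQ", "//", "ID Y", "//"]
    for rec in split_ends_with(swissprot_like, "//"):
        print(rec)
```

Because both functions are generators, each record can be handed to a (more expensive) per-record parser and then discarded before the next one is read.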
In general, HeaderFooter is a clumsy solution.

Ignore the Std.record. My thought was that the different terms in the expression could be standardized. For example, that all sequences are tagged with "bio:seq". I hoped this would minimize the work needed to add a new format because most of the handlers would look for expected tags, and not depend so much on the actual structure of the XML. It proved too complicated to explain and use.

> Later in that example they use: blat.format.make_iterator("record").
> Where does the "record" come from? Because of using Std.record?

The "record" comes from a group name used in the expression. It describes the point where the repetition will be done.

Andrew
dalke at dalkescientific.com

From skhadar at gmail.com Fri Apr 20 08:47:07 2007
From: skhadar at gmail.com (Shameer Khadar)
Date: Fri, 20 Apr 2007 18:17:07 +0530
Subject: [BioPython] Protparam using BioPythn
Message-ID:

Dear All,

I am looking for a script to run Protparam for a 1000 sequence. It will be great if anyone can point me to a program / web page to get it done.

Many thanks in advance,
Shameer Khadar

From biopython at maubp.freeserve.co.uk Fri Apr 20 09:51:54 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 20 Apr 2007 14:51:54 +0100
Subject: [BioPython] Protparam using BioPython
In-Reply-To:
References:
Message-ID: <4628C57A.7010803@maubp.freeserve.co.uk>

Shameer Khadar wrote:
> Dear All,
>
> I am looking for a script to run Protparam for a 1000 sequence. It will be
> great if anyone can point me to a program / web page to get it done.

Do you mean the Biopython module Bio.SeqUtils.ProtParam, which does protein analysis (e.g. isoelectric point)?

Did you mean the Expasy ProtParam tool available online? If you only have a few sequences, doing them online by hand would be easy:
http://www.expasy.org/tools/protparam.html

Or did you mean something else?

Peter

P.S. did you mean 1000 different sequences, or a single 1000 amino acid sequence?
From skhadar at gmail.com Fri Apr 20 11:19:01 2007 From: skhadar at gmail.com (Shameer Khadar) Date: Fri, 20 Apr 2007 20:49:01 +0530 Subject: [BioPython] Protparam using BioPython In-Reply-To: <4628C57A.7010803@maubp.freeserve.co.uk> References: <4628C57A.7010803@maubp.freeserve.co.uk> Message-ID: Dear Peter, Thanks for your reply. I was looking for a script based on Bio.SeqUtils. I got the following script from a website, its working perfect for me. But the problem is i have around 1000 sequence (in raw format without headers) and i thought to process it using a foreach equivalent in python(I am a python newbie). But its only a couple of minutes back i came to know that there is no foreach in python, but some better alternative is available !!!. It will be great if you can help to process my file using this program. program : from Bio.SeqUtils import ProtParam, ProtParamData def PrintDictionary(MyDict): for i in MyDict.keys(): print "%s\t%.2f" %(i, MyDict[i]) print "MAEGEITTFTALTEKFNLPPGNYKKPKLLYCSNGGHFL" X = ProtParam.ProteinAnalysis("") print "Instability index of test protein: %.2f" % X.instability_index() first few lines of my file : AEGEFAHLYGTFRED AEGEFAHLZGTFRED AEGEFGATYGVYTSD AEGEFGATZGVYTSD AEGEFGATYGVZTSD AEGEFGATZGVZTSD AEGEFLYGEIQGTQD Thank you once again, Shameer On 4/20/07, Peter wrote: > > Shameer Khadar wrote: > > Dear All, > > > > I am looking for a script to run Protparam for a 1000 sequence. It will > be > > great if anyone can point me to a program / web page to get it done. > > Do you mean the Biopython module Bio.SeqUtils.ProtParam which does > protein analysis (e.g. isoelectric point). > > Did you mean the Expasy ProtParam tool available online? If you only > have a few sequences doing them online by hand would be easy: > http://www.expasy.org/tools/protparam.html > > Or did you mean something else? > > Peter > > P.S. did you mean 1000 different sequences, or a single 1000 amino acid > sequence? 
>
>

From alexl at users.sourceforge.net Wed Apr 25 04:22:44 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Wed, 25 Apr 2007 01:22:44 -0700
Subject: [BioPython] Bioperl packages now available for Fedora
Message-ID: <3kzm4w50dn.fsf@delpy.biol.berkeley.edu>

Hi all,

Fedora packages for Biopython are now available in the official Fedora repositories. Packages for Fedora Core 6 (FC-6) and Rawhide (the soon-to-be Fedora 7) are available immediately and are installable via the simple yum command:

# sudo yum install python-biopython

and through any other GUI based installers available for Fedora, such as pirut, smart or yumex. The name of the package is python-biopython. (A package for Fedora Core 5 has been built and should be in the FC-5 repository within the next 24 hours or so).

These packages have all optional packages enabled by default: MySQL-python, python-reportlab and Wise2.

Please file bugs on these packages in Red Hat/Fedora bugzilla under "Fedora Extras":

https://bugzilla.redhat.com/bugzilla/

Please choose your release and select the "python-biopython" component.

If somebody could update the wiki page with this information, that would be great:

http://biopython.org/wiki/Download

Alex

From biopython at maubp.freeserve.co.uk Fri Apr 27 05:55:42 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Apr 2007 10:55:42 +0100
Subject: [BioPython] Protparam using BioPython
In-Reply-To: <4628C57A.7010803@maubp.freeserve.co.uk>
References: <4628C57A.7010803@maubp.freeserve.co.uk>
Message-ID: <4631C89E.3090208@maubp.freeserve.co.uk>

Shameer Khadar wrote:
> Dear Peter,
>
> Thanks for your reply.

Sorry for the delay - I was away on a course this week.

> I was looking for a script based on Bio.SeqUtils.
> I got the following script from a website, its working perfect for me.
But
> the problem is i have around 1000 sequence (in raw format without headers)
> and i thought to process it using a foreach equivalent in python(I am a
> python newbie). But its only a couple of minutes back i came to know that
> there is no foreach in python, but some better alternative is available
> !!!.

There is a "for each" equivalent in python!
http://docs.python.org/tut/node6.html

If you don't have a good introductory python book, that online tutorial is an excellent starting point.

> It will be great if you can help to process my file using this
> program.
>
> program :
> from Bio.SeqUtils import ProtParam, ProtParamData
> def PrintDictionary(MyDict):
>     for i in MyDict.keys():
>         print "%s\t%.2f" %(i, MyDict[i])
> print "MAEGEITTFTALTEKFNLPPGNYKKPKLLYCSNGGHFL"
> X = ProtParam.ProteinAnalysis("")
> print "Instability index of test protein: %.2f" % X.instability_index()

It seems like you have only given bits of a program, so I have tried to guess what you meant.

> first few lines of my file :
> AEGEFAHLYGTFRED
> AEGEFAHLZGTFRED
> AEGEFGATYGVYTSD
> AEGEFGATZGVYTSD
> AEGEFGATYGVZTSD
> AEGEFGATZGVZTSD
> AEGEFLYGEIQGTQD

In the following example, I am assuming your sequences are in a plain text file, called protparam.txt, which contains each sequence on a single line.

Try something like this first of all, and make sure that it prints out your sequences correctly:

for line in open("protparam.txt") :
    #Remove any trailing new lines or white space
    seq_string = line.rstrip()
    print "Sequence <%s>" % seq_string

Then try doing the ProtParam.ProteinAnalysis of each sequence string:

from Bio.SeqUtils import ProtParam, ProtParamData
for line in open("protparam.txt") :
    #Remove any trailing new lines or white space
    seq_string = line.rstrip()
    print "Sequence <%s>" % seq_string
    X = ProtParam.ProteinAnalysis(seq_string)
    print "Instability index: %.2f" % X.instability_index()

You'll find it doesn't like the "Z" (presumably this is Glx - glutamic acid or glutamine? i.e. E or Q) present in many of your sequences, so this next version uses error handling to note this and then carry on to the next sequence:

from Bio.SeqUtils import ProtParam, ProtParamData
for line in open("protparam.txt") :
    #Remove any trailing new lines or white space
    seq_string = line.rstrip()

    print #blank line
    print "Sequence <%s>" % seq_string
    X = ProtParam.ProteinAnalysis(seq_string)
    try :
        print "Instability index: %.2f" % X.instability_index()
    except KeyError, e :
        print "Problem with the letter %s in the sequence?" % str(e)

The output is:

Sequence <AEGEFAHLYGTFRED>
Instability index: 8.39

Sequence <AEGEFAHLZGTFRED>
Problem with the letter 'Z' in the sequence?

Sequence <AEGEFGATYGVYTSD>
Instability index: -17.70

Sequence <AEGEFGATZGVYTSD>
Problem with the letter 'Z' in the sequence?

Sequence <AEGEFGATYGVZTSD>
Problem with the letter 'Z' in the sequence?

Sequence <AEGEFGATZGVZTSD>
Problem with the letter 'Z' in the sequence?

Sequence <AEGEFLYGEIQGTQD>
Instability index: 8.61

You'll have to check yourself to see if these numbers are sensible. I don't know what to suggest for your "Z" entries - the stability will be different if you try using E or Q instead.

Peter

From biopython at maubp.freeserve.co.uk Sat Apr 28 04:58:40 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 28 Apr 2007 09:58:40 +0100
Subject: [BioPython] EMBL parsing in Biopython 1.43
Message-ID: <46330CC0.9060708@maubp.freeserve.co.uk>

As part of the new SeqIO system introduced in Biopython 1.43, I added the ability to read in EMBL format sequences.

http://biopython.org/wiki/SeqIO

I would be interested to hear feedback (positive or negative) from anyone who has tried to use this.
Peter From alexl at users.sourceforge.net Sat Apr 28 06:21:40 2007 From: alexl at users.sourceforge.net (Alex Lancaster) Date: Sat, 28 Apr 2007 03:21:40 -0700 Subject: [BioPython] Somebody vandalised the wiki download page Message-ID: <7x1wi4re8b.fsf@delpy.biol.berkeley.edu> I just created an account and fixed it with this edit: http://biopython.org/w/index.php?title=Download&diff=1868&oldid=1867 Can somebody with sufficient admin privileges block user "Uzman"? Thanks, Alex From cjfields at uiuc.edu Sat Apr 28 09:53:37 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 28 Apr 2007 08:53:37 -0500 Subject: [BioPython] Somebody vandalised the wiki download page In-Reply-To: <7x1wi4re8b.fsf@delpy.biol.berkeley.edu> References: <7x1wi4re8b.fsf@delpy.biol.berkeley.edu> Message-ID: Done. chris On Apr 28, 2007, at 5:21 AM, Alex Lancaster wrote: > I just created an account and fixed it with this edit: > > http://biopython.org/w/index.php?title=Download&diff=1868&oldid=1867 > > Can somebody with sufficient admin privileges block user "Uzman"? > > Thanks, > Alex > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From mdehoon at c2b2.columbia.edu Sun Apr 29 06:16:31 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sun, 29 Apr 2007 19:16:31 +0900 Subject: [BioPython] EMBL parsing in Biopython 1.43 In-Reply-To: <46330CC0.9060708@maubp.freeserve.co.uk> References: <46330CC0.9060708@maubp.freeserve.co.uk> Message-ID: <4634707F.5060607@c2b2.columbia.edu> Thanks Peter! 
I tried this EMBL-formatted file (using the latest version of Biopython in CVS): ftp://ftp.pasteur.fr/pub/GenomeDB/SubtiList/FlatFiles/SLR16.1_embl.txt but I got this error message: >>> from Bio import SeqIO >>> input = open("SLR16.1_embl.txt") >>> records = SeqIO.parse(input, format="embl") >>> records.next() Traceback (most recent call last): File "", line 1, in File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/GenBank/Scanner.py", line 410, in parse_records record = self.parse(handle) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/GenBank/Scanner.py", line 393, in parse if self.feed(handle, consumer) : File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/GenBank/Scanner.py", line 360, in feed self._feed_first_line(consumer, self.line) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/GenBank/Scanner.py", line 540, in _feed_first_line assert len(fields) == 7 AssertionError >>> Do you have an idea as to what may be going wrong here? --Michiel. Peter wrote: > As part of the new SeqIO system introduced in Biopython 1.43, I added > the ability to read in EMBL format sequences. > > http://biopython.org/wiki/SeqIO > > I would be interested to hear feedback (positive or negative) from > anyone who has tried to use this. > > Peter > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From biopython at maubp.freeserve.co.uk Sun Apr 29 16:02:05 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 29 Apr 2007 21:02:05 +0100 Subject: [BioPython] EMBL parsing in Biopython 1.43 In-Reply-To: <4634707F.5060607@c2b2.columbia.edu> References: <46330CC0.9060708@maubp.freeserve.co.uk> <4634707F.5060607@c2b2.columbia.edu> Message-ID: <4634F9BD.8070909@maubp.freeserve.co.uk> Michiel de Hoon wrote: > Thanks Peter! 
>
> I tried this EMBL-formatted file (using the latest version of Biopython
> in CVS):
>
> ftp://ftp.pasteur.fr/pub/GenomeDB/SubtiList/FlatFiles/SLR16.1_embl.txt
>
> but I got this error message:
>
> >>> from Bio import SeqIO
> >>> input = open("SLR16.1_embl.txt")
> >>> records = SeqIO.parse(input, format="embl")
> >>> records.next()
> Traceback (most recent call last):
...
> "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/GenBank/Scanner.py",
> line 540, in _feed_first_line
> assert len(fields) == 7
> AssertionError
> >>>

It does the same here with CVS Biopython on Linux with Python 2.4.

> Do you have an idea as to what may be going wrong here?

Yes - I wrote an EMBL parser using the latest file format, while I suspect your file from the Pasteur Institute uses an older format - specifically one where the first line (the ID line) has a different number of fields.

This is reminiscent of the various revisions to the GenBank LOCUS line which we also have to cope with.

I hope to have a fix in CVS today/tomorrow.

Peter

From biopython at maubp.freeserve.co.uk Sun Apr 29 18:11:07 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 29 Apr 2007 23:11:07 +0100
Subject: [BioPython] EMBL parsing in Biopython 1.43
In-Reply-To: <4634F9BD.8070909@maubp.freeserve.co.uk>
References: <46330CC0.9060708@maubp.freeserve.co.uk> <4634707F.5060607@c2b2.columbia.edu> <4634F9BD.8070909@maubp.freeserve.co.uk>
Message-ID: <463517FB.9090706@maubp.freeserve.co.uk>

Peter wrote:
> Michiel de Hoon wrote:
>> Do you have an idea as to what may be going wrong here?
>
> Yes - I wrote an EMBL parser using the latest file format, while I
> suspect your file from the Pasteur Institute uses an older format -
> specifically one where the first line (the ID line) has a different
> number of fields.

The file you tried seems to use the pre 2006 style ID line. I found another example like this on the BioPerl webpage.
See also: http://www.ebi.ac.uk/embl/Documentation/archivedchanges.html > I hope to have a fix in CVS today/tomorrow. I have updated Bio/GenBank/Scanner.py to cope with these old EMBL ID lines and added another EMBL test case to test_SeqIO.py Your example now parses fine, giving a single SeqRecord as expected. I have not checked the annotation or features... Peter From skhadar at gmail.com Mon Apr 30 09:01:56 2007 From: skhadar at gmail.com (Shameer Khadar) Date: Mon, 30 Apr 2007 18:31:56 +0530 Subject: [BioPython] Protparam using BioPython In-Reply-To: <4631C89E.3090208@maubp.freeserve.co.uk> References: <4628C57A.7010803@maubp.freeserve.co.uk> <4631C89E.3090208@maubp.freeserve.co.uk> Message-ID: Dear Peter, Thanks a lot for you detailed reply and splendid help !!! It worked !! Cheers, Shameer On 4/27/07, Peter wrote: > > Shameer Khadar wrote: > > Dear Peter, > > > > Thanks for your reply. > > Sorry for the delay - I was away on a course this week. > > > I was looking for a script based on Bio.SeqUtils. > > I got the following script from a website, its working perfect for me. > But > > the problem is i have around 1000 sequence (in raw format without > headers) > > and i thought to process it using a foreach equivalent in python(I am a > > python newbie). But its only a couple of minutes back i came to know > that > > there is no foreach in python, but some better alternative is available > > !!!. > > There is a "for each" equivalent in python! > http://docs.python.org/tut/node6.html > > If you don't have a good introductory python book, that online tutorial > is an excellent starting point. > > > It will be great if you can help to process my file using this > > program. 
> > > > program : > > from Bio.SeqUtils import ProtParam, ProtParamData > > def PrintDictionary(MyDict): > > for i in MyDict.keys(): > > print "%s\t%.2f" %(i, MyDict[i]) > > print "MAEGEITTFTALTEKFNLPPGNYKKPKLLYCSNGGHFL" > > X = ProtParam.ProteinAnalysis("") > > print "Instability index of test protein: %.2f" % X.instability_index() > > It seems like you have only given bits of a program, so I have tried to > guess what you meant. > > > first few lines of my file : > > AEGEFAHLYGTFRED > > AEGEFAHLZGTFRED > > AEGEFGATYGVYTSD > > AEGEFGATZGVYTSD > > AEGEFGATYGVZTSD > > AEGEFGATZGVZTSD > > AEGEFLYGEIQGTQD > > In the following example, I am assuming your sequences are in a plain > text file, called protparam.txt, which contains each sequence on a > single line. > > Try something like this first of all, and make sure that it prints out > your sequences correctly: > > for line in open("protparam.txt") : > #Remove any trailing new lines or white space > seq_string = line.rstrip() > print "Sequence <%s>" % seq_string > > Then try doing the ProtParam.ProteinAnalysis of each sequence string: > > from Bio.SeqUtils import ProtParam, ProtParamData > for line in open("protparam.txt") : > #Remove any trailing new lines or white space > seq_string = line.rstrip() > print "Sequence <%s>" % seq_string > X = ProtParam.ProteinAnalysis(seq_string) > print "Instability index: %.2f" % X.instability_index() > > You'll find it doesn't like the "Z" (presumably this is Glx - glutamic > acid or glutamine? i.e. 
E or Q) present in many of your sequences, so > this next version uses error handling to note this and then carry on to > the next sequence: > > from Bio.SeqUtils import ProtParam, ProtParamData > for line in open("protparam.txt") : > #Remove any trailing new lines or white space > seq_string = line.rstrip() > > print #blank line > print "Sequence <%s>" % seq_string > X = ProtParam.ProteinAnalysis(seq_string) > try : > print "Instability index: %.2f" % X.instability_index() > except KeyError, e : > print "Problem with the letter %s in the sequence?" % str(e) > > The output is: > > Sequence > Instability index: 8.39 > > Sequence > Problem with the letter 'Z' in the sequence? > > Sequence > Instability index: -17.70 > > Sequence > Problem with the letter 'Z' in the sequence? > > Sequence > Problem with the letter 'Z' in the sequence? > > Sequence > Problem with the letter 'Z' in the sequence? > > Sequence > Instability index: 8.61 > > You'll have to check yourself to see if these numbers are sensible. I > don't know what to suggest for your "Z" entries - the stability will be > different if you try using E or Q instead. > > Peter > > From jhortia1 at jhu.edu Mon Apr 30 16:16:42 2007 From: jhortia1 at jhu.edu (JASON HORTIATIS) Date: Mon, 30 Apr 2007 16:16:42 -0400 Subject: [BioPython] local blast output Message-ID: Dear all, I'm a novice using biopython to run local blast searches and save the output to a file, but i've run into a problem becuase it seems as though the b_parser has a limit of 250 sequences, however my searches are returning far more than 250 sequences. Does anyone know if the parser really is limited, and if so if it is possible to work around this? 
Thanks for the help, Jason From sbassi at gmail.com Mon Apr 30 17:26:50 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Mon, 30 Apr 2007 18:26:50 -0300 Subject: [BioPython] local blast output In-Reply-To: References: Message-ID: On 4/30/07, JASON HORTIATIS wrote: > Dear all, > I'm a novice using biopython to run local blast searches and save the output to a file, but i've run into a problem becuase it seems as though the b_parser has a limit of 250 sequences, however my searches are returning far more than 250 sequences. Does anyone know if the parser really is limited, and if so if it is possible to work around this? > Thanks for the help, There is no 250 limit in the parser. Please show us to code to help you. Also tell us blast and biopython version. Best, SB. -- Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318
From alexl at users.sourceforge.net Sun Apr 8 09:27:27 2007 From: alexl at users.sourceforge.net (Alex Lancaster) Date: Sun, 08 Apr 2007 02:27:27 -0700 Subject: [BioPython] Biopython package for Fedora Message-ID: <274pnrcj28.fsf@delpy.biol.berkeley.edu> (Apologies if you receive multiple copies, this is a repost, my original bounced) Hello Biopythonistas, I have created preliminary RPM package of the latest release of Biopython (1.43) for Fedora as part of the "Fedora Package Collection" (formerly "Fedora Extras" since Fedora Core+Fedora Extras are merging). (I am also packaging Bioperl, you can see my some of my progress including links to the reviews here: http://fedoraproject.org/wiki/AlexLancaster) I am almost ready to submit my package for review, but several issues have arisen during the packaging that I hope the biopython list can help clarify before I do so: 1) Will Biopython work OK with Python 2.5? I ask because the next release of Fedora (Fedora 7) will only ship with Python 2.5 and packages first need to build in the development branch (which will eventually become Fedora 7) first. 2) The "python setup.py install" step appears to install a lot of scripts with the "#!/usr/bin/env python" at the top into the main /usr/lib/python2.4/site-packages/Bio/ namespace, e.g.: /usr/lib/python2.4/site-packages/Bio/GFF/GenericTools.py should these scripts be installed somewhere more appropriate such as /usr/bin/GenericTools.py or do they also function as classes as well as executables in their own right? The "rpmlint" tool which is part of the packaging scans a package built for Fedora and identifies certain aspects of the package as not following the package and/or file system hierarchy (FHS) guidelines.
[1]

3) The setup.py install also installs some architecture-independent non-code data files (such as DTDs) which I would normally expect to live in /usr/share/python-biopython/DTDs (or somesuch), for example:

/usr/lib/python2.4/site-packages/Bio/EUtils/DTDs/eSearch_020511.dtd

Is this the normal location for these DTDs and does the rest of the biopython framework expect to find these files in this location?

4) If possible, Fedora packages should run all unit tests provided in the upstream package at package time, just before creating the RPM. I would like to do this for biopython as well, but there doesn't seem to be an easy way to disable the PyUnit GUI that pops up and run in batch-only non-GUI mode. I looked at the code in Tests/run_tests.py and it does have a "--no-gui" option, but there does not appear to be any way to run this from the top-level setup.py file, e.g.:

python setup.py test --no-gui

doesn't work.

5) My initial package depends on the required software: python, mx, python-numeric, as well as the optional python-reportlab, MySQL-python and flex, which are all also included in Fedora, but I won't have Wise2 available since it is not yet in Fedora, at least not until I (or somebody else) packages Wise2.

6) Is Biopython-corba still active, and if so, should it also be packaged? Are there any interdependencies with the base biopython package? (No promises, though!)

Thanks,
Alex

[1] I attempted to attach the list at the end of the e-mail for the developers to identify and tell me if these files are OK where the setup.py currently puts them, but my original e-mail bounced, probably because of the attachment.

--
Alex Lancaster, Ph.D.
| Ecology & Evolutionary Biology, University of Arizona From sbassi at gmail.com Sun Apr 8 19:12:23 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Sun, 8 Apr 2007 16:12:23 -0300 Subject: [BioPython] Biopython package for Fedora In-Reply-To: <274pnrcj28.fsf@delpy.biol.berkeley.edu> References: <274pnrcj28.fsf@delpy.biol.berkeley.edu> Message-ID: On 4/8/07, Alex Lancaster wrote: > 1) Will Biopython work OK with Python 2.5? I ask because the next > release of Fedora (Fedora 7) will only ship with Python 2.5 and > packages first need to build in the development branch (which will > eventually become Fedora 7) first. This is the only question I am able to answer. Yes, it does work with Python 2.5. From chris.lasher at gmail.com Sun Apr 8 20:14:54 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Sun, 8 Apr 2007 16:14:54 -0400 Subject: [BioPython] Biopython package for Fedora In-Reply-To: <274pnrcj28.fsf@delpy.biol.berkeley.edu> References: <274pnrcj28.fsf@delpy.biol.berkeley.edu> Message-ID: <128a885f0704081314r490b7fbdj71d8b16612e8b54c@mail.gmail.com> On 4/8/07, Alex Lancaster wrote: > 2) The "python setup.py install" step appears to install a lot of > scripts with the "#!/usr/bin/env python" at the top into the main > /usr/lib/python2.4/site-packages/Bio/ namespace, e.g.: > > /usr/lib/python2.4/site-packages/Bio/GFF/GenericTools.py > > should these scripts be installed somewhere more appropriate such > as /usr/bin/GenericTools.py or do they also function as classes as > well as executables in their own right? The line #!/usr/bin/env python retrieves the appropriate Python installation as specified by the user's defined environment. This is preferable to hard-coding #!/usr/bin/python, which will always use the Python installation pointed to by /usr/bin/python. For most users, this doesn't matter, but if the user desires to use a local or custom installation of Python, they must change all these scripts by hand to point to their preferred Python install. 
Say my distribution's Python is version 2.3 but I have installed a local copy of version 2.5 which is symlinked at /usr/local/bin/python. I can put /usr/local/bin ahead in my PATH and the scripts with "#!/usr/bin/env python" will then execute with my preferred version (2.5) of Python rather than the system version (2.3), but the scripts with "#!/usr/bin/python" will execute with the system version (2.3) rather than my preferred version (2.5). Web search for more details. Chris From alexl at users.sourceforge.net Sun Apr 8 22:28:06 2007 From: alexl at users.sourceforge.net (Alex Lancaster) Date: Sun, 08 Apr 2007 15:28:06 -0700 Subject: [BioPython] Biopython package for Fedora Message-ID: >>>>> "CL" == Chris Lasher writes: CL> On 4/8/07, Alex Lancaster wrote: >> 2) The "python setup.py install" step appears to install a lot of >> scripts with the "#!/usr/bin/env python" at the top into the main >> /usr/lib/python2.4/site-packages/Bio/ namespace, e.g.: >> >> /usr/lib/python2.4/site-packages/Bio/GFF/GenericTools.py >> >> should these scripts be installed somewhere more appropriate such >> as /usr/bin/GenericTools.py or do they also function as classes as >> well as executables in their own right? CL> The line #!/usr/bin/env python CL> retrieves the appropriate Python installation as specified by the CL> user's defined environment. [...] I'm aware of the function of the "/usr/bin/env python" vs. "/usr/bin/python", that isn't the problem. My question was about the *location* of the script files when installed in /usr/lib/python2.4/site-packages/Bio/* vs. being installed as executables in /usr/bin/. It seems that there are a number of files which contain both classes and scripts and rpmlint identifies all files containing scripts which aren't installed in a location like /usr/bin/ to make sure that scripts aren't unintentionally installed in a non-executable location.
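The check described above (files under site-packages that begin with a "#!" line) can be sketched in a few lines of Python. This is only an illustration written for a current Python, not the 2.4 discussed in this thread, and it is not how rpmlint itself works; the function name find_shebang_scripts is made up for the example.

```python
import os


def find_shebang_scripts(root):
    """Yield (path, shebang) for every .py file under root that starts with '#!'.

    Hypothetical helper for illustration; rpmlint performs its own checks.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            # Read only the first line, as bytes, so odd encodings cannot fail.
            with open(path, "rb") as handle:
                first = handle.readline()
            if first.startswith(b"#!"):
                yield path, first.decode("ascii", "replace").strip()


if __name__ == "__main__":
    # Path taken from the message above; adjust it for the local install.
    for path, shebang in find_shebang_scripts("/usr/lib/python2.4/site-packages/Bio"):
        print(path, shebang)
```

Listing the shebang next to each path also shows whether a file uses "/usr/bin/env python" or a hard-coded interpreter.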
Alex From chris.lasher at gmail.com Wed Apr 11 04:43:14 2007 From: chris.lasher at gmail.com (Chris Lasher) Date: Wed, 11 Apr 2007 00:43:14 -0400 Subject: [BioPython] Biopython package for Fedora In-Reply-To: <274pnrcj28.fsf@delpy.biol.berkeley.edu> References: <274pnrcj28.fsf@delpy.biol.berkeley.edu> Message-ID: <128a885f0704102143j3697fe93keb6eb557da63e4fc@mail.gmail.com> On 4/8/07, Alex Lancaster wrote: > 4) If possible, Fedora packages should run all unit tests provided in > the upstream package at package time, just before creating the RPM. > I would like to do this for biopython as well, but there doesn't > seem to be an easy way to disable the PyUnit GUI that pops up and > run in batch-only non-GUI mode. I looked at the code in > Tests/run_tests.py and it does have a "--no-gui" option, but there > does not appear to be any way to run this from the top-level > setup.py file, e.g.: > > python setup.py test --no-gui > > doesn't work. Alex, thanks for pointing this out. I sat down tonight and resolved this issue. The patch on there should be the fix needed. Save it as setup_test.patch (or whatever, but that's convenient), place it in the same directory as setup.py, and patch with the command patch -p0 < setup_test.patch Alternatively, I can send you the patched files (setup.py and Tests/run_tests.py). Thanks again for pointing this out. Chris From timmcilveen at talktalk.net Wed Apr 11 14:15:52 2007 From: timmcilveen at talktalk.net (tim) Date: Wed, 11 Apr 2007 15:15:52 +0100 Subject: [BioPython] installing on Mandriva Linux Message-ID: <1176300953.3621.13.camel@localhost> Hi, I am getting lots of errors during python setup using biopython setup.py install. I am running python 2.4.3. on Linux and have mxtextools, numeric and headers etc. installed. The installation is definately not working as i get errors when i type some of the test code such as: from Bio.Seq import Seq I get a traceback error. Can anyone help.
I'm new to biopython and Linux. I have everything working fine under Windows. I get problems from this point onwards in the install, with lots of Bio/Cluster/clustermodule errors: Do you want to continue this installation? (Y/n) Y *** Bio.KDTree *** NOT built by default The Bio.PDB.NeighborSearch module depends on the Bio.KDTree module, which in turn, depends on C++ code that does not compile cleanly on all platforms. Hence, Bio.KDTree is not built by default. Would you like to build Bio.KDTree ? (y/N) y creating build/temp.linux-i686-2.4/Bio/Cluster gcc -pthread -fno-strict-aliasing -DNDEBUG -O2 -g -pipe -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fomit-frame-pointer -march=i586 -mtune=pentiumpro -fasynchronous-unwind-tables -g -fPIC -IBio/Cluster -I/usr/include/python2.4 -c Bio/Cluster/clustermodule.c -o build/temp.linux-i686-2.4/Bio/Cluster/clustermodule.o Bio/Cluster/clustermodule.c:2:33: error: Numeric/arrayobject.h: No such file or directory Bio/Cluster/clustermodule.c:20: error: expected declaration specifiers or ‘...’ before ‘PyArrayObject’ Bio/Cluster/clustermodule.c: In function ‘parse_data’: Bio/Cluster/clustermodule.c:27: error: ‘array’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:27: error: (Each undeclared identifier is reported only once Bio/Cluster/clustermodule.c:27: error: for each function it appears in.) Bio/Cluster/clustermodule.c:27: error: ‘PyArrayObject’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:27: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c:35: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c:44: error: ‘PyArray_DOUBLE’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:45: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c: At top level: Bio/Cluster/clustermodule.c:84: error: expected ‘)’ before ‘*’ token Bio/Cluster/clustermodule.c:98: error: expected declaration specifiers or ‘...’ before ‘PyArrayObject’
Bio/Cluster/clustermodule.c: In function ‘parse_mask’: Bio/Cluster/clustermodule.c:109: error: ‘array’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:113: error: ‘PyArrayObject’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:113: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c:121: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c:128: error: ‘PyArray_INT’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:130: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c: At top level: Bio/Cluster/clustermodule.c:178: error: expected ‘)’ before ‘*’ token Bio/Cluster/clustermodule.c:191: error: expected declaration specifiers or ‘...’ before ‘PyArrayObject’ Bio/Cluster/clustermodule.c: In function ‘parse_weight’: Bio/Cluster/clustermodule.c:197: error: ‘array’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:201: error: ‘PyArrayObject’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:201: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c:209: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c:210: error: ‘PyArray_DOUBLE’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:212: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c: At top level: Bio/Cluster/clustermodule.c:255: error: expected ‘)’ before ‘*’ token Bio/Cluster/clustermodule.c:265: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘*’ token Bio/Cluster/clustermodule.c:372: error: expected declaration specifiers or ‘...’ before ‘PyArrayObject’ Bio/Cluster/clustermodule.c: In function ‘parse_clusterid’: Bio/Cluster/clustermodule.c:383: error: ‘array’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:389: error: ‘PyArrayObject’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:389: error: expected expression before ‘)’
token Bio/Cluster/clustermodule.c:397: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c:399: error: ‘PyArray_INT’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:401: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c: At top level: Bio/Cluster/clustermodule.c:471: error: expected ‘)’ before ‘*’ token Bio/Cluster/clustermodule.c:482: error: expected declaration specifiers or ‘...’ before ‘PyArrayObject’ Bio/Cluster/clustermodule.c: In function ‘free_distances’: Bio/Cluster/clustermodule.c:485: error: ‘array’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:489: error: ‘PyArrayObject’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:489: error: ‘a’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:489: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c: At top level: Bio/Cluster/clustermodule.c:515: error: expected declaration specifiers or ‘...’ before ‘PyArrayObject’ Bio/Cluster/clustermodule.c: In function ‘parse_distance’: Bio/Cluster/clustermodule.c:522: error: ‘array’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:522: error: ‘PyArrayObject’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:522: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c:545: error: ‘a’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:545: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c:557: error: ‘PyArray_DOUBLE’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:576: warning: assignment makes pointer from integer without a cast Bio/Cluster/clustermodule.c:584: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c:601: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c:628: warning: passing argument 3 of ‘free_distances’
makes integer from pointer without a cast Bio/Cluster/clustermodule.c:628: error: too many arguments to function ‘free_distances’ Bio/Cluster/clustermodule.c:637: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c:640: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c: At top level: Bio/Cluster/clustermodule.c:716: error: expected declaration specifiers or ‘...’ before ‘PyArrayObject’ Bio/Cluster/clustermodule.c: In function ‘create_celldata’: Bio/Cluster/clustermodule.c:725: error: ‘array’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:725: error: ‘PyArrayObject’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:725: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c: At top level: Bio/Cluster/clustermodule.c:753: error: expected declaration specifiers or ‘...’ before ‘PyArrayObject’ Bio/Cluster/clustermodule.c: In function ‘parse_index’: Bio/Cluster/clustermodule.c:757: error: ‘array’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:766: error: ‘PyArrayObject’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:766: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c:776: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c:778: error: ‘PyArray_INT’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:780: warning: assignment makes pointer from integer without a cast Bio/Cluster/clustermodule.c:787: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c:803: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c: At top level: Bio/Cluster/clustermodule.c:818: error: expected ‘)’ before ‘*’ token Bio/Cluster/clustermodule.c: In function ‘PyTree_cut’: Bio/Cluster/clustermodule.c:1165: error: ‘PyArrayObject’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:1165: error: ‘aCLUSTERID’
undeclared (first use in this function) Bio/Cluster/clustermodule.c:1165: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c:1181: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c:1187: error: ‘clusterid’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:1197: warning: return makes pointer from integer without a cast Bio/Cluster/clustermodule.c: In function ‘py_kcluster’: Bio/Cluster/clustermodule.c:1312: error: ‘PyArrayObject’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:1312: error: ‘aDATA’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:1315: error: ‘aMASK’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:1318: error: ‘aWEIGHT’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:1325: error: ‘aCLUSTERID’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:1379: error: too many arguments to function ‘parse_data’ Bio/Cluster/clustermodule.c:1384: error: too many arguments to function ‘parse_mask’ Bio/Cluster/clustermodule.c:1416: error: too many arguments to function ‘parse_weight’ Bio/Cluster/clustermodule.c: In function ‘py_kmedoids’: Bio/Cluster/clustermodule.c:1501: error: ‘PyArrayObject’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:1501: error: ‘aDISTANCES’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:1504: error: ‘aCLUSTERID’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:1533: error: too many arguments to function ‘parse_distance’ Bio/Cluster/clustermodule.c:1538: warning: passing argument 3 of ‘free_distances’ makes integer from pointer without a cast Bio/Cluster/clustermodule.c:1538: error: too many arguments to function ‘free_distances’ Bio/Cluster/clustermodule.c:1545: warning: passing argument 3 of ‘free_distances’
makes integer from pointer without a cast Bio/Cluster/clustermodule.c:1545: error: too many arguments to function ‘free_distances’ Bio/Cluster/clustermodule.c:1552: warning: passing argument 3 of ‘free_distances’ makes integer from pointer without a cast Bio/Cluster/clustermodule.c:1552: error: too many arguments to function ‘free_distances’ Bio/Cluster/clustermodule.c:1565: warning: passing argument 3 of ‘free_distances’ makes integer from pointer without a cast Bio/Cluster/clustermodule.c:1565: error: too many arguments to function ‘free_distances’ Bio/Cluster/clustermodule.c: In function ‘py_treecluster’: Bio/Cluster/clustermodule.c:1706: error: ‘PyArrayObject’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:1706: error: ‘aDATA’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:1707: error: ‘aMASK’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:1708: error: ‘aWEIGHT’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:1726: error: too many arguments to function ‘parse_data’ Bio/Cluster/clustermodule.c:1733: error: too many arguments to function ‘parse_mask’ Bio/Cluster/clustermodule.c:1739: error: too many arguments to function ‘parse_weight’ Bio/Cluster/clustermodule.c:1762: error: ‘aDISTANCEMATRIX’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:1770: error: too many arguments to function ‘parse_distance’ Bio/Cluster/clustermodule.c:1783: warning: passing argument 3 of ‘free_distances’ makes integer from pointer without a cast Bio/Cluster/clustermodule.c:1783: error: too many arguments to function ‘free_distances’ Bio/Cluster/clustermodule.c: In function ‘py_somcluster’: Bio/Cluster/clustermodule.c:1849: error: ‘PyArrayObject’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:1849: error: ‘aDATA’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:1852: error: ‘aMASK’
undeclared (first use in this function) Bio/Cluster/clustermodule.c:1855: error: ‘aWEIGHT’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:1863: error: ‘aCELLDATA’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:1865: error: ‘aCLUSTERID’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:1922: error: too many arguments to function ‘parse_data’ Bio/Cluster/clustermodule.c:1929: error: too many arguments to function ‘parse_mask’ Bio/Cluster/clustermodule.c:1935: error: too many arguments to function ‘parse_weight’ Bio/Cluster/clustermodule.c:1944: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c:1954: error: too many arguments to function ‘create_celldata’ Bio/Cluster/clustermodule.c: In function ‘py_median’: Bio/Cluster/clustermodule.c:1996: error: ‘PyArrayObject’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:1996: error: ‘aDATA’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:2007: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c:2015: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c:2018: error: ‘PyArray_DOUBLE’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:2019: warning: initialization makes pointer from integer without a cast Bio/Cluster/clustermodule.c:2021: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c:2037: warning: initialization makes pointer from integer without a cast Bio/Cluster/clustermodule.c:2043: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c: In function ‘py_mean’: Bio/Cluster/clustermodule.c:2062: error: ‘PyArrayObject’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:2062: error: ‘aDATA’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:2073: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c:2081: error: expected expression before ‘)’
token Bio/Cluster/clustermodule.c:2084: error: ‘PyArray_DOUBLE’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:2085: warning: initialization makes pointer from integer without a cast Bio/Cluster/clustermodule.c:2087: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c:2103: warning: initialization makes pointer from integer without a cast Bio/Cluster/clustermodule.c:2109: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c: In function ‘py_clusterdistance’: Bio/Cluster/clustermodule.c:2167: error: ‘PyArrayObject’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:2167: error: ‘aDATA’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:2170: error: ‘aMASK’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:2173: error: ‘aWEIGHT’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:2181: error: ‘aINDEX1’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:2184: error: ‘aINDEX2’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:2216: error: too many arguments to function ‘parse_data’ Bio/Cluster/clustermodule.c:2222: error: too many arguments to function ‘parse_mask’ Bio/Cluster/clustermodule.c:2228: error: too many arguments to function ‘parse_weight’ Bio/Cluster/clustermodule.c:2235: error: too many arguments to function ‘parse_index’ Bio/Cluster/clustermodule.c:2242: error: too many arguments to function ‘parse_index’ Bio/Cluster/clustermodule.c: In function ‘py_clustercentroids’: Bio/Cluster/clustermodule.c:2312: error: ‘PyArrayObject’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:2312: error: ‘aDATA’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:2315: error: ‘aMASK’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:2318: error: ‘aCLUSTERID’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:2322: error: ‘aCDATA’
undeclared (first use in this function) Bio/Cluster/clustermodule.c:2324: error: ‘aCMASK’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:2350: error: too many arguments to function ‘parse_data’ Bio/Cluster/clustermodule.c:2356: error: too many arguments to function ‘parse_mask’ Bio/Cluster/clustermodule.c:2362: warning: passing argument 3 of ‘parse_clusterid’ makes pointer from integer without a cast Bio/Cluster/clustermodule.c:2362: error: too many arguments to function ‘parse_clusterid’ Bio/Cluster/clustermodule.c:2371: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c:2384: error: expected expression before ‘)’ token Bio/Cluster/clustermodule.c: In function ‘py_distancematrix’: Bio/Cluster/clustermodule.c:2466: error: ‘PyArrayObject’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:2466: error: ‘aDATA’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:2469: error: ‘aMASK’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:2472: error: ‘aWEIGHT’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:2507: error: too many arguments to function ‘parse_data’ Bio/Cluster/clustermodule.c:2514: error: too many arguments to function ‘parse_mask’ Bio/Cluster/clustermodule.c:2520: error: too many arguments to function ‘parse_weight’ Bio/Cluster/clustermodule.c:2542: error: ‘PyArray_DOUBLE’ undeclared (first use in this function) Bio/Cluster/clustermodule.c:2542: warning: initialization makes pointer from integer without a cast Bio/Cluster/clustermodule.c:2548: error: expected expression before ‘)’
token error: command 'gcc' failed with exit status 1 [tim at localhost biopython-1.43]$ [tim at localhost biopython-1.43]$ Thanks, Tim From alexl at users.sourceforge.net Wed Apr 11 14:44:29 2007 From: alexl at users.sourceforge.net (Alex Lancaster) Date: Wed, 11 Apr 2007 07:44:29 -0700 Subject: [BioPython] Biopython package for Fedora In-Reply-To: <128a885f0704102143j3697fe93keb6eb557da63e4fc@mail.gmail.com> (Chris Lasher's message of "Wed\, 11 Apr 2007 00\:43\:14 -0400") References: <274pnrcj28.fsf@delpy.biol.berkeley.edu> <128a885f0704102143j3697fe93keb6eb557da63e4fc@mail.gmail.com> Message-ID: >>>>> "CL" == Chris Lasher writes: [...] CL> Alex, thanks for pointing this out. I sat down tonight and CL> resolved this issue. CL> CL> The patch on there should be the fix needed. Save it as CL> setup_test.patch (or whatever, but that's convenient), place it in CL> the same directory as setup.py, and patch with the command CL> patch -p0 < setup_test.patch CL> Alternatively, I can send you the patched files (setup.py and CL> Tests/run_tests.py). CL> Thanks again for pointing this out. Hi Chris, Thanks, the patch works fine for me. I've added the patch to the package and I can now run the tests in command-line only mode fine. By the way, I've filed my package review for Fedora: https://bugzilla.redhat.com/235989 if anybody wants to keep track of its progress. I am currently still disabling the tests because they hang for some reason on test_Cluster, I get: $ python setup.py test --no-gui running test test_Ace ... ok test_BioSQL ... Skipping test because of import error: Skipping BioSQL tests -- enable tests in Tests/test_BioSQL.py ok test_CAPS ... ok test_Cluster ... then the CPU spins indefinitely. Also I need to make sure that all tests that require network access are skipped cleanly because the package build environment for Fedora requires that all packages build without network access. On another packaging note: I now remove all #!/usr/bin/ etc.
from the top of files found in the /usr/lib/python2.4/site-packages/Bio/* area to keep rpmlint happy. These can still be run using python directly e.g.: python /usr/lib/python2.4/site-packages/Bio/biblio.py Note that there's a lot of inconsistency here: some are "/usr/bin/env python", others are /usr/bin/python or even /usr/bin/python2.3, others don't have a main program contained within, and so the #!/usr/bin line should be removed completely. Somebody should go through and clean up/rationalise the installation process: check that the files installed when "python setup.py install" is run are appropriate .py package files, e.g. the EUtils installs its own "setup.py" file in a subdirectory, which isn't very clean. Alex From mdehoon at c2b2.columbia.edu Wed Apr 11 15:44:30 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Wed, 11 Apr 2007 17:44:30 +0200 Subject: [BioPython] installing on Mandriva Linux In-Reply-To: <1176300953.3621.13.camel@localhost> References: <1176300953.3621.13.camel@localhost> Message-ID: <461D025E.9070107@c2b2.columbia.edu> tim wrote: >I get problems from this point onwards in the install, with lots of >Bio/Cluster/clustermodule errors: >... >creating build/temp.linux-i686-2.4/Bio/Cluster >gcc -pthread -fno-strict-aliasing -DNDEBUG -O2 -g -pipe >-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fomit-frame-pointer -march=i586 >-mtune=pentiumpro -fasynchronous-unwind-tables -g -fPIC -IBio/Cluster >-I/usr/include/python2.4 -c Bio/Cluster/clustermodule.c -o >build/temp.linux-i686-2.4/Bio/Cluster/clustermodule.o >Bio/Cluster/clustermodule.c:2:33: error: Numeric/arrayobject.h: No such >file or directory > > This is the first error message that you get. Did you check that you have the header file arrayobject.h? And is it in the correct location?
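One way to answer that question is to look for Numeric/arrayobject.h under the Python include directory, since the gcc command above passes -I/usr/include/python2.4. The sketch below is written for a current Python rather than 2.4, and find_header is a made-up helper name, so treat it as an illustration of the check and not as part of Biopython's setup.py.

```python
import os
import sysconfig


def find_header(relative_path, include_dirs=None):
    """Return the first include_dir/relative_path that exists, or None.

    Hypothetical helper; by default it searches only the include directory
    of the Python interpreter that runs it.
    """
    if include_dirs is None:
        include_dirs = [sysconfig.get_paths()["include"]]
    for directory in include_dirs:
        candidate = os.path.join(directory, relative_path)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    header = os.path.join("Numeric", "arrayobject.h")
    found = find_header(header)
    print(found or "%s not found under the Python include directory" % header)
```

If the header is missing, the distribution's development package for Numeric is the usual place to look; every later error in the log follows from this one failed include.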
--Michiel From jhortia1 at jhu.edu Fri Apr 13 19:21:54 2007 From: jhortia1 at jhu.edu (JASON HORTIATIS) Date: Fri, 13 Apr 2007 15:21:54 -0400 Subject: [BioPython] Local Blast Output Message-ID: I'm an undergraduate using biopython to run local blast searches and I'm trying to find out how to save the entire sequence of each protein hit directly to a file. I have only managed to be able to print the portion of the sequence that matches the query using hsp.sbjct[0:]. My goal is to use the search results from one blast run as a database to search against for a subsequent run so a fasta file is needed for each hit of the first run. Thanks for the help! Jason From sbassi at gmail.com Sat Apr 14 04:14:20 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Sat, 14 Apr 2007 01:14:20 -0300 Subject: [BioPython] Local Blast Output In-Reply-To: References: Message-ID: On 4/13/07, JASON HORTIATIS wrote: > I'm an undergraduate using biopython to run local blast searches and I'm trying to find out how to save the entire sequence of each protein hit directly to a file. I have only managed to be able to print the portion of the sequence that matches the query using hsp.sbjct[0:]. My goal is to use the search results from one blast run as a database to search against for a subsequent run so a fasta file is needed for each hit of the first run. > Thanks for the help! You can only parse out of the BLAST result what is actually in the BLAST output, and that output does not contain the whole sequence, just the aligned portion you've already retrieved. You may need to parse the GID of the protein and then look for it in your BLAST DB (using fastacmd). Or you may use PSI-BLAST as an alternative.
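If the FASTA file that the BLAST database was built from is still around, the lookup step can also be sketched in plain Python instead of fastacmd. Everything here is an assumption for illustration: the function names are invented, hit_ids stands for identifiers already collected from the parsed BLAST results, and a hit is matched on the first word of the FASTA header.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def select_hits(lines, hit_ids):
    """Return FASTA text for the records whose first header word is in hit_ids."""
    wanted = set(hit_ids)
    out = []
    for header, seq in read_fasta(lines):
        words = header.split()
        if words and words[0] in wanted:
            out.append(">%s\n%s\n" % (header, seq))
    return "".join(out)


if __name__ == "__main__":
    # Tiny in-memory stand-in for the database FASTA file.
    database = [">sp1 first protein\n", "MKT\n", "AAL\n", ">sp2 second\n", "GGG\n"]
    print(select_hits(database, ["sp1"]), end="")
```

The returned text holds the full-length sequences and can be written to a file and formatted as the database for the next BLAST run.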
From elventear at gmail.com Tue Apr 17 17:52:40 2007 From: elventear at gmail.com (Pepe Barbe) Date: Tue, 17 Apr 2007 12:52:40 -0500 Subject: [BioPython] Martel Help Message-ID: <3e73596b0704171052g7ba3abb0uc04cbce3952d2bd2@mail.gmail.com> Hello, I am interested in using Martel for parsing some Biology formats (So far nothing new). While the module seems really good, I've been struggling to find some meaningful documentation. So far I feel I am walking in the dark. Still I've made some progress. If there is some tutorial or complete documentation out there I would appreciate it if someone would point me to it. My current question is the following. I have the impression that every single line that the Martel parser is going to parse must be recognized, and otherwise it will raise an Exception. Is this correct? If that's true, how can I ignore anything that doesn't match a RegEx and just process what matches? Thanks, Pepe From elventear at gmail.com Wed Apr 18 16:54:30 2007 From: elventear at gmail.com (Pepe Barbe) Date: Wed, 18 Apr 2007 11:54:30 -0500 Subject: [BioPython] Martel Help In-Reply-To: <3e73596b0704171052g7ba3abb0uc04cbce3952d2bd2@mail.gmail.com> References: <3e73596b0704171052g7ba3abb0uc04cbce3952d2bd2@mail.gmail.com> Message-ID: <3e73596b0704180954k752f9be9n6a4f4f46ea2c0435@mail.gmail.com> Hello, I've been reading the meager information available for Martel and I have made good progress, I think. I am basically following the example in the Exelixis presentation. In the example, there are some things whose purpose is obvious but the implementation details (Or all the possible options) aren't. Currently I am curious on how does Martel.HeaderFooter and Std.record affect the parsing. Later in that example they use: blat.format.make_iterator("record"). Where does the "record" come from? Because of using Std.record? Any help would be deeply appreciated.
Pepe From dalke at dalkescientific.com Wed Apr 18 21:45:00 2007 From: dalke at dalkescientific.com (Andrew Dalke) Date: Wed, 18 Apr 2007 23:45:00 +0200 Subject: [BioPython] Martel Help In-Reply-To: <3e73596b0704180954k752f9be9n6a4f4f46ea2c0435@mail.gmail.com> References: <3e73596b0704171052g7ba3abb0uc04cbce3952d2bd2@mail.gmail.com> <3e73596b0704180954k752f9be9n6a4f4f46ea2c0435@mail.gmail.com> Message-ID: On Apr 18, 2007, at 6:54 PM, Pepe Barbe wrote: > In the example, there are some things whose purpose is obvious but the > implementation details (Or all the possible options) aren't. Currently > I am curious on how does Martel.HeaderFooter and Std.record affect the > parsing. I'm having to think back several years now. A limitation with Martel is parsing large data files. It has a memory overhead of several times the data file being processed. Eg, a 1 MB file might take 7 or so MB to process. Most bioinformatics formats are composed of records. Eg, a GenBank file contains many GenBank records. The idea of the Header / Footer / HeaderFooter classes is to break the large file down into small records, and only have the overhead for parsing a record. (But it doesn't help processing large records, like the entire chromosome as a single FASTA record.) In FASTA files there is no header or footer. It can be read and split up using a RecordReader. Specifically with a StartsWith record reader told to look for a ">" which marks the start of a new record. Compare to SwissProt where the record ends with a "//" line. Some formats are more complicated. GenBank is one. Real genbank files start with a header, something like GBGSS1.SEQ Genetic Sequence Data Bank February 15 2003 NCBI-GenBank Flat File Release 134.0 GSS Sequences (Part 1) 88066 loci, 66600405 bases, from 88066 reported sequences There needs to be a way to process a single, unique header, followed by 0-or-more repeats of a record, followed by an optional footer. Use the HeaderFooter expression for this case. 
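The header / repeated-record / optional-footer layout described above can be illustrated without Martel. The following is a plain-Python 3 sketch of the idea only, not Martel's HeaderFooter API: iter_records is a hypothetical helper, the sample lines are invented, and it assumes GenBank-style records that start with a LOCUS line and end with a "//" line.

```python
def iter_records(lines, record_start="LOCUS", record_end="//"):
    """Yield one record (a list of lines) at a time.

    Lines before the first record (the file header) and lines between or
    after records (blank lines, a footer) are skipped. Only the record
    currently being built is held in memory.
    """
    current = None
    for line in lines:
        if current is None:
            if line.startswith(record_start):
                current = [line]  # a new record begins here
        else:
            current.append(line)
        if current is not None and line.strip() == record_end:
            yield current
            current = None
    if current is not None:
        # Final record with no terminator line; yield what was collected.
        yield current


# Invented sample input: a one-line file header followed by two tiny records.
example_lines = [
    "GBGSS1.SEQ Genetic Sequence Data Bank",
    "",
    "LOCUS A",
    "ORIGIN",
    "//",
    "LOCUS B",
    "//",
]
records = list(iter_records(example_lines))
```

Because the function is a generator, a caller looping over an open file handle would parse one record at a time, which is the memory saving the record-splitting approach is after. As noted above, this does not help when a single record is itself very large.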
In general, this is a clumsy solution. Ignore the Std.record. My thought was that the different terms in the expression could be standardized. For example, that all sequences are tagged with "bio:seq". I hoped this would minimize the work needed to add a new format because most of the handlers would look for expected tags, and not depend so much on the actual structure of the XML. It proved too complicated to explain and use. > Later in that example they use: blat.format.make_iterator("record"). > Where does the "record" come from? Because of using Std.record? The "record" comes from a group name used in the expression. It describes the point where the repetition will be done. Andrew dalke at dalkescientific.com From skhadar at gmail.com Fri Apr 20 12:47:07 2007 From: skhadar at gmail.com (Shameer Khadar) Date: Fri, 20 Apr 2007 18:17:07 +0530 Subject: [BioPython] Protparam using BioPythn Message-ID: Dear All, I am looking for a script to run Protparam for a 1000 sequence. It will be great if anyone can point me to a program / web page to get it done. Many thanks in advance, Shameer Khadar From biopython at maubp.freeserve.co.uk Fri Apr 20 13:51:54 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 20 Apr 2007 14:51:54 +0100 Subject: [BioPython] Protparam using BioPython In-Reply-To: References: Message-ID: <4628C57A.7010803@maubp.freeserve.co.uk> Shameer Khadar wrote: > Dear All, > > I am looking for a script to run Protparam for a 1000 sequence. It will be > great if anyone can point me to a program / web page to get it done. Do you mean the Biopython module Bio.SeqUtils.ProtParam which does protein analysis (e.g. isoelectric point). Did you mean the Expasy ProtParam tool available online? If you only have a few sequences doing them online by hand would be easy: http://www.expasy.org/tools/protparam.html Or did you mean something else? Peter P.S. did you mean 1000 different sequences, or a single 1000 amino acid sequence? 
From skhadar at gmail.com Fri Apr 20 15:19:01 2007 From: skhadar at gmail.com (Shameer Khadar) Date: Fri, 20 Apr 2007 20:49:01 +0530 Subject: [BioPython] Protparam using BioPython In-Reply-To: <4628C57A.7010803@maubp.freeserve.co.uk> References: <4628C57A.7010803@maubp.freeserve.co.uk> Message-ID: Dear Peter, Thanks for your reply. I was looking for a script based on Bio.SeqUtils. I got the following script from a website, its working perfect for me. But the problem is i have around 1000 sequence (in raw format without headers) and i thought to process it using a foreach equivalent in python(I am a python newbie). But its only a couple of minutes back i came to know that there is no foreach in python, but some better alternative is available !!!. It will be great if you can help to process my file using this program. program : from Bio.SeqUtils import ProtParam, ProtParamData def PrintDictionary(MyDict): for i in MyDict.keys(): print "%s\t%.2f" %(i, MyDict[i]) print "MAEGEITTFTALTEKFNLPPGNYKKPKLLYCSNGGHFL" X = ProtParam.ProteinAnalysis("") print "Instability index of test protein: %.2f" % X.instability_index() first few lines of my file : AEGEFAHLYGTFRED AEGEFAHLZGTFRED AEGEFGATYGVYTSD AEGEFGATZGVYTSD AEGEFGATYGVZTSD AEGEFGATZGVZTSD AEGEFLYGEIQGTQD Thank you once again, Shameer On 4/20/07, Peter wrote: > > Shameer Khadar wrote: > > Dear All, > > > > I am looking for a script to run Protparam for a 1000 sequence. It will > be > > great if anyone can point me to a program / web page to get it done. > > Do you mean the Biopython module Bio.SeqUtils.ProtParam which does > protein analysis (e.g. isoelectric point). > > Did you mean the Expasy ProtParam tool available online? If you only > have a few sequences doing them online by hand would be easy: > http://www.expasy.org/tools/protparam.html > > Or did you mean something else? > > Peter > > P.S. did you mean 1000 different sequences, or a single 1000 amino acid > sequence? 
>
>

From alexl at users.sourceforge.net Wed Apr 25 08:22:44 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Wed, 25 Apr 2007 01:22:44 -0700
Subject: [BioPython] Bioperl packages now available for Fedora
Message-ID: <3kzm4w50dn.fsf@delpy.biol.berkeley.edu>

Hi all,

Fedora packages for Biopython are now available in the official Fedora repositories. Packages for Fedora Core 6 (FC-6) and Rawhide (the soon-to-be Fedora 7) are available immediately and are installable via the simple yum command:

# sudo yum install python-biopython

and through any other GUI-based installers available for Fedora, such as pirut, smart or yumex. The name of the package is python-biopython. (A package for Fedora Core 5 has been built and should be in the FC-5 repository within the next 24 hours or so.)

These packages have all optional packages enabled by default: MySQL-python, python-reportlab and Wise2.

Please file bugs on these packages in Red Hat/Fedora bugzilla under "Fedora Extras":

https://bugzilla.redhat.com/bugzilla/

Please choose your release and select the "python-biopython" component.

If somebody could update the wiki page with this information, that would be great:

http://biopython.org/wiki/Download

Alex

From biopython at maubp.freeserve.co.uk Fri Apr 27 09:55:42 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Apr 2007 10:55:42 +0100
Subject: [BioPython] Protparam using BioPython
In-Reply-To: <4628C57A.7010803@maubp.freeserve.co.uk>
References: <4628C57A.7010803@maubp.freeserve.co.uk>
Message-ID: <4631C89E.3090208@maubp.freeserve.co.uk>

Shameer Khadar wrote:
> Dear Peter,
>
> Thanks for your reply.

Sorry for the delay - I was away on a course this week.

> I was looking for a script based on Bio.SeqUtils.
> I got the following script from a website, its working perfect for me.
> But the problem is i have around 1000 sequence (in raw format without headers)
> and i thought to process it using a foreach equivalent in python(I am a
> python newbie). But its only a couple of minutes back i came to know that
> there is no foreach in python, but some better alternative is available
> !!!.

There is a "for each" equivalent in python!
http://docs.python.org/tut/node6.html

If you don't have a good introductory python book, that online tutorial is an excellent starting point.

> It will be great if you can help to process my file using this
> program.
>
> program :
> from Bio.SeqUtils import ProtParam, ProtParamData
> def PrintDictionary(MyDict):
>     for i in MyDict.keys():
>         print "%s\t%.2f" %(i, MyDict[i])
> print "MAEGEITTFTALTEKFNLPPGNYKKPKLLYCSNGGHFL"
> X = ProtParam.ProteinAnalysis("")
> print "Instability index of test protein: %.2f" % X.instability_index()

It seems like you have only given bits of a program, so I have tried to guess what you meant.

> first few lines of my file :
> AEGEFAHLYGTFRED
> AEGEFAHLZGTFRED
> AEGEFGATYGVYTSD
> AEGEFGATZGVYTSD
> AEGEFGATYGVZTSD
> AEGEFGATZGVZTSD
> AEGEFLYGEIQGTQD

In the following example, I am assuming your sequences are in a plain text file, called protparam.txt, which contains each sequence on a single line.

Try something like this first of all, and make sure that it prints out your sequences correctly:

for line in open("protparam.txt") :
    #Remove any trailing new lines or white space
    seq_string = line.rstrip()
    print "Sequence <%s>" % seq_string

Then try doing the ProtParam.ProteinAnalysis of each sequence string:

from Bio.SeqUtils import ProtParam, ProtParamData
for line in open("protparam.txt") :
    #Remove any trailing new lines or white space
    seq_string = line.rstrip()
    print "Sequence <%s>" % seq_string
    X = ProtParam.ProteinAnalysis(seq_string)
    print "Instability index: %.2f" % X.instability_index()

You'll find it doesn't like the "Z" (presumably this is Glx - glutamic acid or glutamine? i.e. E or Q) present in many of your sequences, so this next version uses error handling to note this and then carry on to the next sequence:

from Bio.SeqUtils import ProtParam, ProtParamData
for line in open("protparam.txt") :
    #Remove any trailing new lines or white space
    seq_string = line.rstrip()

    print #blank line
    print "Sequence <%s>" % seq_string
    X = ProtParam.ProteinAnalysis(seq_string)
    try :
        print "Instability index: %.2f" % X.instability_index()
    except KeyError, e :
        print "Problem with the letter %s in the sequence?" % str(e)

The output is:

Sequence <AEGEFAHLYGTFRED>
Instability index: 8.39

Sequence <AEGEFAHLZGTFRED>
Problem with the letter 'Z' in the sequence?

Sequence <AEGEFGATYGVYTSD>
Instability index: -17.70

Sequence <AEGEFGATZGVYTSD>
Problem with the letter 'Z' in the sequence?

Sequence <AEGEFGATYGVZTSD>
Problem with the letter 'Z' in the sequence?

Sequence <AEGEFGATZGVZTSD>
Problem with the letter 'Z' in the sequence?

Sequence <AEGEFLYGEIQGTQD>
Instability index: 8.61

You'll have to check yourself to see if these numbers are sensible. I don't know what to suggest for your "Z" entries - the stability will be different if you try using E or Q instead.

Peter

From biopython at maubp.freeserve.co.uk Sat Apr 28 08:58:40 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 28 Apr 2007 09:58:40 +0100
Subject: [BioPython] EMBL parsing in Biopython 1.43
Message-ID: <46330CC0.9060708@maubp.freeserve.co.uk>

As part of the new SeqIO system introduced in Biopython 1.43, I added the ability to read in EMBL format sequences.

http://biopython.org/wiki/SeqIO

I would be interested to hear feedback (positive or negative) from anyone who has tried to use this.
Peter From alexl at users.sourceforge.net Sat Apr 28 10:21:40 2007 From: alexl at users.sourceforge.net (Alex Lancaster) Date: Sat, 28 Apr 2007 03:21:40 -0700 Subject: [BioPython] Somebody vandalised the wiki download page Message-ID: <7x1wi4re8b.fsf@delpy.biol.berkeley.edu> I just created an account and fixed it with this edit: http://biopython.org/w/index.php?title=Download&diff=1868&oldid=1867 Can somebody with sufficient admin privileges block user "Uzman"? Thanks, Alex From cjfields at uiuc.edu Sat Apr 28 13:53:37 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 28 Apr 2007 08:53:37 -0500 Subject: [BioPython] Somebody vandalised the wiki download page In-Reply-To: <7x1wi4re8b.fsf@delpy.biol.berkeley.edu> References: <7x1wi4re8b.fsf@delpy.biol.berkeley.edu> Message-ID: Done. chris On Apr 28, 2007, at 5:21 AM, Alex Lancaster wrote: > I just created an account and fixed it with this edit: > > http://biopython.org/w/index.php?title=Download&diff=1868&oldid=1867 > > Can somebody with sufficient admin privileges block user "Uzman"? > > Thanks, > Alex > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From mdehoon at c2b2.columbia.edu Sun Apr 29 10:16:31 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sun, 29 Apr 2007 19:16:31 +0900 Subject: [BioPython] EMBL parsing in Biopython 1.43 In-Reply-To: <46330CC0.9060708@maubp.freeserve.co.uk> References: <46330CC0.9060708@maubp.freeserve.co.uk> Message-ID: <4634707F.5060607@c2b2.columbia.edu> Thanks Peter! 
I tried this EMBL-formatted file (using the latest version of Biopython in CVS):

ftp://ftp.pasteur.fr/pub/GenomeDB/SubtiList/FlatFiles/SLR16.1_embl.txt

but I got this error message:

>>> from Bio import SeqIO
>>> input = open("SLR16.1_embl.txt")
>>> records = SeqIO.parse(input, format="embl")
>>> records.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/GenBank/Scanner.py", line 410, in parse_records
    record = self.parse(handle)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/GenBank/Scanner.py", line 393, in parse
    if self.feed(handle, consumer) :
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/GenBank/Scanner.py", line 360, in feed
    self._feed_first_line(consumer, self.line)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/GenBank/Scanner.py", line 540, in _feed_first_line
    assert len(fields) == 7
AssertionError
>>>

Do you have an idea as to what may be going wrong here?

--Michiel.

Peter wrote:
> As part of the new SeqIO system introduced in Biopython 1.43, I added
> the ability to read in EMBL format sequences.
>
> http://biopython.org/wiki/SeqIO
>
> I would be interested to hear feedback (positive or negative) from
> anyone who has tried to use this.
>
> Peter

From biopython at maubp.freeserve.co.uk Sun Apr 29 20:02:05 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 29 Apr 2007 21:02:05 +0100
Subject: [BioPython] EMBL parsing in Biopython 1.43
In-Reply-To: <4634707F.5060607@c2b2.columbia.edu>
References: <46330CC0.9060708@maubp.freeserve.co.uk> <4634707F.5060607@c2b2.columbia.edu>
Message-ID: <4634F9BD.8070909@maubp.freeserve.co.uk>

Michiel de Hoon wrote:
> Thanks Peter!
>
> I tried this EMBL-formatted file (using the latest version of Biopython
> in CVS):
>
> ftp://ftp.pasteur.fr/pub/GenomeDB/SubtiList/FlatFiles/SLR16.1_embl.txt
>
> but I got this error message:
>
> >>> from Bio import SeqIO
> >>> input = open("SLR16.1_embl.txt")
> >>> records = SeqIO.parse(input, format="embl")
> >>> records.next()
> Traceback (most recent call last):
...
> "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/GenBank/Scanner.py",
> line 540, in _feed_first_line
> assert len(fields) == 7
> AssertionError
> >>>

It does the same here with CVS Biopython on Linux with Python 2.4.

> Do you have an idea as to what may be going wrong here?

Yes - I wrote an EMBL parser using the latest file format, while I suspect your file from the Pasteur Institute uses an older format - specifically one where the first line (the ID line) has a different number of fields.

This is reminiscent of the various revisions to the GenBank LOCUS line which we also have to cope with.

I hope to have a fix in CVS today/tomorrow.

Peter

From biopython at maubp.freeserve.co.uk Sun Apr 29 22:11:07 2007
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 29 Apr 2007 23:11:07 +0100
Subject: [BioPython] EMBL parsing in Biopython 1.43
In-Reply-To: <4634F9BD.8070909@maubp.freeserve.co.uk>
References: <46330CC0.9060708@maubp.freeserve.co.uk> <4634707F.5060607@c2b2.columbia.edu> <4634F9BD.8070909@maubp.freeserve.co.uk>
Message-ID: <463517FB.9090706@maubp.freeserve.co.uk>

Peter wrote:
> Michiel de Hoon wrote:
>> Do you have an idea as to what may be going wrong here?
>
> Yes - I wrote an EMBL parser using the latest file format, while I
> suspect your file from the Pasteur Institute uses an older format -
> specifically one where the first line (the ID line) has a different
> number of fields.

The file you tried seems to use the pre-2006 style ID line. I found another example like this on the BioPerl webpage.
See also:
http://www.ebi.ac.uk/embl/Documentation/archivedchanges.html

> I hope to have a fix in CVS today/tomorrow.

I have updated Bio/GenBank/Scanner.py to cope with these old EMBL ID lines and added another EMBL test case to test_SeqIO.py

Your example now parses fine, giving a single SeqRecord as expected. I have not checked the annotation or features...

Peter

From skhadar at gmail.com Mon Apr 30 13:01:56 2007
From: skhadar at gmail.com (Shameer Khadar)
Date: Mon, 30 Apr 2007 18:31:56 +0530
Subject: [BioPython] Protparam using BioPython
In-Reply-To: <4631C89E.3090208@maubp.freeserve.co.uk>
References: <4628C57A.7010803@maubp.freeserve.co.uk> <4631C89E.3090208@maubp.freeserve.co.uk>
Message-ID:

Dear Peter,

Thanks a lot for your detailed reply and splendid help !!! It worked !!

Cheers,
Shameer

From jhortia1 at jhu.edu Mon Apr 30 20:16:42 2007
From: jhortia1 at jhu.edu (JASON HORTIATIS)
Date: Mon, 30 Apr 2007 16:16:42 -0400
Subject: [BioPython] local blast output
Message-ID:

Dear all,

I'm a novice using Biopython to run local BLAST searches and save the output to a file, but I've run into a problem because it seems as though the b_parser has a limit of 250 sequences, whereas my searches are returning far more than 250 sequences. Does anyone know if the parser really is limited, and if so, whether it is possible to work around this?
Thanks for the help,
Jason

From sbassi at gmail.com Mon Apr 30 21:26:50 2007
From: sbassi at gmail.com (Sebastian Bassi)
Date: Mon, 30 Apr 2007 18:26:50 -0300
Subject: [BioPython] local blast output
In-Reply-To:
References:
Message-ID:

On 4/30/07, JASON HORTIATIS wrote:
> Dear all,
> I'm a novice using biopython to run local blast searches and save the output to a file, but I've run into a problem because it seems as though the b_parser has a limit of 250 sequences, however my searches are returning far more than 250 sequences. Does anyone know if the parser really is limited, and if so if it is possible to work around this?
> Thanks for the help,

There is no 250 limit in the parser. Please show us the code so we can help you. Also tell us your BLAST and Biopython versions.

Best,
SB.

--
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318