From timmcilveen at talktalk.net Tue Apr 8 12:36:37 2008
From: timmcilveen at talktalk.net (Tim)
Date: Tue, 08 Apr 2008 17:36:37 +0100
Subject: [BioPython] Python 3
Message-ID: <47FB9F15.2050304@talktalk.net>

Hi,
I have been learning python and biopython side by side for a while now but notice that the release of python 3 is now approaching with many new features. Will biopython in its current form be easy to port to python 3, or is this going to be a major project ?

Thanks,
Tim

From sbassi at gmail.com Tue Apr 8 12:54:50 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Tue, 8 Apr 2008 13:54:50 -0300
Subject: [BioPython] Python 3
In-Reply-To: <47FB9F15.2050304@talktalk.net>
References: <47FB9F15.2050304@talktalk.net>
Message-ID:

On Tue, Apr 8, 2008 at 1:36 PM, Tim wrote:
> I have been learning python and biopython side by side for a while now but
> notice that the release of python 3 is now approaching with many new
> features. Will biopython in its current form be easy to port to python 3, or
> is this going to be a major project ?

Biopython depends on some other packages. Until these packages are migrated, there will be no Biopython for Python 3. Most external packages won't be ready by the release of Python 3. Just after the release of Python 3.0, most external modules will start migrating.
Since this is a future event take what I say as a speculation :) -- Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6 Bioinformatics news: http://www.bioinformatica.info Tutorial libre de Python: http://tinyurl.com/2az5d5 From biopython at maubp.freeserve.co.uk Tue Apr 8 13:14:36 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 8 Apr 2008 18:14:36 +0100 Subject: [BioPython] Python 3 In-Reply-To: <47FB9F15.2050304@talktalk.net> References: <47FB9F15.2050304@talktalk.net> Message-ID: <320fb6e00804081014q16c1ce42n4b3245631a7e6249@mail.gmail.com> On Tue, Apr 8, 2008 at 5:36 PM, Tim wrote: > Hi, > I have been learning python and biopython side by side for a while now but > notice that the release of python 3 is now approaching with many new > features. Will biopython in its current form be easy to port to python 3, or > is this going to be a major project ? The short answer is I don't know, and as far as I know, no one has tried yet. As Sebastian points out, Biopython does depend on several third party modules for some of its functionality - and they too would have to be ported to Python 3. Peter From idoerg at gmail.com Tue Apr 8 13:29:53 2008 From: idoerg at gmail.com (Iddo Friedberg) Date: Tue, 8 Apr 2008 10:29:53 -0700 Subject: [BioPython] Python 3 In-Reply-To: <320fb6e00804081014q16c1ce42n4b3245631a7e6249@mail.gmail.com> References: <47FB9F15.2050304@talktalk.net> <320fb6e00804081014q16c1ce42n4b3245631a7e6249@mail.gmail.com> Message-ID: Python 3.x is almost a new language. Thus I expect that if there will be a Biopython 3.x, it will be developed from the ground up, by enterprising souls woho would like to see that happen. But Python 2.x will still be around for a while. The powers-that-be in Python are trying to make the whole process of 3.0 porting as gentle as possible. Thus Python 2.x is not going away anytime soon and 2.x versions will be around for a while. Also, 2.6 will be in a sense forward compatible. 
>From the horse's mouth: " Python 2.6 will support forward compatibility in the following two ways: - It will support a "Py3k warnings mode" which will warn dynamically (i.e. at runtime) about features that will stop working in Python 3.0, e.g. assuming that range() returns a list. - It will contain backported versions of many Py3k features, either enabled through __future__ statements or simply by allowing old and new syntax to be used side-by-side (if the new syntax would be a syntax error in 2.x). " On Tue, Apr 8, 2008 at 10:14 AM, Peter wrote: > On Tue, Apr 8, 2008 at 5:36 PM, Tim wrote: > > Hi, > > I have been learning python and biopython side by side for a while now > but > > notice that the release of python 3 is now approaching with many new > > features. Will biopython in its current form be easy to port to python > 3, or > > is this going to be a major project ? > > The short answer is I don't know, and as far as I know, no one has tried > yet. > > As Sebastian points out, Biopython does depend on several third party > modules for some of its functionality - and they too would have to be > ported to Python 3. > > Peter > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- Iddo Friedberg, Ph.D. 
CALIT2, mail code 0440 University of California, San Diego 9500 Gilman Drive La Jolla, CA 92093-0440, USA T: +1 (858) 534-0570 T: +1 (858) 646-3100 x3516 http://iddo-friedberg.org From cjfields at uiuc.edu Tue Apr 8 13:52:16 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 8 Apr 2008 12:52:16 -0500 Subject: [BioPython] Python 3 In-Reply-To: References: <47FB9F15.2050304@talktalk.net> <320fb6e00804081014q16c1ce42n4b3245631a7e6249@mail.gmail.com> Message-ID: <2D7C07AD-DEB9-4866-BD5B-904D5510F8B7@uiuc.edu> This is similar to what the bioperl devs think will eventually happen with bioperl and perl 6 (which, like python 3.0, is pretty much a new language). Not much one can do about it, but it does give u the opportunity to rethink implementations. In the meantime, bioperl/ perl5 will be around for a while. chris On Apr 8, 2008, at 12:29 PM, Iddo Friedberg wrote: > Python 3.x is almost a new language. Thus I expect that if there > will be a > Biopython 3.x, it will be developed from the ground up, by > enterprising > souls woho would like to see that happen. But Python 2.x will still be > around for a while. > > The powers-that-be in Python are trying to make the whole process of > 3.0 > porting as gentle as possible. Thus Python 2.x is not going away > anytime > soon and 2.x versions will be around for a while. Also, 2.6 will be > in a > sense forward compatible. > >> From the horse's mouth: > " > > Python 2.6 will support forward compatibility in the following two > ways: > > - It will support a "Py3k warnings mode" which will warn dynamically > (i.e. at runtime) about features that will stop working in Python > 3.0, e.g. > assuming that range() returns a list. > - It will contain backported versions of many Py3k features, either > enabled through __future__ statements or simply by allowing old > and new > syntax to be used side-by-side (if the new syntax would be a > syntax error in > 2.x). 
> > " > > > On Tue, Apr 8, 2008 at 10:14 AM, Peter > > wrote: > >> On Tue, Apr 8, 2008 at 5:36 PM, Tim wrote: >>> Hi, >>> I have been learning python and biopython side by side for a while >>> now >> but >>> notice that the release of python 3 is now approaching with many new >>> features. Will biopython in its current form be easy to port to >>> python >> 3, or >>> is this going to be a major project ? >> >> The short answer is I don't know, and as far as I know, no one has >> tried >> yet. >> >> As Sebastian points out, Biopython does depend on several third party >> modules for some of its functionality - and they too would have to be >> ported to Python 3. >> >> Peter >> _______________________________________________ >> BioPython mailing list - BioPython at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython >> > > > > -- > > Iddo Friedberg, Ph.D. > CALIT2, mail code 0440 > University of California, San Diego > 9500 Gilman Drive > La Jolla, CA 92093-0440, USA > T: +1 (858) 534-0570 > T: +1 (858) 646-3100 x3516 > http://iddo-friedberg.org > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From timmcilveen at talktalk.net Tue Apr 8 14:43:33 2008 From: timmcilveen at talktalk.net (Tim) Date: Tue, 08 Apr 2008 19:43:33 +0100 Subject: [BioPython] Python 3 In-Reply-To: <47FB9F15.2050304@talktalk.net> References: <47FB9F15.2050304@talktalk.net> Message-ID: <47FBBCD5.3060904@talktalk.net> Thanks for the replies everyone. I have been programming perl for quite a while, but have been in the process of learning python, for all of the nice OO features. I knew that python 3 was on the horizon, but I thought that this would be in a Perl 6 kind of way - i.e. 
sometime in the very distant future :-) I didn't realise that it would be this summer. Thanks again, Tim > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From cjfields at uiuc.edu Tue Apr 8 15:04:17 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 8 Apr 2008 14:04:17 -0500 Subject: [BioPython] Python 3 In-Reply-To: <47FBBCD5.3060904@talktalk.net> References: <47FB9F15.2050304@talktalk.net> <47FBBCD5.3060904@talktalk.net> Message-ID: <49D0CCCE-04E6-444A-B5C6-FB62A2EDDF88@uiuc.edu> Perl 6 is supposed to arrive at Christmas. Now which Christmas is anyone's guess. One can play around with two pre-alpha Perl 6 implementations, Parrot and Pugs, in the meantime (something I have been doing a bit lately...) chris On Apr 8, 2008, at 1:43 PM, Tim wrote: > Thanks for the replies everyone. I have been programming perl for > quite a while, but have been in the process of learning python, for > all of the nice OO features. I knew that python 3 was on the > horizon, but I thought that this would be in a Perl 6 kind of way - > i.e. sometime in the very distant future :-) I didn't realise that > it would be this summer. > > Thanks again, > Tim > >> >> _______________________________________________ >> BioPython mailing list - BioPython at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython >> > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From alaguraj.v at gmail.com Tue Apr 8 15:29:52 2008 From: alaguraj.v at gmail.com (Alaguraj Veluchamy) Date: Tue, 8 Apr 2008 12:29:52 -0700 Subject: [BioPython] NCBIXML Blast parser Error for TBLASTN Message-ID: Dear people,I am stuck in parsing Blast XML output. I am trying to extract the co-ordinate and hence the sequence of the hits for my query in standalone blast. 
XML output is coming for query proteins against a genbank genome file. (using TBLASTN) The problem for me is- I am unable to get the co-ordinate(hit-from --> hit-to). I used hsp.sbjct_start and hsp.sbct_end, but it gives two numbers which is different from the original. when i tried extracting those regions from the genbank file and do an alignment, it nowhere matches. so what is this hsp.sbjct_start - integers?? how to get the hit region (or alignment) cordinates? I read record.py--the library file, but they given it is 3x times larger (but still it is incorrect) Regards -Alaguraj.V From sbassi at gmail.com Tue Apr 8 15:36:59 2008 From: sbassi at gmail.com (Sebastian Bassi) Date: Tue, 8 Apr 2008 16:36:59 -0300 Subject: [BioPython] NCBIXML Blast parser Error for TBLASTN In-Reply-To: References: Message-ID: On Tue, Apr 8, 2008 at 4:29 PM, Alaguraj Veluchamy wrote: > Dear people,I am stuck in parsing Blast XML output. Give me your XML output file and your code to see the problem. Maybe you can't align because the is a difference in polarity between the BLAST hit and the original sequence (look at the 1 and -1 in Hsp_query-frame and Hsp_hit-frame tags, does your problem happends when you have 1 and -1 or even when you have both 1?). You should give us a more detailed report, including xml file and what are you expecting and what are you getting. If the XML file is too big, upload it to another place and tell us the URL to download it. Best, SB. 
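
The frame check suggested above can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not Biopython's API: the helper names (`reverse_complement`, `hit_nucleotides`, `translate`) are made up for illustration, the subject coordinates are assumed to be 1-based, inclusive nucleotide positions on the same genome sequence that was formatted into the database, and a negative hit frame is assumed to mean the hit lies on the reverse strand. For TBLASTN the hit spans roughly three nucleotides per aligned residue, which would explain a region about 3x longer than the protein alignment.

```python
# Standard genetic code, built from the conventional TCAG ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def hit_nucleotides(genome, sbjct_start, sbjct_end, frame):
    """Slice the hit region out of the genome (hypothetical helper).

    Assumes 1-based inclusive coordinates; the two positions are sorted
    so the slice works whichever order they are reported in. A negative
    frame is taken to mean the reverse strand.
    """
    low, high = sorted((sbjct_start, sbjct_end))
    region = genome[low - 1:high].upper()
    if frame < 0:
        region = reverse_complement(region)
    return region


def translate(dna):
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )
```

With values taken from a parsed HSP, one could call `hit_nucleotides(genome, hsp.sbjct_start, hsp.sbjct_end, frame)`, where `frame` is the Hsp_hit-frame value from the XML, and compare `translate()` of the result with the aligned subject sequence (gap characters removed). If the two do not agree, the database sequence and the extracted sequence are probably not the same.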
-- Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6 Bioinformatics news: http://www.bioinformatica.info Tutorial libre de Python: http://tinyurl.com/2az5d5 From biopython at maubp.freeserve.co.uk Tue Apr 8 15:45:27 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 8 Apr 2008 20:45:27 +0100 Subject: [BioPython] NCBIXML Blast parser Error for TBLASTN In-Reply-To: References: Message-ID: <320fb6e00804081245s550c600cu5a7d420933b9777e@mail.gmail.com> Hi Alaguraj, Welcome to the Biopython mailing list. On Tue, Apr 8, 2008 at 8:36 PM, Sebastian Bassi wrote: > On Tue, Apr 8, 2008 at 4:29 PM, Alaguraj Veluchamy wrote: > > Dear people,I am stuck in parsing Blast XML output. > > Give me your XML output file and your code to see the problem. > Maybe you can't align because the is a difference in polarity between > the BLAST hit and the original sequence (look at the 1 and -1 in > Hsp_query-frame and Hsp_hit-frame tags, does your problem happends > when you have 1 and -1 or even when you have both 1?). Maybe - but I suspect the problem is you are looking at the subject coordinates (hsp.sbjct_start and hsp.sbjct_end - these refer to the matched sequence in the database), and not the search query coordinates (hsp.query_start and hsp.query_end - these refer to your input query sequence). > You should give us a more detailed report, including xml file and what > are you expecting and what are you getting. > If the XML file is too big, upload it to another place and tell us the > URL to download it. Or better yet, if you are sure this is a bug, file it on Bugzilla. And then attach the XML file to the bug. 
http://bugzilla.open-bio.org/enter_bug.cgi?product=Biopython Peter From nuin at genedrift.org Tue Apr 8 15:17:39 2008 From: nuin at genedrift.org (Paulo Nuin) Date: Tue, 08 Apr 2008 15:17:39 -0400 Subject: [BioPython] Python 3 In-Reply-To: <49D0CCCE-04E6-444A-B5C6-FB62A2EDDF88@uiuc.edu> References: <47FB9F15.2050304@talktalk.net> <47FBBCD5.3060904@talktalk.net> <49D0CCCE-04E6-444A-B5C6-FB62A2EDDF88@uiuc.edu> Message-ID: <47FBC4D3.4090201@genedrift.org> Just after Duke Nukem Forever ... Chris Fields wrote: > Perl 6 is supposed to arrive at Christmas. Now which Christmas is > anyone's guess. > > One can play around with two pre-alpha Perl 6 implementations, Parrot > and Pugs, in the meantime (something I have been doing a bit lately...) > > chris > > On Apr 8, 2008, at 1:43 PM, Tim wrote: > >> Thanks for the replies everyone. I have been programming perl for >> quite a while, but have been in the process of learning python, for >> all of the nice OO features. I knew that python 3 was on the horizon, >> but I thought that this would be in a Perl 6 kind of way - i.e. >> sometime in the very distant future :-) I didn't realise that it >> would be this summer. 
>>
>> Thanks again,
>> Tim

From kteague at bcgsc.ca Tue Apr 8 15:35:36 2008
From: kteague at bcgsc.ca (Kevin Teague)
Date: Tue, 8 Apr 2008 12:35:36 -0700
Subject: [BioPython] Python 3
In-Reply-To: <2D7C07AD-DEB9-4866-BD5B-904D5510F8B7@uiuc.edu>
References: <47FB9F15.2050304@talktalk.net> <320fb6e00804081014q16c1ce42n4b3245631a7e6249@mail.gmail.com> <2D7C07AD-DEB9-4866-BD5B-904D5510F8B7@uiuc.edu>
Message-ID:

> This is similar to what the bioperl devs think will eventually
> happen with bioperl and perl 6 (which, like python 3.0, is pretty
> much a new language). Not much one can do about it, but it does
> give u the opportunity to rethink implementations. In the
> meantime, bioperl/perl5 will be around for a while.

Python 3 and Perl 6 are similar in that they break backwards compatibility, but beyond that Perl 6 is a complete rethinking and reimplementation of the entire language. Python 3 is much smaller in scope; its primary goal is to clean up early design warts in the language. The only syntactically significant change is the replacement of the print statement with a print() function. Most of the other changes are fairly under-the-hood: only one string type (unicode) instead of two (unicode and ascii), only one integer type (int) instead of two (int and long), etc.:

http://docs.python.org/dev/3.0/whatsnew/3.0.html

> On Apr 8, 2008, at 12:29 PM, Iddo Friedberg wrote:
>
>> Python 3.x is almost a new language. Thus I expect that if there will be a
>> Biopython 3.x, it will be developed from the ground up, by enterprising
>> souls woho would like to see that happen. But Python 2.x will still be
>> around for a while.

Note that Guido van Rossum and others have been encouraging projects to port to Python 3 *without* changing the APIs of the project. The goal is to make it possible to support a Python 3 port from a Python 2 source tree using the 2to3 tool. API changes are encouraged to be made in Python 2, before porting (http://www.artima.com/weblogs/viewpost.jsp?thread=227041).

>> The powers-that-be in Python are trying to make the whole process of 3.0
>> porting as gentle as possible. Thus Python 2.x is not going away anytime
>> soon and 2.x versions will be around for a while. Also, 2.6 will be in a
>> sense forward compatible.

Yes, Python 2 will not be going away anytime soon. The changes between Python 2 and 3 aren't *that* major either, so for people trying to just get work done, there is really no hurry to get to Python 3. Python 3.0 will also be a bit of an early adopters release; it's expected that a 3.1 and perhaps 3.2 release will follow in fairly short order after Python 3, which is probably when larger projects with library dependencies such as BioPython will more realistically consider a port. I would guess it would be another 3 or 4 years from now before there are a significant number of useful libraries that are only being maintained for Python 3, giving one a pragmatic motivation for porting.

From 2huggie at gmail.com Wed Apr 9 09:56:57 2008
From: 2huggie at gmail.com (Timothy Wu)
Date: Wed, 9 Apr 2008 21:56:57 +0800
Subject: [BioPython] Ensembl
Message-ID:

Hi there,

I am trying to access data from Ensembl. I noticed that Ensembl has a Perl API. Any support for Ensembl here at BioPython?
Timothy

From alaguraj.v at gmail.com Sat Apr 12 17:23:26 2008
From: alaguraj.v at gmail.com (Alaguraj Veluchamy)
Date: Sat, 12 Apr 2008 14:23:26 -0700
Subject: [BioPython] NCBIXML Blast parser Error for TBLASTN
Message-ID:

Dear people,

I thought this would give a solution to the problem I faced.

If you do "formatdb" of blast with ".gbk" files of the NCBI database, then the sequence co-ordinates are processed and give numbers which don't match the original sequence. Record.py in the lib module doesn't say much about this.

The solution is to format the blast database with ".fna" files. This gives a result with correct co-ordinates. These co-ordinates can be used to retrieve the sequence or to parse the ".gbk" files. To be precise: format with "fna" files and parse with "gbk" files. This is true for all versions of BLAST.

As two members have asked for the code, here it is:

##########################
import os
import sys
import dircache
import re
from Bio.Blast import NCBIStandalone
from Bio.Blast import NCBIXML

path="/Users/alaguraj/soca/genomes/Bacteria"
lstDb=os.listdir(path)
blastdbpath="/Users/alaguraj/blastSmVsall/blastDB"
myArray=[]
# inputfile (the query protein file) must be defined before this loop
for db in lstDb:
    eachdirpath=path+"/"+db
    listfiles=dircache.listdir(eachdirpath)
    print listfiles
    for listfile in listfiles:
        command='/Users/alaguraj/progs/blast-2.2.17/bin/blastall -i '+inputfile+' -p tblastn -e 0.001 -d '+blastdbpath+'/'+db+'/'+listfile+' -m 7 \n'
        w, read, e = os.popen3(command)
        b_parser = NCBIXML.BlastParser()
        blast_records = b_parser.parse(read)
        for b_record in blast_records:
            myArray.append(b_record)
########I am appending here because i need the data to further process for my application#########
for eachmyArray in myArray:
    for alignment in eachmyArray.alignments:
        for hsp in alignment.hsps:
            print hsp.sbjct_start
            print hsp.sbjct_end
############################################################

The next code is for parsing the genbank file:

from Bio import GenBank

# genomefile (path to the ".gbk" file) must be defined before this point
parser=GenBank.RecordParser()
record = parser.parse(open(genomefile))
for feature in record.features:
    if feature.key=="CDS":
        for qualifier in feature.qualifiers:
            if qualifier.key=="/protein_id=":
                protid=qualifier.value[1:-1]
                print protid
            if qualifier.key=="/product=":
                func=qualifier.value[1:-1]
                print func
#############################################################

Sorry for the long mail. Hope it helps.

Regards,
Alaguraj.V

Dear people,I am stuck in parsing Blast XML output. I am trying to extract the co-ordinate and hence the sequence of the hits for my query in standalone blast. XML output is coming for query proteins against a genbank genome file. (using TBLASTN)

The problem for me is- I am unable to get the co-ordinate(hit-from --> hit-to). I used hsp.sbjct_start and hsp.sbct_end, but it gives two numbers which is different from the original. when i tried extracting those regions from the genbank file and do an alignment, it nowhere matches. so what is this hsp.sbjct_start - integers?? how to get the hit region (or alignment) cordinates? I read record.py--the library file, but they given it is 3x times larger (but still it is incorrect)

Regards
-Alaguraj.V

From sbassi at gmail.com Sun Apr 13 01:49:44 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Sun, 13 Apr 2008 02:49:44 -0300
Subject: [BioPython] NCBIXML Blast parser Error for TBLASTN
In-Reply-To:
References:
Message-ID:

On Sat, Apr 12, 2008 at 6:23 PM, Alaguraj Veluchamy wrote:
> If you do "formatdb" of blast with ".gbk" files of NCBI database, then
> the sequence co-ordinates are processed and gives numbers which doesnt
> match with the original sequence.
> Record.Py in the lib module doesnt say much about this.

Formatdb works with FASTA or ASN.1 as input. I think this issue is not a problem of the parser. The parser reads whatever is in the BLAST output file. You should take a look at how you generated it.
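
Since formatdb expects FASTA (or ASN.1) input, one option is to pull the sequence out of the GenBank file and write it as FASTA before formatting, so that BLAST hit coordinates refer directly to that sequence. Below is a minimal sketch using only the standard library. The function name `genbank_to_fasta` is hypothetical, it assumes a single-record flat file with the usual LOCUS line and an ORIGIN ... // block, and it ignores features entirely. Using the NCBI ".fna" file directly, or Biopython's Bio.SeqIO, is probably the more robust route.

```python
import re


def genbank_to_fasta(genbank_text, width=60):
    """Convert the ORIGIN block of one GenBank record to FASTA text.

    Assumption: a single record; the FASTA title is taken from the
    second field of the LOCUS line, or "unknown" if there is none.
    """
    name = "unknown"
    pieces = []
    in_origin = False
    for line in genbank_text.splitlines():
        if line.startswith("LOCUS"):
            fields = line.split()
            if len(fields) > 1:
                name = fields[1]
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            in_origin = False
        elif in_origin:
            # Sequence lines look like "  1 atggct taacc"; keep letters only.
            pieces.append(re.sub(r"[^A-Za-z]", "", line))
    sequence = "".join(pieces).upper()
    lines = [">" + name]
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file and passed to formatdb as the database input.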
--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Tutorial libre de Python: http://tinyurl.com/2az5d5

From vsla at wanadoo.es Sun Apr 13 16:43:44 2008
From: vsla at wanadoo.es (Virginia S. Lopez)
Date: Sun, 13 Apr 2008 22:43:44 +0200
Subject: [BioPython] Trie API for serializing a trie
Message-ID: <200804132128.m3DLSD5M021030@portal.open-bio.org>

Hi all,

I'd like to use the trie in Bio/trie.h from another C program (which I access using python). I need to implement save() and load() methods that will call serialize() and deserialize() in Bio/trie.c.

I'm having problems getting this working: it compiles fine, and I can serialize data to disk, but I can't deserialize data to create a trie from the file. I don't know whether the problem is in how I write the trie's data to disk, or when I load it from disk, or both. I'm new to C and I may be making a silly mistake.

I'm copying some of my code below, and attach the full code. It takes one argument that is the name of a txt file containing the keys (one per line), builds the trie, and saves it to the file "savefile.txt" in binary. (To test loading the trie from the saved file, I simply comment out a code block in main().)

I'd massively appreciate any suggestions about what I'm doing wrong.

Many thanks.
Virginia.

---------------------

/* Methods called when serializing trie data */
static int
_write_to_handle(const void *towrite, const int length, void *handle)
{
    fwrite(&towrite, length, 1, (FILE *) handle);
    return 1;
}

static int
_write_value_to_handle(const void *towrite, void *handle)
{
    fwrite(&towrite, sizeof(int), 1, handle);  //trie key values is an int
    return 1;
}

/* Methods called when deserializing trie data */
static int
_read_from_handle(void *wasread, const int length, void *handle)
{
    fread(&wasread, length, 1, (FILE *) handle);
    return 1;
}

static void *
_read_value_from_handle(void *handle)  //returns a pointer to the value
{
    //In my tests, this method never gets called, because a problem happens with the data
    // read in function _read_from_handle(). No error given, deserialize() function quits.
    int *value;
    fread(&value, sizeof(int), 1, handle);
    return value;
}

int trie_save(){
    FILE *out = fopen("savefile.txt", "wb");
    int success=Trie_serialize(trie, _write_handle, _write_value_handle, (void *)out);
    fclose(out);
    return success;
}

int trie_load(){
    int success=1;
    FILE *in = fopen("savefile.txt", "rb");
    trie = Trie_deserialize(_read_handle, _read_value_handle,in);
    if(!trie) {
        success=0;
        printf("Error: trie could not be created from data.\n");
    }
    fclose(in);
    return success;
}

-------------------------

Files in /Bio/:
- trie.h
  http://www.codase.com/search/showtext?file=L2dlbnRvbzIvdmFyL3RtcC9yZXBvcy9jb2Rhc2UuYy9iaW9weXRob24tMS40MGIvd29yay9iaW9weXRob24tMS40MGIvQmlvL3RyaWUuaA==
- trie.c
  http://www.codase.com/search/showtext?file=L2dlbnRvbzIvdmFyL3RtcC9yZXBvcy9jb2Rhc2UuYy9iaW9weXRob24tMS40MGIvd29yay9iaW9weXRob24tMS40MGIvQmlvL3RyaWUuYw==
- triemodule.c (Interface of the trie API for Python. I looked into this file to see how to do serialization from C, but I can't understand it)
  http://www.codase.com/search/showtext?file=L2dlbnRvbzIvdmFyL3RtcC9yZXBvcy9jb2Rhc2UuYy9iaW9weXRob24tMS40MGIvd29yay9iaW9weXRob24tMS40MGIvQmlvL3RyaWVtb2R1bGUuYw==

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/octet-stream
Size: 3054 bytes
Desc: not available
URL:

From 2huggie at gmail.com Tue Apr 15 06:30:09 2008
From: 2huggie at gmail.com (Timothy Wu)
Date: Tue, 15 Apr 2008 18:30:09 +0800
Subject: [BioPython] FeatureLocation
Message-ID:

Hi,
I'm trying to access locations in a Genbank file like this:

     mRNA            join(5172..5189,6254..6277,6744..6851,7266..7321,
                     8515..8749,10663..10826,11626..11807,12784..12879,
                     16701..17002,18265..18330,20541..20678,21188..21334,
                     23789..23873,24587..24653,26071..26197,31165..31341,
                     32922..33089,35124..35304,39185..39305,39545..39680,
                     40530..40749,40901..41045,43199..43355,44727..44926,
                     46662..46788,48405..48506,49040..49183,50001..50159,
                     51315..51481,54580..54774,56455..56746)

I intend to get all the positions. However as it stands, it looks to me like only the starting and ending position is supported in the FeatureLocation class. Is there a way to obtain the coordinates?

Timothy

From peter at maubp.freeserve.co.uk Tue Apr 15 07:17:14 2008
From: peter at maubp.freeserve.co.uk (Peter)
Date: Tue, 15 Apr 2008 12:17:14 +0100
Subject: [BioPython] FeatureLocation
In-Reply-To:
References:
Message-ID: <320fb6e00804150417u1e6620bdt1b649e7c1b74219b@mail.gmail.com>

On Tue, Apr 15, 2008 at 11:30 AM, Timothy Wu <2huggie at gmail.com> wrote:
> Hi,
> I'm trying to access locations in a Genbank file like this:
>
>      mRNA            join(5172..5189,6254..6277,6744..6851,7266..7321,
>                      8515..8749,10663..10826,11626..11807,12784..12879,
>                      16701..17002,18265..18330,20541..20678,21188..21334,
>                      23789..23873,24587..24653,26071..26197,31165..31341,
>                      32922..33089,35124..35304,39185..39305,39545..39680,
>                      40530..40749,40901..41045,43199..43355,44727..44926,
>                      46662..46788,48405..48506,49040..49183,50001..50159,
>                      51315..51481,54580..54774,56455..56746)
>
> I intend to get all the positions. However as it stands, it looks to me like
> only the starting and ending position is supported in the FeatureLocation
> class.
> Is there a way to obtain the coordinates?

They should be there as sub-features (a list property of the SeqFeature).

Peter

From bsantos at biocant.pt Wed Apr 16 09:34:02 2008
From: bsantos at biocant.pt (Bruno Santos)
Date: Wed, 16 Apr 2008 13:34:02 +0000
Subject: [BioPython] How to produce html report with biopython blast
Message-ID: <20080416133402.1fc59140@mail.biocant.pt>

Hi,

I am using python to launch some BLAST searches, but instead of parsing results in Python I only want to produce a html file. To do that I am using the keyword html='T', but this don't seem to be enough to python produce the file. Can anyone tell me how can I achieve my objective?

Another question I want to ask is in which method of blast_record I can get the hsp.align-len, that is basically the length of the aligned hit? This value is present in the xml file but in the class diagram available in Biopython tutorial and cookbook I can't see a method that returns this value.

Thank you very much in advance,

Bruno Santos

P.S.-I am having lots of problems with sending messages to biopython lists it always say the message is awaiting approval because it contain a suspicious header. Can anyone tell me what I need to do to change this behavior?

From biopython at maubp.freeserve.co.uk Wed Apr 16 09:55:11 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 16 Apr 2008 14:55:11 +0100
Subject: [BioPython] How to produce html report with biopython blast
In-Reply-To: <20080416133402.1fc59140@mail.biocant.pt>
References: <20080416133402.1fc59140@mail.biocant.pt>
Message-ID: <320fb6e00804160655t73e36092ka0497c12b5f2e2e8@mail.gmail.com>

> Hi,
>
> I am using python to launch some BLAST searches, but instead of parsing results
> in Python I only want to produce a html file. To do that I am using the keyword
> html='T', but this don't seem to be enough to python produce the file. Can anyone
> tell me how can I achieve my objective?

Are you running standalone blast or using the NCBI online version?
With Bio.Blast.NCBIWWW.qblast() use the argument format_type="HTML"

With Bio.Blast.NCBIStandalone.blastall() I would try html=True (not literally a string "T", but I can see why you were misled) and/or the align_view argument (see the standalone blast documentation).

> Another question I want to ask is in which method of blast_record I can get
> the hsp.align-len, that is basically the length of the aligned hit? This value is
> present in the xml file but in the class diagram available in Biopython tutorial
> and cookbook I can't see a method that returns this value

Probably you want hsp.align_length here. I'd have to double check that all three parsers are consistent here (for text, html and xml blast).

> P.S.-I am having lots of problems with sending messages to biopython lists it
> always say the message is awaiting approval because it contain a suspicious
> header. Can anyone tell me what I need to do to change this behavior?

I'm not sure. Are you sending emails from one email address via another account?

Peter

From p.j.a.cock at googlemail.com Wed Apr 16 10:20:29 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 16 Apr 2008 15:20:29 +0100
Subject: [BioPython] Any RPS-BLAST users out there?
Message-ID: <320fb6e00804160720i75b172caqdd83cc905a8bf937@mail.gmail.com>

Hello all,

One of the things on my "to do list" is to incorporate some of the Biopython examples I wrote up on my webpage into the official Biopython documentation.

For example, the following page using RPS-BLAST could make a nice addition to the "cook book" section of the main Biopython tutorial:
http://www.warwick.ac.uk/go/peter_cock/python/rpsblast/

Does this look worth adding? Would anyone be interested in trying out that example on their own machine, and reporting back if it worked (with software version numbers) or any issues?

Thanks,

Peter

From bsantos at biocant.pt Wed Apr 16 10:30:51 2008
From: bsantos at biocant.pt (Bruno Santos)
Date: Wed, 16 Apr 2008 15:30:51 +0100
Subject: [BioPython] How to produce html report with biopython blast
In-Reply-To: <320fb6e00804160655t73e36092ka0497c12b5f2e2e8@mail.gmail.com>
References: <20080416133402.1fc59140@mail.biocant.pt> <320fb6e00804160655t73e36092ka0497c12b5f2e2e8@mail.gmail.com>
Message-ID: <002101c89fce$7a6cd010$6f467030$@pt>

My problem was not changing the align_view argument. Now it is working fine. Thank you very much.

Peter said:
>Probably you want hsp.align_length here. I'd have to double check
>that all three
>parsers are consistent here (for text, html and xml blast).

That's exactly what I want. hsp.align_length is not shown in the class diagram for BLAST in the Biopython tutorial, so it would be a good idea to add it. I usually do what Christof suggests and calculate len(hsp.match), because I thought that information was not present in the XML; only when I wrote the XML to a file and checked it did I notice that it is present.

I have also almost finished writing some documentation for the Bio.Sequencing module for working with ace files; I will probably be able to finish it this week if I have time. I then need to know how I can upload the file so that it can be included in the BioPython documentation.

Peter said:
>I'm not sure. Are you sending emails from one email address via
>another account?

Yes, I always write the mails from this e-mail account, so I have no idea why this is happening.

Thank you very much once more.
Bruno Santos

-----Original Message-----
From: p.j.a.cock at googlemail.com [mailto:p.j.a.cock at googlemail.com] On Behalf Of Peter
Sent: Wednesday, 16 April 2008 14:55
To: Bruno Santos
Cc: biopython at biopython.org
Subject: Re: [BioPython] How to produce html report with biopython blast

> Hi,
>
> I am using python to launch some BLAST searches, but instead of parsing results
> in Python I only want to produce a html file. To do that I am using the keyword
> html='T', but this don't seem to be enough to python produce the file. Can anyone
> tell me how can I achieve my objective?

Are you running standalone blast or using the NCBI online version?

With Bio.Blast.NCBIWWW.qblast() use the argument format_type="HTML"

With Bio.Blast.NCBIStandalone.blastall() I would try html=True (not
literally a string "T", but I can see why you were misled) and/or the
align_view argument (see the standalone blast documentation).

> Another question I want to ask is in which method of blast_record I can get
> the hsp.align-len, that is basically the length of the aligned hit? This value is
> present in the xml file but in the class diagram available in Biopython tutorial
> and cookbook I can't see a method that returns this value

Probably you want hsp.align_length here. I'd have to double check
that all three parsers are consistent here (for text, html and xml
blast).

> P.S.-I am having lots of problems with sending messages to biopython lists it
> always say the message is awaiting approval because it contain a suspicious
> header. Can anyone tell me what I need to do to change this behavior?

I'm not sure. Are you sending emails from one email address via
another account?
Peter

From winter at biotec.tu-dresden.de Wed Apr 16 10:05:50 2008
From: winter at biotec.tu-dresden.de (Christof Winter)
Date: Wed, 16 Apr 2008 16:05:50 +0200
Subject: [BioPython] How to produce html report with biopython blast
In-Reply-To: <20080416133402.1fc59140@mail.biocant.pt>
References: <20080416133402.1fc59140@mail.biocant.pt>
Message-ID: <480607BE.4060507@biotec.tu-dresden.de>

Bruno Santos wrote:
[...]
> Another question I want to ask is in which method of blast_record I can get
> the hsp.align-len, that is basically the length of the aligned hit? This
> value is present in the xml file but in the class diagram available in
> Biopython tutorial and cookbook I can't see a method that returns this
> value.

You probably want:

alignment.length = total length of unaligned hit sequence
len(hsp.query) = len(hsp.match) = len(hsp.sbjct) = length of alignment

using

from Bio.Blast import NCBIXML
blast_records = NCBIXML.parse(open("my_blast.xml"))
for blast_record in blast_records:
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            pass  # do something with each hsp here

HTH,
Christof

> Thank you very much in advance,
>
> Bruno Santos
>
>
>
> P.S.-I am having lots of problems with sending messages to biopython lists
> it always say the message is awaiting approval because it contain a
> suspicious header. Can anyone tell me what I need to do to change this
> behavior?

Sorry, no idea about that. Never happened to me.

From mjldehoon at yahoo.com Wed Apr 16 11:17:43 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 16 Apr 2008 08:17:43 -0700 (PDT)
Subject: [BioPython] Any RPS-BLAST users out there?
In-Reply-To: <320fb6e00804160720i75b172caqdd83cc905a8bf937@mail.gmail.com>
Message-ID: <823972.78108.qm@web62408.mail.re1.yahoo.com>

Peter Cock wrote:
> For example, the following page using RPS-BLAST could make a nice addition to
> the "cook book" section of the main Biopython tutorial:
>
> http://www.warwick.ac.uk/go/peter_cock/python/rpsblast/
>
> Does this look worth adding?

Yes. However, I would prefer one Blast chapter in the tutorial, instead
of a Blast chapter and also a cookbook section about Blast.

--Michiel.

---------------------------------
Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now.

From mjldehoon at yahoo.com Wed Apr 16 11:19:55 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 16 Apr 2008 08:19:55 -0700 (PDT)
Subject: [BioPython] How to produce html report with biopython blast
In-Reply-To: <002101c89fce$7a6cd010$6f467030$@pt>
Message-ID: <412965.415.qm@web62404.mail.re1.yahoo.com>

Bruno Santos wrote:
I also have almost finished writing some documentation for Bio.Sequencing
module to work with ace files I will probably be able to finish it during
this week if I have time. So then I need to know how I can upload the file
so it be included in the BioPython documentation.

Create a bug report on Bugzilla, and add your documentation there as a patch.

Thanks!
---Michiel

From timmcilveen at talktalk.net Sat Apr 19 17:57:19 2008
From: timmcilveen at talktalk.net (Tim)
Date: Sat, 19 Apr 2008 22:57:19 +0100
Subject: [BioPython] parser code
Message-ID: <1208642239.5978.7.camel@linux-73yn.site>

Hi,
When I was learning Bioperl, I used a book, Beginning Perl for
Bioinformatics. It gave a lot of information on how to write your own
parsers etc. Is there anything similar for python at the minute?
It is great that biopython has lots of parsers available, but it would be great to see the code for these, with some explanations. Does this info exists? All the best, Tim From sdavis2 at mail.nih.gov Sat Apr 19 20:10:03 2008 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Sat, 19 Apr 2008 20:10:03 -0400 Subject: [BioPython] parser code In-Reply-To: <1208642239.5978.7.camel@linux-73yn.site> References: <1208642239.5978.7.camel@linux-73yn.site> Message-ID: <264855a00804191710k1de5535biffa0b192e89bd89f@mail.gmail.com> On Sat, Apr 19, 2008 at 5:57 PM, Tim wrote: > Hi, > When I was learning Bioperl, I used a book, Beginning Perl for > Bioinformatics. It gave a lot of information on how to write your own > parsers etc. Is there anything similar for python at the minute? It is > great that biopython has lots of parsers available, but it would be > great to see the code for these, with some explanations. Does this info > exists? Hello, Tim. Of course, the biopython project is open source, so you can just download the code and look at whatever you like. However, you might find this site useful: http://www.pasteur.fr/recherche/unites/sis/formation/python/ Sean From mjldehoon at yahoo.com Sun Apr 20 09:33:34 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sun, 20 Apr 2008 06:33:34 -0700 (PDT) Subject: [BioPython] parser code In-Reply-To: <264855a00804191710k1de5535biffa0b192e89bd89f@mail.gmail.com> Message-ID: <438823.66033.qm@web62403.mail.re1.yahoo.com> I agree that that is an excellent site to learn Python and Biopython. One word of caution though: There may be some differences between the latest version of Biopython (1.45) and the Biopython that was used to write that tutorial. So when you try the examples shown there, you may see some deprecation warnings. --Michiel. Sean Davis wrote: On Sat, Apr 19, 2008 at 5:57 PM, Tim wrote: > Hi, > When I was learning Bioperl, I used a book, Beginning Perl for > Bioinformatics. 
It gave a lot of information on how to write your own > parsers etc. Is there anything similar for python at the minute? It is > great that biopython has lots of parsers available, but it would be > great to see the code for these, with some explanations. Does this info > exists? Hello, Tim. Of course, the biopython project is open source, so you can just download the code and look at whatever you like. However, you might find this site useful: http://www.pasteur.fr/recherche/unites/sis/formation/python/ Sean _______________________________________________ BioPython mailing list - BioPython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython --------------------------------- Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. From timmcilveen at talktalk.net Mon Apr 21 16:39:38 2008 From: timmcilveen at talktalk.net (Tim) Date: Mon, 21 Apr 2008 21:39:38 +0100 Subject: [BioPython] parser code Message-ID: <480CFB8A.3050100@talktalk.net> Hi, Thanks for the replies. I do have the Pasteur python course and have been working through it, but thanks for the suggestion on that one, anyhow. The genedrift.org site link is exactly what I was looking for, being a direct port of the perl code in Beginning Perl for Bioinformatics. Thanks again for the suggestions, Tim From nuin at genedrift.org Mon Apr 21 16:59:13 2008 From: nuin at genedrift.org (Paulo Nuin) Date: Mon, 21 Apr 2008 16:59:13 -0400 Subject: [BioPython] parser code In-Reply-To: <480CFB8A.3050100@talktalk.net> References: <480CFB8A.3050100@talktalk.net> Message-ID: <480D0021.7020708@genedrift.org> Hi Tim I noticed that my reply didn't go to the list too. Anyway, there are some parts of the book that are not covered in the site, but if you need them covered just let me know. Regards Paulo Tim wrote: > Hi, > Thanks for the replies. I do have the Pasteur python course and have > been working through it, but thanks for the suggestion on that one, > anyhow. 
> The genedrift.org site link is exactly what I was looking for,
> being a direct port of the perl code in Beginning Perl for
> Bioinformatics.
>
> Thanks again for the suggestions,
> Tim
>
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

From macrozhu at gmail.com Wed Apr 23 06:22:19 2008
From: macrozhu at gmail.com (Macro Zhu)
Date: Wed, 23 Apr 2008 12:22:19 +0200
Subject: [BioPython] parse element symbols for ATOM/HETATM records
Message-ID: <11b97ec0804230322w669aac8bx37fb92cf4e84240c@mail.gmail.com>

Hi,

the current Bio.PDB.PDBParser does not parse columns 77-78 of ATOM
records in PDB files, where the element symbol of the atom is (usually)
stored. We suggest that BioPython parse this information in the next
version. The reasons are given as follows:

1. The current remediated PDB format requires these symbols to be
always present (
http://www.wwpdb.org/documentation/format3.1-20080211.pdf ), though in
old PDB files (v2.3), these symbols are sometimes missing.

2. In some cases it is not straightforward, if not impossible, to
recognize hydrogen atoms by their identifiers in the remediated PDB
files. e.g. in 1AWW,

ATOM    378 HD11 LEU A  25      46.755  -3.858   0.453  1.00  0.00           H
ATOM    379 HD12 LEU A  25      47.178  -2.160   0.234  1.00  0.00           H
ATOM    380 HD13 LEU A  25      47.054  -3.226  -1.165  1.00  0.00           H
ATOM    381 HD21 LEU A  25      49.453  -1.483   0.307  1.00  0.00           H
ATOM    382 HD22 LEU A  25      50.714  -2.537  -0.327  1.00  0.00           H
ATOM    383 HD23 LEU A  25      49.413  -1.984  -1.381  1.00  0.00           H

In this PDB entry, the chemical symbol (H) is not right-justified in
columns 13-14 of the hydrogen identifiers, as it is for other elements.
A bit of extra work is required to figure it out. What's more,
sometimes it's even impossible to distinguish hydrogen from mercury
without columns 77-78. From the PDB entry format description version
2.1:

"Hydrogen naming sometimes conflicts with IUPAC conventions. For
example, a hydrogen named HG11 in columns 13 - 16 is differentiated
from a mercury atom by the element symbol in columns 77 - 78. Columns
13 - 16 present a unique name for each atom."

Therefore we strongly suggest that PDBParser cover columns 77-78 of
ATOM/HETATM records. We have looked at the relevant code and it seems
three files (Atom.py, PDBParser.py, StructureBuilder.py) need to be
revised marginally to integrate this update:

1). in Atom.py CVS Revision 1.18

line 17: add one parameter "element" to the function Atom::__init__(...)
    def __init__(self, name, coord, bfactor, occupancy, altloc,
                 fullname, serial_number, element):
line 61: add line
    self.element = element
add a set method:
    def set_element(self, element):
        self.element = element
add a public method:
    def get_element(self):
        return self.element

2). in PDBParser.py CVS Revision 1.20

line 161: add one line to parse element symbol in function
PDBParser::_parse_coordinates(self, coords_trailer)
    element=line[76:78].strip()
line 182: add one more parameter to init_atom():
    structure_builder.init_atom(name, coord, bfactor, occupancy, altloc,
                                fullname, serial_number, element)

3). in StructureBuilder.py CVS Revision 1.16

line 158: add one parameter "element" to the function
    StructureBuilder::init_atom(self, name, coord, b_factor, occupancy,
                                altloc, fullname, serial_number=None, element='')
line 190: add "element" to the initialization of Atom instance.
    atom=self.atom=myAtom(name, coord, b_factor, occupancy, altloc,
                          fullname, serial_number, element)

regards,
--
Hongbo Zhu

From biopython at maubp.freeserve.co.uk Wed Apr 23 04:42:21 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 23 Apr 2008 09:42:21 +0100
Subject: [BioPython] Taxonomy in BioSQL
Message-ID: <320fb6e00804230142i5aebe31al223af8fc73c51dd2@mail.gmail.com>

Eric and I have been discussing how best to deal with missing or
partial taxonomy information when importing sequences into a BioSQL
database.
I'm forwarding his email, which he meant to send to the list...

See also Bug 2475, http://bugzilla.open-bio.org/show_bug.cgi?id=2475

Just for background, a typical GenBank file includes an NCBI taxon ID
plus the lineage as a list of names (strings). Ideally, the user will
have run the BioSQL script load_ncbi_taxonomy.pl beforehand, and their
taxonomy tables will be fully populated and include the new sequence's
NCBI taxon ID. See:
http://biopython.org/wiki/BioSQL#NCBI_Taxonomy

The big question is what to do when the new sequence being added to
BioSQL doesn't have an NCBI taxon ID (e.g. it's from a non-NCBI
sequence file), and we are reduced to string matching of species
names...

Peter

---------- Forwarded message ----------
From: Eric Gibert
Date: Wed, Apr 23, 2008 at 2:33 AM
Subject: Taxonomy in BioPython

Dear all,

With the help of Peter and Michiel, I am looking into improving the
taxonomy management in BioPython [with BioSQL].

The first step is completed: an XML parser for the information from the
NCBI taxonomy database is posted in the CVS for Bio.Entrez.

The second step is to improve the part of BioSQL/Loader.py that is in
charge of gathering/saving taxonomic data.

For this point, we have 2 questions:

If the user does not wish to access the NCBI server to fetch taxonomic
data (or the user's server cannot access the Internet) but some lineage
information is present in the records to load, do you prefer to:

1) load ONLY the species into the BioSQL.taxon table (if an INSERT is
   needed)
   - safe: main data are known -

or

2) load the lineage information available, which most of the time means
   name and rank only (i.e. no NCBI id)
   - risky: not all data are known, different lineages might have
   different levels, and looking up by scientific name might create
   duplicates -

Another related question: is there anyone accessing a taxonomic
database other than the NCBI one?

Anyone using Loader.py, or interested in doing so, is welcome to
comment and choose.
Best regards,
Eric

From lueck at ipk-gatersleben.de Wed Apr 23 08:49:50 2008
From: lueck at ipk-gatersleben.de (Stefanie Lück)
Date: Wed, 23 Apr 2008 14:49:50 +0200
Subject: [BioPython] blastn parameters
Message-ID: <004c01c8a540$84e10ad0$1022a8c0@ipkgatersleben.de>

Hello!

Does anyone know whether the parameters of NCBI's online blastn
(http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?PAGE=Nucleotides&PROGRAM=blastn&MEGABLAST=on&BLAST_PROGRAMS=megaBlast&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on)
are the same as those used when I run blastall for a local blastn
(without changing anything)?

I couldn't find the parameters on the homepage.

Thanks for your help!
Stefanie

From sbassi at gmail.com Wed Apr 23 16:40:58 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Wed, 23 Apr 2008 17:40:58 -0300
Subject: [BioPython] blastn parameters
In-Reply-To: <004c01c8a540$84e10ad0$1022a8c0@ipkgatersleben.de>
References: <004c01c8a540$84e10ad0$1022a8c0@ipkgatersleben.de>
Message-ID:

On Wed, Apr 23, 2008 at 9:49 AM, Stefanie Lück wrote:
> Hello!
> Does anyone knows whether the parameters of NCBI's online blastn (http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?PAGE=Nucleotides&PROGRAM=blastn&MEGABLAST=on&BLAST_PROGRAMS=megaBlast&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on)
> are the same, when I use blastall for a local blastn (without to change anything)?
>
> On the homepage I couldn't find the parameters.

Just run the BLAST and then see the parameters at the end of the result page.
But, now I see there is a place to see parameters, at the end of the page, look at this screenshot: http://bayimg.com/FaJnAAABH -- Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6 Bioinformatics news: http://www.bioinformatica.info Tutorial libre de Python: http://tinyurl.com/2az5d5 From cjfields at uiuc.edu Wed Apr 23 17:07:44 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 23 Apr 2008 16:07:44 -0500 Subject: [BioPython] blastn parameters In-Reply-To: References: <004c01c8a540$84e10ad0$1022a8c0@ipkgatersleben.de> Message-ID: On Apr 23, 2008, at 3:40 PM, Sebastian Bassi wrote: > On Wed, Apr 23, 2008 at 9:49 AM, Stefanie L?ck > wrote: >> Hello! >> Does anyone knows whether the parameters of NCBI's online blastn (http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?PAGE=Nucleotides&PROGRAM=blastn&MEGABLAST=on&BLAST_PROGRAMS=megaBlast&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on >> ) >> are the same, when I use blastall for a local blastn (without to >> change anything)? >> >> On the homepage I couldn't find the parameters. > > Just run the BLAST and the see the parameters at the end of the > result page. > But, now I see there is a place to see parameters, at the end of the > page, look at this screenshot: > http://bayimg.com/FaJnAAABH For remote BLAST one should look at the URLAPI parameters instead of referring to 'blastall' docs; there are some similarities but I think a few parameter defaults are different, just can't remember which ones (maybe eval cutoff, default result format display...) Here's the latest: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/ chris From bsouthey at gmail.com Wed Apr 23 17:31:34 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 23 Apr 2008 16:31:34 -0500 Subject: [BioPython] blastn parameters In-Reply-To: References: <004c01c8a540$84e10ad0$1022a8c0@ipkgatersleben.de> Message-ID: <480FAAB6.9040604@gmail.com> Sebastian Bassi wrote: > On Wed, Apr 23, 2008 at 9:49 AM, Stefanie L?ck wrote: > >> Hello! 
>> Does anyone knows whether the parameters of NCBI's online blastn (http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?PAGE=Nucleotides&PROGRAM=blastn&MEGABLAST=on&BLAST_PROGRAMS=megaBlast&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on)
>> are the same, when I use blastall for a local blastn (without to change anything)?
>>
>> On the homepage I couldn't find the parameters.
>>
>
> Just run the BLAST and the see the parameters at the end of the result page.
> But, now I see there is a place to see parameters, at the end of the
> page, look at this screenshot:
> http://bayimg.com/FaJnAAABH
>
>
>
Hi,
Well, it really depends...
The NCBI site does track your preferences, so if you modify an option
it will use that value in the future.

It may also depend on the blast version. From the NCBI FAQ, "match
reward (-r) and mismatch penalty (-q) settings are now 2, -3 (from 1,
-3)". This is different from the blastall version 2.2.17 that I have
(it could have changed in 2.2.18). Word size appears to be another
change.

Bruce

From lueck at ipk-gatersleben.de Thu Apr 24 08:03:58 2008
From: lueck at ipk-gatersleben.de (Stefanie Lück)
Date: Thu, 24 Apr 2008 14:03:58 +0200
Subject: [BioPython] blastn parameters
References: <004c01c8a540$84e10ad0$1022a8c0@ipkgatersleben.de>
 <480FAAB6.9040604@gmail.com>
Message-ID: <007001c8a603$471b18d0$1022a8c0@ipkgatersleben.de>

Thanks for all the replies!
Could someone also tell me how I can change the parameters through
Biopython? Is this possible?

Stefanie

----- Original Message -----
From: "Bruce Southey"
To: "Sebastian Bassi"
Cc: "Biopython List"
Sent: Wednesday, April 23, 2008 11:31 PM
Subject: Re: [BioPython] blastn parameters

Sebastian Bassi wrote:
> On Wed, Apr 23, 2008 at 9:49 AM, Stefanie Lück
> wrote:
>
>> Hello!
>> Does anyone knows whether the parameters of NCBI's online blastn >> (http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?PAGE=Nucleotides&PROGRAM=blastn&MEGABLAST=on&BLAST_PROGRAMS=megaBlast&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on) >> are the same, when I use blastall for a local blastn (without to change >> anything)? >> >> On the homepage I couldn't find the parameters. >> > > Just run the BLAST and the see the parameters at the end of the result > page. > But, now I see there is a place to see parameters, at the end of the > page, look at this screenshot: > http://bayimg.com/FaJnAAABH > > > Hi, Well, it really depends... The NCBI site does track your preferences so if you modify an option it will use that value in the future. It may also depend on which blast version. From the NCBI FAQ, "match reward (-r) and mismatch penalty (-q) settings are now 2, -3 (from 1, -3)". This is different than blastall version 2.2.17 that I have (could be changed in 2.2.18). Word size appears to be another change. Bruce _______________________________________________ BioPython mailing list - BioPython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From biopython at maubp.freeserve.co.uk Thu Apr 24 08:22:56 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 24 Apr 2008 13:22:56 +0100 Subject: [BioPython] blastn parameters In-Reply-To: <007001c8a603$471b18d0$1022a8c0@ipkgatersleben.de> References: <004c01c8a540$84e10ad0$1022a8c0@ipkgatersleben.de> <480FAAB6.9040604@gmail.com> <007001c8a603$471b18d0$1022a8c0@ipkgatersleben.de> Message-ID: <320fb6e00804240522i2c51f765s27c56049ff713878@mail.gmail.com> Stefanie L?ck wrote: > Thanks for all the reply! > Could someone tell me also how I can change the paramters over Biopyton? I > this possible? > > Stefanie You should be able to invoke any of the Blast options via Biopython. 
from Bio.Blast.NCBIStandalone import blastall
help(blastall)

from Bio.Blast.NCBIWWW import qblast
help(qblast)

Peter

From mjldehoon at yahoo.com Fri Apr 25 20:53:17 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 25 Apr 2008 17:53:17 -0700 (PDT)
Subject: [BioPython] parse element symbols for ATOM/HETATM records
In-Reply-To: <11b97ec0804230322w669aac8bx37fb92cf4e84240c@mail.gmail.com>
Message-ID: <763020.40003.qm@web62407.mail.re1.yahoo.com>

Dear Hongbo Zhu,

Could you open a bug report on BugZilla and append your patch there?
Patches sent to the mailing list tend to get lost.

--Michiel.

Macro Zhu wrote:

Hi,

the current Bio.PDB.PDBParser does not parse columns 77-78 of ATOM
records in PDB files, where the element symbol of the atom is (usually)
stored. We suggest that BioPython parse this information in the next
version. The reasons are given as follows:

1. The current remediated PDB format requires these symbols to be
always present (
http://www.wwpdb.org/documentation/format3.1-20080211.pdf ), though in
old PDB files (v2.3), these symbols are sometimes missing.

2. In some cases it is not straightforward, if not impossible, to
recognize hydrogen atoms by their identifiers in the remediated PDB
files. e.g. in 1AWW,

ATOM    378 HD11 LEU A  25      46.755  -3.858   0.453  1.00  0.00           H
ATOM    379 HD12 LEU A  25      47.178  -2.160   0.234  1.00  0.00           H
ATOM    380 HD13 LEU A  25      47.054  -3.226  -1.165  1.00  0.00           H
ATOM    381 HD21 LEU A  25      49.453  -1.483   0.307  1.00  0.00           H
ATOM    382 HD22 LEU A  25      50.714  -2.537  -0.327  1.00  0.00           H
ATOM    383 HD23 LEU A  25      49.413  -1.984  -1.381  1.00  0.00           H

In this PDB entry, the chemical symbol (H) is not right-justified in
columns 13-14 of the hydrogen identifiers, as it is for other elements.
A bit of extra work is required to figure it out. What's more,
sometimes it's even impossible to distinguish hydrogen from mercury
without columns 77-78. From the PDB entry format description version
2.1:

"Hydrogen naming sometimes conflicts with IUPAC conventions. For
example, a hydrogen named HG11 in columns 13 - 16 is differentiated
from a mercury atom by the element symbol in columns 77 - 78. Columns
13 - 16 present a unique name for each atom."

Therefore we strongly suggest that PDBParser cover columns 77-78 of
ATOM/HETATM records. We have looked at the relevant code and it seems
three files (Atom.py, PDBParser.py, StructureBuilder.py) need to be
revised marginally to integrate this update:

1). in Atom.py CVS Revision 1.18

line 17: add one parameter "element" to the function Atom::__init__(...)
    def __init__(self, name, coord, bfactor, occupancy, altloc,
                 fullname, serial_number, element):
line 61: add line
    self.element = element
add a set method:
    def set_element(self, element):
        self.element = element
add a public method:
    def get_element(self):
        return self.element

2). in PDBParser.py CVS Revision 1.20

line 161: add one line to parse element symbol in function
PDBParser::_parse_coordinates(self, coords_trailer)
    element=line[76:78].strip()
line 182: add one more parameter to init_atom():
    structure_builder.init_atom(name, coord, bfactor, occupancy, altloc,
                                fullname, serial_number, element)

3). in StructureBuilder.py CVS Revision 1.16

line 158: add one parameter "element" to the function
    StructureBuilder::init_atom(self, name, coord, b_factor, occupancy,
                                altloc, fullname, serial_number=None, element='')
line 190: add "element" to the initialization of Atom instance.
    atom=self.atom=myAtom(name, coord, b_factor, occupancy, altloc,
                          fullname, serial_number, element)

regards,
--
Hongbo Zhu
_______________________________________________
BioPython mailing list - BioPython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython
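
The column-based parsing proposed in the patch above can be tried out
without Biopython. What follows is only a minimal sketch, assuming
fixed-width ATOM/HETATM records as laid out in the wwPDB format
document cited in the patch; the helper names format_atom_line and
parse_element are hypothetical and are not part of Bio.PDB. The test
record is rebuilt from the first 1AWW line quoted above.

```python
# Illustrative sketch only: format_atom_line and parse_element are
# hypothetical helpers, not part of Bio.PDB or the proposed patch.


def format_atom_line(serial, name, res_name, chain, res_seq,
                     x, y, z, occupancy, bfactor, element):
    """Build a fixed-width ATOM record covering columns 1-78.

    Simplified: the atom name is left-justified in columns 13-16,
    which is only right for four-character names such as HD11.
    """
    return (
        f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{bfactor:6.2f}"
        f"{'':10}{element:>2}"
    )


def parse_element(line):
    """Return the element symbol from columns 77-78, or None if absent."""
    if line[:6].strip() not in ("ATOM", "HETATM"):
        return None
    # PDB columns are 1-based, so columns 77-78 are the slice [76:78].
    symbol = line[76:78].strip()
    return symbol or None


hydrogen = format_atom_line(378, "HD11", "LEU", "A", 25,
                            46.755, -3.858, 0.453, 1.00, 0.00, "H")
print(repr(parse_element(hydrogen)))       # 'H'
print(repr(parse_element(hydrogen[:66])))  # None: record cut before column 67
```

Returning None for a short record mirrors the old v2.3 files mentioned
in the patch, where the element columns may be missing; a caller would
then have to fall back on the atom name in columns 13-16.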
From timmcilveen at talktalk.net Tue Apr 8 16:36:37 2008
From: timmcilveen at talktalk.net (Tim)
Date: Tue, 08 Apr 2008 17:36:37 +0100
Subject: [BioPython] Python 3
Message-ID: <47FB9F15.2050304@talktalk.net>

Hi,
I have been learning python and biopython side by side for a while now
but notice that the release of python 3 is now approaching with many
new features. Will biopython in its current form be easy to port to
python 3, or is this going to be a major project ?

Thanks,
Tim

From sbassi at gmail.com Tue Apr 8 16:54:50 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Tue, 8 Apr 2008 13:54:50 -0300
Subject: [BioPython] Python 3
In-Reply-To: <47FB9F15.2050304@talktalk.net>
References: <47FB9F15.2050304@talktalk.net>
Message-ID:

On Tue, Apr 8, 2008 at 1:36 PM, Tim wrote:
> I have been learning python and biopython side by side for a while now but
> notice that the release of python 3 is now approaching with many new
> features. Will biopython in its current form be easy to port to python 3, or
> is this going to be a major project ?

Biopython depends on some other packages. Until these packages are
migrated, there will be no Biopython for Python 3.
Most external packages won't be ready by the release of Python 3. Just
after the release of Python 3.0, most external modules will start
migrating.
Since this is a future event, take what I say as speculation :)

--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Tutorial libre de Python: http://tinyurl.com/2az5d5

From biopython at maubp.freeserve.co.uk Tue Apr 8 17:14:36 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Apr 2008 18:14:36 +0100
Subject: [BioPython] Python 3
In-Reply-To: <47FB9F15.2050304@talktalk.net>
References: <47FB9F15.2050304@talktalk.net>
Message-ID: <320fb6e00804081014q16c1ce42n4b3245631a7e6249@mail.gmail.com>

On Tue, Apr 8, 2008 at 5:36 PM, Tim wrote:
> Hi,
> I have been learning python and biopython side by side for a while now but
> notice that the release of python 3 is now approaching with many new
> features. Will biopython in its current form be easy to port to python 3, or
> is this going to be a major project ?

The short answer is I don't know, and as far as I know, no one has
tried yet.

As Sebastian points out, Biopython does depend on several third party
modules for some of its functionality - and they too would have to be
ported to Python 3.

Peter

From idoerg at gmail.com Tue Apr 8 17:29:53 2008
From: idoerg at gmail.com (Iddo Friedberg)
Date: Tue, 8 Apr 2008 10:29:53 -0700
Subject: [BioPython] Python 3
In-Reply-To: <320fb6e00804081014q16c1ce42n4b3245631a7e6249@mail.gmail.com>
References: <47FB9F15.2050304@talktalk.net>
 <320fb6e00804081014q16c1ce42n4b3245631a7e6249@mail.gmail.com>
Message-ID:

Python 3.x is almost a new language. Thus I expect that if there is to
be a Biopython 3.x, it will be developed from the ground up, by
enterprising souls who would like to see that happen. But Python 2.x
will still be around for a while.

The powers-that-be in Python are trying to make the whole process of
3.0 porting as gentle as possible. Thus Python 2.x is not going away
anytime soon and 2.x versions will be around for a while. Also, 2.6
will be in a sense forward compatible.
From the horse's mouth:

"
Python 2.6 will support forward compatibility in the following two ways:

- It will support a "Py3k warnings mode" which will warn dynamically
  (i.e. at runtime) about features that will stop working in Python 3.0,
  e.g. assuming that range() returns a list.
- It will contain backported versions of many Py3k features, either
  enabled through __future__ statements or simply by allowing old and
  new syntax to be used side-by-side (if the new syntax would be a
  syntax error in 2.x).
"

On Tue, Apr 8, 2008 at 10:14 AM, Peter wrote:

> On Tue, Apr 8, 2008 at 5:36 PM, Tim wrote:
> > Hi,
> > I have been learning python and biopython side by side for a while now but
> > notice that the release of python 3 is now approaching with many new
> > features. Will biopython in its current form be easy to port to python 3, or
> > is this going to be a major project ?
>
> The short answer is I don't know, and as far as I know, no one has tried
> yet.
>
> As Sebastian points out, Biopython does depend on several third party
> modules for some of its functionality - and they too would have to be
> ported to Python 3.
>
> Peter
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

--

Iddo Friedberg, Ph.D.
CALIT2, mail code 0440
University of California, San Diego
9500 Gilman Drive
La Jolla, CA 92093-0440, USA
T: +1 (858) 534-0570
T: +1 (858) 646-3100 x3516
http://iddo-friedberg.org

From cjfields at uiuc.edu Tue Apr 8 17:52:16 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 8 Apr 2008 12:52:16 -0500
Subject: [BioPython] Python 3
In-Reply-To:
References: <47FB9F15.2050304@talktalk.net>
 <320fb6e00804081014q16c1ce42n4b3245631a7e6249@mail.gmail.com>
Message-ID: <2D7C07AD-DEB9-4866-BD5B-904D5510F8B7@uiuc.edu>

This is similar to what the bioperl devs think will eventually happen
with bioperl and perl 6 (which, like python 3.0, is pretty much a new
language). Not much one can do about it, but it does give us the
opportunity to rethink implementations. In the meantime, bioperl/perl5
will be around for a while.

chris

On Apr 8, 2008, at 12:29 PM, Iddo Friedberg wrote:

> Python 3.x is almost a new language. Thus I expect that if there
> will be a Biopython 3.x, it will be developed from the ground up, by
> enterprising souls who would like to see that happen. But Python 2.x
> will still be around for a while.
>
> The powers-that-be in Python are trying to make the whole process of
> 3.0 porting as gentle as possible. Thus Python 2.x is not going away
> anytime soon and 2.x versions will be around for a while. Also, 2.6
> will be in a sense forward compatible.
>
> From the horse's mouth:
> "
>
> Python 2.6 will support forward compatibility in the following two
> ways:
>
> - It will support a "Py3k warnings mode" which will warn dynamically
>   (i.e. at runtime) about features that will stop working in Python
>   3.0, e.g. assuming that range() returns a list.
> - It will contain backported versions of many Py3k features, either
>   enabled through __future__ statements or simply by allowing old
>   and new syntax to be used side-by-side (if the new syntax would be
>   a syntax error in 2.x).
> > " > > > On Tue, Apr 8, 2008 at 10:14 AM, Peter > > wrote: > >> On Tue, Apr 8, 2008 at 5:36 PM, Tim wrote: >>> Hi, >>> I have been learning python and biopython side by side for a while >>> now >> but >>> notice that the release of python 3 is now approaching with many new >>> features. Will biopython in its current form be easy to port to >>> python >> 3, or >>> is this going to be a major project ? >> >> The short answer is I don't know, and as far as I know, no one has >> tried >> yet. >> >> As Sebastian points out, Biopython does depend on several third party >> modules for some of its functionality - and they too would have to be >> ported to Python 3. >> >> Peter >> _______________________________________________ >> BioPython mailing list - BioPython at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython >> > > > > -- > > Iddo Friedberg, Ph.D. > CALIT2, mail code 0440 > University of California, San Diego > 9500 Gilman Drive > La Jolla, CA 92093-0440, USA > T: +1 (858) 534-0570 > T: +1 (858) 646-3100 x3516 > http://iddo-friedberg.org > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From timmcilveen at talktalk.net Tue Apr 8 18:43:33 2008 From: timmcilveen at talktalk.net (Tim) Date: Tue, 08 Apr 2008 19:43:33 +0100 Subject: [BioPython] Python 3 In-Reply-To: <47FB9F15.2050304@talktalk.net> References: <47FB9F15.2050304@talktalk.net> Message-ID: <47FBBCD5.3060904@talktalk.net> Thanks for the replies everyone. I have been programming perl for quite a while, but have been in the process of learning python, for all of the nice OO features. I knew that python 3 was on the horizon, but I thought that this would be in a Perl 6 kind of way - i.e. 
sometime in the very distant future :-) I didn't realise that it would be this summer. Thanks again, Tim > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From cjfields at uiuc.edu Tue Apr 8 19:04:17 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 8 Apr 2008 14:04:17 -0500 Subject: [BioPython] Python 3 In-Reply-To: <47FBBCD5.3060904@talktalk.net> References: <47FB9F15.2050304@talktalk.net> <47FBBCD5.3060904@talktalk.net> Message-ID: <49D0CCCE-04E6-444A-B5C6-FB62A2EDDF88@uiuc.edu> Perl 6 is supposed to arrive at Christmas. Now which Christmas is anyone's guess. One can play around with two pre-alpha Perl 6 implementations, Parrot and Pugs, in the meantime (something I have been doing a bit lately...) chris On Apr 8, 2008, at 1:43 PM, Tim wrote: > Thanks for the replies everyone. I have been programming perl for > quite a while, but have been in the process of learning python, for > all of the nice OO features. I knew that python 3 was on the > horizon, but I thought that this would be in a Perl 6 kind of way - > i.e. sometime in the very distant future :-) I didn't realise that > it would be this summer. > > Thanks again, > Tim > >> >> _______________________________________________ >> BioPython mailing list - BioPython at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython >> > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From alaguraj.v at gmail.com Tue Apr 8 19:29:52 2008 From: alaguraj.v at gmail.com (Alaguraj Veluchamy) Date: Tue, 8 Apr 2008 12:29:52 -0700 Subject: [BioPython] NCBIXML Blast parser Error for TBLASTN Message-ID: Dear people,I am stuck in parsing Blast XML output. I am trying to extract the co-ordinate and hence the sequence of the hits for my query in standalone blast. 
XML output is coming for query proteins against a genbank genome file. (using TBLASTN) The problem for me is- I am unable to get the co-ordinate(hit-from --> hit-to). I used hsp.sbjct_start and hsp.sbct_end, but it gives two numbers which is different from the original. when i tried extracting those regions from the genbank file and do an alignment, it nowhere matches. so what is this hsp.sbjct_start - integers?? how to get the hit region (or alignment) cordinates? I read record.py--the library file, but they given it is 3x times larger (but still it is incorrect) Regards -Alaguraj.V From sbassi at gmail.com Tue Apr 8 19:36:59 2008 From: sbassi at gmail.com (Sebastian Bassi) Date: Tue, 8 Apr 2008 16:36:59 -0300 Subject: [BioPython] NCBIXML Blast parser Error for TBLASTN In-Reply-To: References: Message-ID: On Tue, Apr 8, 2008 at 4:29 PM, Alaguraj Veluchamy wrote: > Dear people,I am stuck in parsing Blast XML output. Give me your XML output file and your code to see the problem. Maybe you can't align because the is a difference in polarity between the BLAST hit and the original sequence (look at the 1 and -1 in Hsp_query-frame and Hsp_hit-frame tags, does your problem happends when you have 1 and -1 or even when you have both 1?). You should give us a more detailed report, including xml file and what are you expecting and what are you getting. If the XML file is too big, upload it to another place and tell us the URL to download it. Best, SB. 
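[Editorial sketch] The polarity point above can be illustrated with a few lines of plain Python. This is only a sketch under stated assumptions: for TBLASTN the subject coordinates are taken to be 1-based nucleotide positions on the database sequence, and a negative hit frame is taken to mean the hit lies on the reverse strand. The helper names (reverse_complement, hit_nucleotides) are hypothetical and are not part of Biopython.

```python
# Hypothetical helpers, not Biopython API: recover the nucleotides behind a
# TBLASTN hit from the subject sequence, honouring the hit frame.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A",
              "a": "t", "c": "g", "g": "c", "t": "a"}


def reverse_complement(seq):
    """Return the reverse complement; unknown symbols become 'N'."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq))


def hit_nucleotides(subject_seq, hit_from, hit_to, hit_frame):
    """Slice the hit region out of the subject (1-based, inclusive).

    Assumption: some BLAST versions report hit_from > hit_to on the
    reverse strand, so the two values are ordered before slicing.
    """
    start, end = sorted((hit_from, hit_to))
    region = subject_seq[start - 1:end]
    if hit_frame < 0:
        # Negative frames are read from the opposite strand.
        region = reverse_complement(region)
    return region


genome = "ATGAAACCCTTTGGG"
forward = hit_nucleotides(genome, 4, 9, 1)
reverse = hit_nucleotides(genome, 4, 9, -1)
```

The region returned this way is roughly three times as long as the aligned protein segment (one codon per residue), which may be why the subject numbers look unrelated to the query length.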
-- Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6 Bioinformatics news: http://www.bioinformatica.info Tutorial libre de Python: http://tinyurl.com/2az5d5 From biopython at maubp.freeserve.co.uk Tue Apr 8 19:45:27 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 8 Apr 2008 20:45:27 +0100 Subject: [BioPython] NCBIXML Blast parser Error for TBLASTN In-Reply-To: References: Message-ID: <320fb6e00804081245s550c600cu5a7d420933b9777e@mail.gmail.com> Hi Alaguraj, Welcome to the Biopython mailing list. On Tue, Apr 8, 2008 at 8:36 PM, Sebastian Bassi wrote: > On Tue, Apr 8, 2008 at 4:29 PM, Alaguraj Veluchamy wrote: > > Dear people,I am stuck in parsing Blast XML output. > > Give me your XML output file and your code to see the problem. > Maybe you can't align because the is a difference in polarity between > the BLAST hit and the original sequence (look at the 1 and -1 in > Hsp_query-frame and Hsp_hit-frame tags, does your problem happends > when you have 1 and -1 or even when you have both 1?). Maybe - but I suspect the problem is you are looking at the subject coordinates (hsp.sbjct_start and hsp.sbjct_end - these refer to the matched sequence in the database), and not the search query coordinates (hsp.query_start and hsp.query_end - these refer to your input query sequence). > You should give us a more detailed report, including xml file and what > are you expecting and what are you getting. > If the XML file is too big, upload it to another place and tell us the > URL to download it. Or better yet, if you are sure this is a bug, file it on Bugzilla. And then attach the XML file to the bug. 
http://bugzilla.open-bio.org/enter_bug.cgi?product=Biopython Peter From nuin at genedrift.org Tue Apr 8 19:17:39 2008 From: nuin at genedrift.org (Paulo Nuin) Date: Tue, 08 Apr 2008 15:17:39 -0400 Subject: [BioPython] Python 3 In-Reply-To: <49D0CCCE-04E6-444A-B5C6-FB62A2EDDF88@uiuc.edu> References: <47FB9F15.2050304@talktalk.net> <47FBBCD5.3060904@talktalk.net> <49D0CCCE-04E6-444A-B5C6-FB62A2EDDF88@uiuc.edu> Message-ID: <47FBC4D3.4090201@genedrift.org> Just after Duke Nukem Forever ... Chris Fields wrote: > Perl 6 is supposed to arrive at Christmas. Now which Christmas is > anyone's guess. > > One can play around with two pre-alpha Perl 6 implementations, Parrot > and Pugs, in the meantime (something I have been doing a bit lately...) > > chris > > On Apr 8, 2008, at 1:43 PM, Tim wrote: > >> Thanks for the replies everyone. I have been programming perl for >> quite a while, but have been in the process of learning python, for >> all of the nice OO features. I knew that python 3 was on the horizon, >> but I thought that this would be in a Perl 6 kind of way - i.e. >> sometime in the very distant future :-) I didn't realise that it >> would be this summer. 
>>
>> Thanks again,
>> Tim
>
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

From kteague at bcgsc.ca Tue Apr 8 19:35:36 2008
From: kteague at bcgsc.ca (Kevin Teague)
Date: Tue, 8 Apr 2008 12:35:36 -0700
Subject: [BioPython] Python 3
In-Reply-To: <2D7C07AD-DEB9-4866-BD5B-904D5510F8B7@uiuc.edu>
References: <47FB9F15.2050304@talktalk.net> <320fb6e00804081014q16c1ce42n4b3245631a7e6249@mail.gmail.com> <2D7C07AD-DEB9-4866-BD5B-904D5510F8B7@uiuc.edu>
Message-ID:

> This is similar to what the bioperl devs think will eventually
> happen with bioperl and perl 6 (which, like python 3.0, is pretty
> much a new language). Not much one can do about it, but it does
> give u the opportunity to rethink implementations. In the
> meantime, bioperl/perl5 will be around for a while.

Python 3 and Perl 6 are similar in that they break backwards compatibility, but beyond that Perl 6 is a complete rethinking and reimplementation of the entire language. Python 3 is much smaller in scope; its primary goal is to clean up early design warts in the language. The only syntactically significant change is the replacement of the print statement with a print() function. Most of the other changes are fairly under-the-hood: only one string type (unicode) instead of two (unicode and the 8-bit str), only one integer type (int) instead of two (int and long), etc.:

http://docs.python.org/dev/3.0/whatsnew/3.0.html

> On Apr 8, 2008, at 12:29 PM, Iddo Friedberg wrote:
>
>> Python 3.x is almost a new language. Thus I expect that if there will be a
>> Biopython 3.x, it will be developed from the ground up, by enterprising
>> souls woho would like to see that happen. But Python 2.x will still be
>> around for a while.

Note that Guido van Rossum and others have been encouraging projects to port to Python 3 *without* changing the APIs of the project. The goal is to make it possible to support a Python 3 port from a Python 2 source tree using the 2to3 tool. API changes are encouraged to be made in Python 2, before porting (http://www.artima.com/weblogs/viewpost.jsp?thread=227041).

>> The powers-that-be in Python are trying to make the whole process of 3.0
>> porting as gentle as possible. Thus Python 2.x is not going away anytime
>> soon and 2.x versions will be around for a while. Also, 2.6 will be in a
>> sense forward compatible.

Yes, Python 2 will not be going away anytime soon. The changes between Python 2 and 3 aren't *that* major either, so for people trying to just get work done, there is really no hurry to get to Python 3. Python 3.0 will also be a bit of an early adopters release; it's expected that a 3.1 and perhaps 3.2 release will follow in fairly short order after Python 3, which is probably when larger projects with library dependencies such as BioPython will more realistically consider a port. I would guess it will be another 3 or 4 years before there is a significant number of useful libraries maintained only for Python 3, giving one a pragmatic motivation for porting.

From 2huggie at gmail.com Wed Apr 9 13:56:57 2008
From: 2huggie at gmail.com (Timothy Wu)
Date: Wed, 9 Apr 2008 21:56:57 +0800
Subject: [BioPython] Ensembl
Message-ID:

Hi there,

I am trying to access data from Ensembl. I noticed that Ensembl has a Perl API. Any support for Ensembl here at BioPython?
Timothy

From alaguraj.v at gmail.com Sat Apr 12 21:23:26 2008
From: alaguraj.v at gmail.com (Alaguraj Veluchamy)
Date: Sat, 12 Apr 2008 14:23:26 -0700
Subject: [BioPython] NCBIXML Blast parser Error for TBLASTN
Message-ID:

Dear people,

I thought this might give a solution to the problem I faced.

If you run "formatdb" on the ".gbk" files of the NCBI database, the sequence coordinates are processed and give numbers which don't match the original sequence. Record.py in the library doesn't say much about this.

The solution is to format the BLAST database with ".fna" files. This gives results with correct coordinates. These coordinates can be used to retrieve the sequence or to parse the ".gbk" files.

To be precise: format with "fna" files and parse with "gbk" files. This is true for all versions of BLAST.

As two members have asked for the code, here it is:

##########################
import os
import sys
import dircache
import re
from Bio.Blast import NCBIStandalone
from Bio.Blast import NCBIXML

path="/Users/alaguraj/soca/genomes/Bacteria"
lstDb=os.listdir(path)
blastdbpath="/Users/alaguraj/blastSmVsall/blastDB"
myArray=[]
# inputfile (the query protein FASTA file) is not defined in the posted
# snippet; it has to be set before this loop runs.
for db in lstDb:
    eachdirpath=path+"/"+db
    listfiles=dircache.listdir(eachdirpath)
    print listfiles
    for listfile in listfiles:
        # -m 7 asks blastall for XML output
        command='/Users/alaguraj/progs/blast-2.2.17/bin/blastall -i '+inputfile+' -p tblastn -e 0.001 -d '+blastdbpath+'/'+db+'/'+listfile+' -m 7 \n'
        w, read, e = os.popen3(command)
        b_parser = NCBIXML.BlastParser()
        blast_records = b_parser.parse(read)
        for b_record in blast_records:
            myArray.append(b_record)
######## I am appending here because I need the data to further process for my application #########
for eachmyArray in myArray:
    for alignment in eachmyArray.alignments:
        for hsp in alignment.hsps:
            print hsp.sbjct_start
            print hsp.sbjct_end
############################################################

The next code is for parsing the genbank file:

from Bio import GenBank

# genomefile (path to the ".gbk" file) is not defined in the posted snippet.
parser=GenBank.RecordParser()
record = parser.parse(open(genomefile))
for feature in record.features:
    if feature.key=="CDS":
        for qualifier in feature.qualifiers:
            if qualifier.key=="/protein_id=":
                protid=qualifier.value[1:-1]   # strip the surrounding quotes
                print protid
            if qualifier.key=="/product=":
                func=qualifier.value[1:-1]
                print func
#############################################################

Sorry for the long mail. Hope it helps.

Regards,
Alaguraj.V

Dear people, I am stuck in parsing Blast XML output.

I am trying to extract the coordinates, and hence the sequence, of the hits for my query in standalone blast. XML output is coming for query proteins against a genbank genome file (using TBLASTN).

The problem for me is that I am unable to get the coordinates (hit-from --> hit-to). I used hsp.sbjct_start and hsp.sbjct_end, but they give two numbers which are different from the original. When I tried extracting those regions from the genbank file and doing an alignment, it nowhere matches.

So what is this hsp.sbjct_start - integers?? How to get the hit region (or alignment) coordinates? I read Record.py, the library file, where it is given as 3x larger (but that is still incorrect).

Regards
-Alaguraj.V

From sbassi at gmail.com Sun Apr 13 05:49:44 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Sun, 13 Apr 2008 02:49:44 -0300
Subject: [BioPython] NCBIXML Blast parser Error for TBLASTN
In-Reply-To:
References:
Message-ID:

On Sat, Apr 12, 2008 at 6:23 PM, Alaguraj Veluchamy wrote:
> If you do "formatdb" of blast with ".gbk" files of NCBI database, then
> the sequence co-ordinates are processed and gives numbers which doesnt
> match with the original sequence.
> Record.Py in the lib module doesnt say much about this.

formatdb works with FASTA or ASN.1 as input. I think this issue is not a problem of the parser. The parser reads whatever is in the BLAST output file. You should take a look at how you generated it.
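[Editorial sketch] Once the database has been built from FASTA so that the hit coordinates refer to the genome sequence, "format with fna, parse with gbk" comes down to an interval lookup. The following is a minimal sketch, not Biopython code: the (start, end, protein_id) tuples are hypothetical stand-ins for whatever a GenBank parser yields for CDS features, and the coordinates are assumed to be 1-based and inclusive.

```python
# Hypothetical data layout: each CDS is (start, end, protein_id), 1-based
# inclusive, as it might be collected while walking a GenBank record.
def overlapping_features(features, hit_start, hit_end):
    """Return the ids of all features that overlap the hit interval."""
    low, high = sorted((hit_start, hit_end))  # tolerate reversed coordinates
    return [name for (start, end, name) in features
            if start <= high and end >= low]


cds_features = [(100, 400, "prot_A"), (450, 900, "prot_B"), (1200, 1500, "prot_C")]
hits_two = overlapping_features(cds_features, 380, 470)
hits_none = overlapping_features(cds_features, 1000, 1100)
```

A linear scan like this is fine for a single bacterial genome; for many hits against many genomes a sorted list with bisection would scale better.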
--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Tutorial libre de Python: http://tinyurl.com/2az5d5

From vsla at wanadoo.es Sun Apr 13 20:43:44 2008
From: vsla at wanadoo.es (Virginia S. Lopez)
Date: Sun, 13 Apr 2008 22:43:44 +0200
Subject: [BioPython] Trie API for serializing a trie
Message-ID: <200804132128.m3DLSD5M021030@portal.open-bio.org>

Hi all,

I'd like to use the trie in Bio/trie.h from another C program (which I access using python). I need to implement save() and load() methods that will call serialize() and deserialize() in Bio/trie.c.

I'm having problems getting this working: it compiles fine, and I can serialize data to disk, but I can't deserialize data to create a trie from the file. I don't know whether the problem is in how I write the trie's data to disk, or in how I load it from disk, or both. I'm new to C and I may be making a silly mistake.

I'm copying some of my code below, and attach the full code. It takes one argument that is the name of a txt file containing the keys (one per line), builds the trie, and saves it to file "savefile.txt" in binary. (To test loading the trie from the saved file, I simply comment out a code block in main().)

I'd massively appreciate any suggestions about what I'm doing wrong.

Many thanks.

Virginia.

---------------------
/* Methods called when serializing trie data */
static int _write_to_handle(const void *towrite, const int length, void *handle)
{
    fwrite(&towrite, length, 1, (FILE *) handle);
    return 1;
}

static int _write_value_to_handle(const void *towrite, void *handle)
{
    fwrite(&towrite, sizeof(int), 1, handle);  //trie key values is an int
    return 1;
}

/* Methods called when deserializing trie data */
static int _read_from_handle(void *wasread, const int length, void *handle)
{
    fread(&wasread, length, 1, (FILE *) handle);
    return 1;
}

static void * _read_value_from_handle(void *handle)  //returns a pointer to the value
{
    //In my tests, this method never gets called, because a problem happens with the data
    // read in function _read_from_handle(). No error given, deserialize() function quits.
    int *value;
    fread(&value, sizeof(int), 1, handle);
    return value;
}

int trie_save(){
    FILE *out = fopen("savefile.txt", "wb");
    int success=Trie_serialize(trie, _write_handle, _write_value_handle, (void *)out);
    fclose(out);
    return success;
}

int trie_load(){
    int success=1;
    FILE *in = fopen("savefile.txt", "rb");
    trie = Trie_deserialize(_read_handle, _read_value_handle,in);
    if(!trie) {
        success=0;
        printf("Error: trie could not be created from data.\n");
    }
    fclose(in);
    return success;
}
-------------------------

Files in /Bio/:
- trie.h http://www.codase.com/search/showtext?file=L2dlbnRvbzIvdmFyL3RtcC9yZXBvcy9jb2Rhc2UuYy9iaW9weXRob24tMS40MGIvd29yay9iaW9weXRob24tMS40MGIvQmlvL3RyaWUuaA==
- trie.c http://www.codase.com/search/showtext?file=L2dlbnRvbzIvdmFyL3RtcC9yZXBvcy9jb2Rhc2UuYy9iaW9weXRob24tMS40MGIvd29yay9iaW9weXRob24tMS40MGIvQmlvL3RyaWUuYw==
- triemodule.c (Interface of the trie API for Python. I looked into this file to see how to do serialization from C, but I can't understand it) http://www.codase.com/search/showtext?file=L2dlbnRvbzIvdmFyL3RtcC9yZXBvcy9jb2Rhc2UuYy9iaW9weXRob24tMS40MGIvd29yay9iaW9weXRob24tMS40MGIvQmlvL3RyaWVtb2R1bGUuYw==

-------------- next part --------------
A non-text attachment was scrubbed...
Name: trie_test.c Type: application/octet-stream Size: 3054 bytes Desc: not available URL: From 2huggie at gmail.com Tue Apr 15 10:30:09 2008 From: 2huggie at gmail.com (Timothy Wu) Date: Tue, 15 Apr 2008 18:30:09 +0800 Subject: [BioPython] FeatureLocation Message-ID: Hi, I'm trying to access locations in a Genbank file like this: mRNA join(5172..5189,6254..6277,6744..6851,7266..7321, 8515..8749,10663..10826,11626..11807,12784..12879, 16701..17002,18265..18330,20541..20678,21188..21334, 23789..23873,24587..24653,26071..26197,31165..31341, 32922..33089,35124..35304,39185..39305,39545..39680, 40530..40749,40901..41045,43199..43355,44727..44926, 46662..46788,48405..48506,49040..49183,50001..50159, 51315..51481,54580..54774,56455..56746) I intend to get all the positions. However as it stands, it looks to me like only the starting and ending position is supported in the FeatureLocation class. Is there a way to obtain the coordinates? Timothy Timothy From peter at maubp.freeserve.co.uk Tue Apr 15 11:17:14 2008 From: peter at maubp.freeserve.co.uk (Peter) Date: Tue, 15 Apr 2008 12:17:14 +0100 Subject: [BioPython] FeatureLocation In-Reply-To: References: Message-ID: <320fb6e00804150417u1e6620bdt1b649e7c1b74219b@mail.gmail.com> On Tue, Apr 15, 2008 at 11:30 AM, Timothy Wu <2huggie at gmail.com> wrote: > Hi, > I'm trying to access locations in a Genbank file like this: > > mRNA join(5172..5189,6254..6277,6744..6851,7266..7321, > 8515..8749,10663..10826,11626..11807,12784..12879, > 16701..17002,18265..18330,20541..20678,21188..21334, > 23789..23873,24587..24653,26071..26197,31165..31341, > 32922..33089,35124..35304,39185..39305,39545..39680, > 40530..40749,40901..41045,43199..43355,44727..44926, > 46662..46788,48405..48506,49040..49183,50001..50159, > 51315..51481,54580..54774,56455..56746) > > I intend to get all the positions. However as it stands, it looks to me like > only the starting and ending position is supported in the FeatureLocation > class. 
Is there a way to obtain the coordinates?

They should be there as sub-features (the sub_features list property of the SeqFeature).

Peter

From bsantos at biocant.pt Wed Apr 16 13:34:02 2008
From: bsantos at biocant.pt (Bruno Santos)
Date: Wed, 16 Apr 2008 13:34:02 +0000
Subject: [BioPython] How to produce html report with biopython blast
Message-ID: <20080416133402.1fc59140@mail.biocant.pt>

Hi,

I am using python to launch some BLAST searches, but instead of parsing the results in Python I only want to produce an html file. To do that I am using the keyword html='T', but this doesn't seem to be enough for Python to produce the file. Can anyone tell me how I can achieve my objective?

Another question I want to ask is in which method of blast_record I can get the hsp.align-len, that is basically the length of the aligned hit? This value is present in the xml file, but in the class diagram available in the Biopython tutorial and cookbook I can't see a method that returns this value.

Thank you very much in advance,

Bruno Santos

P.S.-I am having lots of problems with sending messages to the biopython lists: it always says the message is awaiting approval because it contains a suspicious header. Can anyone tell me what I need to do to change this behavior?

From biopython at maubp.freeserve.co.uk Wed Apr 16 13:55:11 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 16 Apr 2008 14:55:11 +0100
Subject: [BioPython] How to produce html report with biopython blast
In-Reply-To: <20080416133402.1fc59140@mail.biocant.pt>
References: <20080416133402.1fc59140@mail.biocant.pt>
Message-ID: <320fb6e00804160655t73e36092ka0497c12b5f2e2e8@mail.gmail.com>

> Hi,
>
> I am using python to launch some BLAST searches, but instead of parsing results
> in Python I only want to produce a html file. To do that I am using the keyword
> html='T', but this don't seem to be enough to python produce the file. Can anyone
> tell me how can I achieve my objective?

Are you running standalone blast or using the NCBI online version?
With Bio.Blast.NCBIWWW.qblast() use the argument format_type="HTML" With Bio.Blast.NCBIStandalone.blastall() I would try html=True (not literally a string "T", but I can see why you were mislead) and/or the align_view argument (see the stand alone blast documentation). > Another question I want to ask is in which method of blast_record I can get > the hsp.align-len, that is basically the length of the aligned hit? This value is > present in the xml file but in the class diagram available in Biopython tutorial > and cookbook I can't see a method that returns this value Probably you want hsp.align_length here. I'd have to double check that all three parsers are consistent here (for text, html and xml blast). > P.S.-I am having lots of problems with sending messages to biopython lists it > always say the message is awaiting approval because it contain a suspicious > header. Can anyone tell me what I need to do to change this behavior? I'm not sure. Are you sending emails from one email address via another account? Peter From p.j.a.cock at googlemail.com Wed Apr 16 14:20:29 2008 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 16 Apr 2008 15:20:29 +0100 Subject: [BioPython] Any RPS-BLAST users out there? Message-ID: <320fb6e00804160720i75b172caqdd83cc905a8bf937@mail.gmail.com> Hello all, One of the things on my "to do list", is to incorporate some of the Biopython examples I wrote up on my webpage into the official Biopython documentation. For example, the following page using RPS-BLAST could make a nice addition to the "cook book" section of the main Biopython tutorial: http://www.warwick.ac.uk/go/peter_cock/python/rpsblast/ Does this look worth adding? Would anyone be interested in trying out that example on their own machine, and reporting back if it worked (with software version numbers) or any issues. 
Thanks, Peter From bsantos at biocant.pt Wed Apr 16 14:30:51 2008 From: bsantos at biocant.pt (Bruno Santos) Date: Wed, 16 Apr 2008 15:30:51 +0100 Subject: [BioPython] How to produce html report with biopython blast In-Reply-To: <320fb6e00804160655t73e36092ka0497c12b5f2e2e8@mail.gmail.com> References: <20080416133402.1fc59140@mail.biocant.pt> <320fb6e00804160655t73e36092ka0497c12b5f2e2e8@mail.gmail.com> Message-ID: <002101c89fce$7a6cd010$6f467030$@pt> My problem was not changing the align_view argument. Now it is working fine. Thank you very much. Peter said: >Probably you want hsp.align_length here. I'd have to double check >that all three >parsers are consistent here (for text, html and xml blast). That's exactly what I want, the hsp.align_length is not present in Class Diagram for blast present on Biopython tutorial so it will be a good idea to add it. Since I usually do what Christof suggest to calculate the len(hsp.match) because I thought that information was not present in the xml, only when I write the xml to a file and checked the file I noticed it is present. I also have almost finished writing some documentation for Bio.Sequencing module to work with ace files I will probably be able to finish it during this week if I have time. So then I need to know how I can upload the file so it be included in the BioPython documentation. Peter said: >I'm not sure. Are you sending emails from one email address via >another account? Yes I always write the mails from this e-mail account so no idea why this is happening. Thank very much once more. 
Bruno Santos

-----Original Message-----
From: p.j.a.cock at googlemail.com [mailto:p.j.a.cock at googlemail.com] On behalf of Peter
Sent: Wednesday, 16 April 2008 14:55
To: Bruno Santos
Cc: biopython at biopython.org
Subject: Re: [BioPython] How to produce html report with biopython blast

> Hi,
>
> I am using python to launch some BLAST searches, but instead of parsing results
> in Python I only want to produce a html file. To do that I am using the keyword
> html='T', but this don't seem to be enough to python produce the file. Can anyone
> tell me how can I achieve my objective?

Are you running standalone blast or using the NCBI online version?

With Bio.Blast.NCBIWWW.qblast() use the argument format_type="HTML"

With Bio.Blast.NCBIStandalone.blastall() I would try html=True (not literally a string "T", but I can see why you were mislead) and/or the align_view argument (see the stand alone blast documentation).

> Another question I want to ask is in which method of blast_record I can get
> the hsp.align-len, that is basically the length of the aligned hit? This value is
> present in the xml file but in the class diagram available in Biopython tutorial
> and cookbook I can't see a method that returns this value

Probably you want hsp.align_length here. I'd have to double check that all three parsers are consistent here (for text, html and xml blast).

> P.S.-I am having lots of problems with sending messages to biopython lists it
> always say the message is awaiting approval because it contain a suspicious
> header. Can anyone tell me what I need to do to change this behavior?

I'm not sure. Are you sending emails from one email address via another account?
Peter

From winter at biotec.tu-dresden.de Wed Apr 16 14:05:50 2008
From: winter at biotec.tu-dresden.de (Christof Winter)
Date: Wed, 16 Apr 2008 16:05:50 +0200
Subject: [BioPython] How to produce html report with biopython blast
In-Reply-To: <20080416133402.1fc59140@mail.biocant.pt>
References: <20080416133402.1fc59140@mail.biocant.pt>
Message-ID: <480607BE.4060507@biotec.tu-dresden.de>

Bruno Santos wrote:
[...]
> Another question I want to ask is in which method of blast_record I can get
> the hsp.align-len, that is basically the length of the aligned hit? This
> value is present in the xml file but in the class diagram available in
> Biopython tutorial and cookbook I can't see a method that returns this
> value.

You probably want:

alignment.length = total length of unaligned hit sequence
len(hsp.query) = len(hsp.match) = len(hsp.sbjct) = length of alignment

using

from Bio.Blast import NCBIXML
blast_records = NCBIXML.parse(open("my_blast.xml"))
for blast_record in blast_records:
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            # do something with hsp here, e.g. look at len(hsp.match)
            pass

HTH,
Christof

> Thank you very much in advance,
>
> Bruno Santos
>
> P.S.-I am having lots of problems with sending messages to biopython lists
> it always say the message is awaiting approval because it contain a
> suspicious header. Can anyone tell me what I need to do to change this
> behavior?

Sorry, no idea about that. Never happened to me.

From mjldehoon at yahoo.com Wed Apr 16 15:17:43 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 16 Apr 2008 08:17:43 -0700 (PDT)
Subject: [BioPython] Any RPS-BLAST users out there?
In-Reply-To: <320fb6e00804160720i75b172caqdd83cc905a8bf937@mail.gmail.com> Message-ID: <823972.78108.qm@web62408.mail.re1.yahoo.com> Peter Cock wrote: > For example, the following page using RPS-BLAST could make a nice addition to > the "cook book" section of the main Biopython tutorial: > > http://www.warwick.ac.uk/go/peter_cock/python/rpsblast/ > > Does this look worth adding? Yes. However, I would prefer one Blast chapter in the tutorial, instead of a Blast chapter and also a cookbook section about Blast. --Michiel. --------------------------------- Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. From mjldehoon at yahoo.com Wed Apr 16 15:19:55 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 16 Apr 2008 08:19:55 -0700 (PDT) Subject: [BioPython] How to produce html report with biopython blast In-Reply-To: <002101c89fce$7a6cd010$6f467030$@pt> Message-ID: <412965.415.qm@web62404.mail.re1.yahoo.com> Bruno Santos wrote:I also have almost finished writing some documentation for Bio.Sequencing module to work with ace files I will probably be able to finish it during this week if I have time. So then I need to know how I can upload the file so it be included in the BioPython documentation. Create a bug report on Bugzilla, and add your documentation there as a patch. Thanks! ---Michiel --------------------------------- Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. From timmcilveen at talktalk.net Sat Apr 19 21:57:19 2008 From: timmcilveen at talktalk.net (Tim) Date: Sat, 19 Apr 2008 22:57:19 +0100 Subject: [BioPython] parser code Message-ID: <1208642239.5978.7.camel@linux-73yn.site> Hi, When I was learning Bioperl, I used a book, Beginning Perl for Bioinformatics. It gave a lot of information on how to write your own parsers etc. Is there anything similar for python at the minute? 
It is great that biopython has lots of parsers available, but it would be great to see the code for these, with some explanations. Does this info exist?

All the best,
Tim

From sdavis2 at mail.nih.gov Sun Apr 20 00:10:03 2008
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Sat, 19 Apr 2008 20:10:03 -0400
Subject: [BioPython] parser code
In-Reply-To: <1208642239.5978.7.camel@linux-73yn.site>
References: <1208642239.5978.7.camel@linux-73yn.site>
Message-ID: <264855a00804191710k1de5535biffa0b192e89bd89f@mail.gmail.com>

On Sat, Apr 19, 2008 at 5:57 PM, Tim wrote:
> Hi,
> When I was learning Bioperl, I used a book, Beginning Perl for
> Bioinformatics. It gave a lot of information on how to write your own
> parsers etc. Is there anything similar for python at the minute? It is
> great that biopython has lots of parsers available, but it would be
> great to see the code for these, with some explanations. Does this info
> exist?

Hello, Tim. Of course, the biopython project is open source, so you can just download the code and look at whatever you like. However, you might find this site useful:

http://www.pasteur.fr/recherche/unites/sis/formation/python/

Sean

From mjldehoon at yahoo.com Sun Apr 20 13:33:34 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 20 Apr 2008 06:33:34 -0700 (PDT)
Subject: [BioPython] parser code
In-Reply-To: <264855a00804191710k1de5535biffa0b192e89bd89f@mail.gmail.com>
Message-ID: <438823.66033.qm@web62403.mail.re1.yahoo.com>

I agree that that is an excellent site to learn Python and Biopython. One word of caution though: There may be some differences between the latest version of Biopython (1.45) and the Biopython that was used to write that tutorial. So when you try the examples shown there, you may see some deprecation warnings.

--Michiel.

Sean Davis wrote:
On Sat, Apr 19, 2008 at 5:57 PM, Tim wrote:
> Hi,
> When I was learning Bioperl, I used a book, Beginning Perl for
> Bioinformatics.
It gave a lot of information on how to write your own
> parsers etc. Is there anything similar for python at the minute? It is
> great that biopython has lots of parsers available, but it would be
> great to see the code for these, with some explanations. Does this info
> exist?

Hello, Tim. Of course, the biopython project is open source, so you can just download the code and look at whatever you like. However, you might find this site useful:

http://www.pasteur.fr/recherche/unites/sis/formation/python/

Sean
_______________________________________________
BioPython mailing list - BioPython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython

From timmcilveen at talktalk.net Mon Apr 21 20:39:38 2008
From: timmcilveen at talktalk.net (Tim)
Date: Mon, 21 Apr 2008 21:39:38 +0100
Subject: [BioPython] parser code
Message-ID: <480CFB8A.3050100@talktalk.net>

Hi,
Thanks for the replies. I do have the Pasteur python course and have been working through it, but thanks for the suggestion on that one, anyhow. The genedrift.org site link is exactly what I was looking for, being a direct port of the perl code in Beginning Perl for Bioinformatics.

Thanks again for the suggestions,
Tim

From nuin at genedrift.org Mon Apr 21 20:59:13 2008
From: nuin at genedrift.org (Paulo Nuin)
Date: Mon, 21 Apr 2008 16:59:13 -0400
Subject: [BioPython] parser code
In-Reply-To: <480CFB8A.3050100@talktalk.net>
References: <480CFB8A.3050100@talktalk.net>
Message-ID: <480D0021.7020708@genedrift.org>

Hi Tim

I noticed that my reply didn't go to the list too. Anyway, there are some parts of the book that are not covered in the site, but if you need them covered just let me know.

Regards
Paulo

Tim wrote:
> Hi,
> Thanks for the replies. I do have the Pasteur python course and have
> been working through it, but thanks for the suggestion on that one,
> anyhow.
The genedrift.org site link is exactly what I was looking for,
> being a direct port of the perl code in Beginning Perl for
> Bioinformatics.
>
> Thanks again for the suggestions,
> Tim

From macrozhu at gmail.com Wed Apr 23 10:22:19 2008
From: macrozhu at gmail.com (Macro Zhu)
Date: Wed, 23 Apr 2008 12:22:19 +0200
Subject: [BioPython] parse element symbols for ATOM/HETATM records
Message-ID: <11b97ec0804230322w669aac8bx37fb92cf4e84240c@mail.gmail.com>

Hi,

the current Bio.PDB.PDBParser does not parse columns 77-78 of ATOM records in PDB files, where element symbols are (usually) stored. We suggest that BioPython parse this information in the next version. The reasons are as follows:

1. The current remediated PDB format requires these symbols to be always present ( http://www.wwpdb.org/documentation/format3.1-20080211.pdf ), though in old PDB files (v2.3), these symbols are sometimes missing.

2. In some cases it is not straightforward, if not impossible, to recognize hydrogen atoms by their identifiers in the remediated PDB files. e.g. in 1AWW,

ATOM    378 HD11 LEU A  25      46.755  -3.858   0.453  1.00  0.00           H
ATOM    379 HD12 LEU A  25      47.178  -2.160   0.234  1.00  0.00           H
ATOM    380 HD13 LEU A  25      47.054  -3.226  -1.165  1.00  0.00           H
ATOM    381 HD21 LEU A  25      49.453  -1.483   0.307  1.00  0.00           H
ATOM    382 HD22 LEU A  25      50.714  -2.537  -0.327  1.00  0.00           H
ATOM    383 HD23 LEU A  25      49.413  -1.984  -1.381  1.00  0.00           H

In this PDB entry, the chemical symbols (H) of the hydrogen identifiers are not right justified in columns 13-14 as they are for other elements. A bit of extra work is required to figure it out. What's more, sometimes it's even impossible to distinguish hydrogen from mercury without columns 77-78. From the PDB entry format description version 2.1:

"Hydrogen naming sometimes conflicts with IUPAC conventions.
For example, a hydrogen named HG11 in columns 13 - 16 is differentiated from a mercury atom by the element symbol in columns 77 - 78. Columns 13 - 16 present a unique name for each atom."

Therefore we strongly suggest that PDBParser cover columns 77-78 for ATOM/HETATM records. We have looked at the relevant code and it seems three files (Atom.py, PDBParser.py, StructureBuilder.py) need to be revised marginally to integrate this update:

1). in Atom.py CVS Revision 1.18

line 17: add one parameter "element" to the function Atom::__init__(...)
def __init__(self, name, coord, bfactor, occupancy, altloc, fullname, serial_number, element):

line 61: add line
self.element = element

add a set method:
def set_element(self, element):
    self.element = element

add a public method:
def get_element(self):
    return self.element

2). in PDBParser.py CVS Revision 1.20

line 161: add one line to parse element symbol in function PDBParser::_parse_coordinates(self, coords_trailer)
element=line[76:78].strip()

line 182: add one more parameter to init_atom():
structure_builder.init_atom(name, coord, bfactor, occupancy, altloc, fullname, serial_number, element)

3). in StructureBuilder.py CVS Revision 1.16

line 158: add one parameter "element" to the function
StructureBuilder::init_atom(self, name, coord, b_factor, occupancy, altloc, fullname, serial_number=None, element='')

line 190: add "element" to the initialization of Atom instance.
atom=self.atom=myAtom(name, coord, b_factor, occupancy, altloc, fullname, serial_number, element)

regards,
--
Hongbo Zhu

From biopython at maubp.freeserve.co.uk Wed Apr 23 08:42:21 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 23 Apr 2008 09:42:21 +0100
Subject: [BioPython] Taxonomy in BioSQL
Message-ID: <320fb6e00804230142i5aebe31al223af8fc73c51dd2@mail.gmail.com>

Eric and I have been discussing how best to deal with missing or partial taxonomy information when importing sequences into a BioSQL database.
I'm forwarding his email, which he meant to send to the list...

See also Bug 2475, http://bugzilla.open-bio.org/show_bug.cgi?id=2475

Just for background, a typical GenBank file includes an NCBI taxon ID plus the lineage as a list of names (strings). Ideally, the user will have run the BioSQL script load_ncbi_taxonomy.pl beforehand, and their taxonomy tables will be fully populated and include the new sequence's NCBI taxon ID. See: http://biopython.org/wiki/BioSQL#NCBI_Taxonomy

The big question is what to do when the new sequence being added to BioSQL doesn't have an NCBI taxon ID (e.g. it's from a non-NCBI sequence file), and we are reduced to string matching of species names...

Peter

---------- Forwarded message ----------
From: Eric Gibert
Date: Wed, Apr 23, 2008 at 2:33 AM
Subject: Taxonomy in BioPython

Dear all,

With the help of Peter and Michiel, I am looking into improving the taxonomy management in BioPython [with BioSQL].

The first step is completed: an XML parser for the information from the NCBI taxonomy database is posted in the CVS for Bio.Entrez

The second step is to improve the current coding of BioSQL/Loader.py for the part in charge of gathering/saving taxonomic data. For this point, we have 2 questions:

In the case the user does not wish to access the NCBI server to fetch taxonomic data (or the user's server cannot access the Internet) but some lineage information is present in the records to load, do you prefer to:

1) load ONLY the species in the BioSQL.taxon table (if INSERT is needed) - safe: main data are known -

or

2) load the lineage information available, which most of the time means name and rank only (i.e. no NCBI id) - risky: not all data are known, different lineages might have different levels and looking up by scientific name might create duplicates -

Another related question: is there anyone accessing a taxonomic database different from the NCBI one?

The persons using Loader.py or interested in doing so are welcome to comment and choose.
Best regards,
Eric

From lueck at ipk-gatersleben.de Wed Apr 23 12:49:50 2008
From: lueck at ipk-gatersleben.de (Stefanie Lück)
Date: Wed, 23 Apr 2008 14:49:50 +0200
Subject: [BioPython] blastn parameters
Message-ID: <004c01c8a540$84e10ad0$1022a8c0@ipkgatersleben.de>

Hello!

Does anyone know whether the parameters of NCBI's online blastn (http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?PAGE=Nucleotides&PROGRAM=blastn&MEGABLAST=on&BLAST_PROGRAMS=megaBlast&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on) are the same when I use blastall for a local blastn (without changing anything)?

On the homepage I couldn't find the parameters.

Thanks for help!
Stefanie

From sbassi at gmail.com Wed Apr 23 20:40:58 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Wed, 23 Apr 2008 17:40:58 -0300
Subject: [BioPython] blastn parameters
In-Reply-To: <004c01c8a540$84e10ad0$1022a8c0@ipkgatersleben.de>
References: <004c01c8a540$84e10ad0$1022a8c0@ipkgatersleben.de>
Message-ID:

On Wed, Apr 23, 2008 at 9:49 AM, Stefanie Lück wrote:
> Hello!
> Does anyone know whether the parameters of NCBI's online blastn (http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?PAGE=Nucleotides&PROGRAM=blastn&MEGABLAST=on&BLAST_PROGRAMS=megaBlast&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on)
> are the same when I use blastall for a local blastn (without changing anything)?
>
> On the homepage I couldn't find the parameters.

Just run the BLAST and then see the parameters at the end of the result page.
But, now I see there is a place to see parameters, at the end of the page, look at this screenshot:
http://bayimg.com/FaJnAAABH

--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Tutorial libre de Python: http://tinyurl.com/2az5d5

From cjfields at uiuc.edu Wed Apr 23 21:07:44 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 23 Apr 2008 16:07:44 -0500
Subject: [BioPython] blastn parameters
In-Reply-To:
References: <004c01c8a540$84e10ad0$1022a8c0@ipkgatersleben.de>
Message-ID:

On Apr 23, 2008, at 3:40 PM, Sebastian Bassi wrote:

> On Wed, Apr 23, 2008 at 9:49 AM, Stefanie Lück
> wrote:
>> Hello!
>> Does anyone know whether the parameters of NCBI's online blastn (http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?PAGE=Nucleotides&PROGRAM=blastn&MEGABLAST=on&BLAST_PROGRAMS=megaBlast&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on)
>> are the same when I use blastall for a local blastn (without
>> changing anything)?
>>
>> On the homepage I couldn't find the parameters.
>
> Just run the BLAST and then see the parameters at the end of the
> result page.
> But, now I see there is a place to see parameters, at the end of the
> page, look at this screenshot:
> http://bayimg.com/FaJnAAABH

For remote BLAST one should look at the URLAPI parameters instead of referring to the 'blastall' docs; there are some similarities, but I think a few parameter defaults are different. I just can't remember which ones (maybe eval cutoff, default result format display...)

Here's the latest:

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/

chris

From bsouthey at gmail.com Wed Apr 23 21:31:34 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Wed, 23 Apr 2008 16:31:34 -0500
Subject: [BioPython] blastn parameters
In-Reply-To:
References: <004c01c8a540$84e10ad0$1022a8c0@ipkgatersleben.de>
Message-ID: <480FAAB6.9040604@gmail.com>

Sebastian Bassi wrote:
> On Wed, Apr 23, 2008 at 9:49 AM, Stefanie Lück wrote:
>
>> Hello!
>> Does anyone know whether the parameters of NCBI's online blastn (http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?PAGE=Nucleotides&PROGRAM=blastn&MEGABLAST=on&BLAST_PROGRAMS=megaBlast&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on)
>> are the same when I use blastall for a local blastn (without changing anything)?
>>
>> On the homepage I couldn't find the parameters.
>>
>
> Just run the BLAST and then see the parameters at the end of the result page.
> But, now I see there is a place to see parameters, at the end of the
> page, look at this screenshot:
> http://bayimg.com/FaJnAAABH
>
>
>

Hi,
Well, it really depends... The NCBI site does track your preferences, so if you modify an option it will use that value in the future.

It may also depend on which blast version. From the NCBI FAQ, "match reward (-r) and mismatch penalty (-q) settings are now 2, -3 (from 1, -3)". This is different from blastall version 2.2.17 that I have (could be changed in 2.2.18). Word size appears to be another change.

Bruce

From lueck at ipk-gatersleben.de Thu Apr 24 12:03:58 2008
From: lueck at ipk-gatersleben.de (Stefanie Lück)
Date: Thu, 24 Apr 2008 14:03:58 +0200
Subject: [BioPython] blastn parameters
References: <004c01c8a540$84e10ad0$1022a8c0@ipkgatersleben.de> <480FAAB6.9040604@gmail.com>
Message-ID: <007001c8a603$471b18d0$1022a8c0@ipkgatersleben.de>

Thanks for all the replies!
Could someone also tell me how I can change the parameters via Biopython? Is this possible?

Stefanie

----- Original Message -----
From: "Bruce Southey"
To: "Sebastian Bassi"
Cc: "Biopython List"
Sent: Wednesday, April 23, 2008 11:31 PM
Subject: Re: [BioPython] blastn parameters

Sebastian Bassi wrote:
> On Wed, Apr 23, 2008 at 9:49 AM, Stefanie Lück
> wrote:
>
>> Hello!
>> Does anyone know whether the parameters of NCBI's online blastn
>> (http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?PAGE=Nucleotides&PROGRAM=blastn&MEGABLAST=on&BLAST_PROGRAMS=megaBlast&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on)
>> are the same when I use blastall for a local blastn (without changing
>> anything)?
>>
>> On the homepage I couldn't find the parameters.
>>
>
> Just run the BLAST and then see the parameters at the end of the result
> page.
> But, now I see there is a place to see parameters, at the end of the
> page, look at this screenshot:
> http://bayimg.com/FaJnAAABH
>
>
>

Hi,
Well, it really depends... The NCBI site does track your preferences, so if you modify an option it will use that value in the future.

It may also depend on which blast version. From the NCBI FAQ, "match reward (-r) and mismatch penalty (-q) settings are now 2, -3 (from 1, -3)". This is different from blastall version 2.2.17 that I have (could be changed in 2.2.18). Word size appears to be another change.

Bruce

From biopython at maubp.freeserve.co.uk Thu Apr 24 12:22:56 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 24 Apr 2008 13:22:56 +0100
Subject: [BioPython] blastn parameters
In-Reply-To: <007001c8a603$471b18d0$1022a8c0@ipkgatersleben.de>
References: <004c01c8a540$84e10ad0$1022a8c0@ipkgatersleben.de> <480FAAB6.9040604@gmail.com> <007001c8a603$471b18d0$1022a8c0@ipkgatersleben.de>
Message-ID: <320fb6e00804240522i2c51f765s27c56049ff713878@mail.gmail.com>

Stefanie Lück wrote:
> Thanks for all the replies!
> Could someone also tell me how I can change the parameters via Biopython? Is
> this possible?
>
> Stefanie

You should be able to invoke any of the Blast options via Biopython.
from Bio.Blast.NCBIStandalone import blastall
help(blastall)

from Bio.Blast.NCBIWWW import qblast
help(qblast)

Peter

From mjldehoon at yahoo.com Sat Apr 26 00:53:17 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 25 Apr 2008 17:53:17 -0700 (PDT)
Subject: [BioPython] parse element symbols for ATOM/HETATM records
In-Reply-To: <11b97ec0804230322w669aac8bx37fb92cf4e84240c@mail.gmail.com>
Message-ID: <763020.40003.qm@web62407.mail.re1.yahoo.com>

Dear Hongbo Zhu,

Could you open a bug report on BugZilla and append your patch there? Patches sent to the mailing list tend to get lost.

--Michiel.

Macro Zhu wrote:
Hi,

the current Bio.PDB.PDBParser does not parse columns 77-78 of ATOM records in PDB files, where element symbols are (usually) stored. We suggest that BioPython parse this information in the next version. The reasons are as follows:

1. The current remediated PDB format requires these symbols to be always present ( http://www.wwpdb.org/documentation/format3.1-20080211.pdf ), though in old PDB files (v2.3), these symbols are sometimes missing.

2. In some cases it is not straightforward, if not impossible, to recognize hydrogen atoms by their identifiers in the remediated PDB files. e.g. in 1AWW,

ATOM    378 HD11 LEU A  25      46.755  -3.858   0.453  1.00  0.00           H
ATOM    379 HD12 LEU A  25      47.178  -2.160   0.234  1.00  0.00           H
ATOM    380 HD13 LEU A  25      47.054  -3.226  -1.165  1.00  0.00           H
ATOM    381 HD21 LEU A  25      49.453  -1.483   0.307  1.00  0.00           H
ATOM    382 HD22 LEU A  25      50.714  -2.537  -0.327  1.00  0.00           H
ATOM    383 HD23 LEU A  25      49.413  -1.984  -1.381  1.00  0.00           H

In this PDB entry, the chemical symbols (H) of the hydrogen identifiers are not right justified in columns 13-14 as they are for other elements. A bit of extra work is required to figure it out. What's more, sometimes it's even impossible to distinguish hydrogen from mercury without columns 77-78. From the PDB entry format description version 2.1:

"Hydrogen naming sometimes conflicts with IUPAC conventions.
For example, a hydrogen named HG11 in columns 13 - 16 is differentiated from a mercury atom by the element symbol in columns 77 - 78. Columns 13 - 16 present a unique name for each atom."

Therefore we strongly suggest that PDBParser cover columns 77-78 for ATOM/HETATM records. We have looked at the relevant code and it seems three files (Atom.py, PDBParser.py, StructureBuilder.py) need to be revised marginally to integrate this update:

1). in Atom.py CVS Revision 1.18

line 17: add one parameter "element" to the function Atom::__init__(...)
def __init__(self, name, coord, bfactor, occupancy, altloc, fullname, serial_number, element):

line 61: add line
self.element = element

add a set method:
def set_element(self, element):
    self.element = element

add a public method:
def get_element(self):
    return self.element

2). in PDBParser.py CVS Revision 1.20

line 161: add one line to parse element symbol in function PDBParser::_parse_coordinates(self, coords_trailer)
element=line[76:78].strip()

line 182: add one more parameter to init_atom():
structure_builder.init_atom(name, coord, bfactor, occupancy, altloc, fullname, serial_number, element)

3). in StructureBuilder.py CVS Revision 1.16

line 158: add one parameter "element" to the function
StructureBuilder::init_atom(self, name, coord, b_factor, occupancy, altloc, fullname, serial_number=None, element='')

line 190: add "element" to the initialization of Atom instance.
atom=self.atom=myAtom(name, coord, b_factor, occupancy, altloc, fullname, serial_number, element)

regards,
--
Hongbo Zhu
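
The parsing step proposed in this thread, element=line[76:78].strip(), can be tried on its own without Biopython. What follows is only a minimal sketch, assuming the standard fixed-width layout of ATOM/HETATM records; the helper names format_atom_line and element_from_record are hypothetical and are not part of Bio.PDB. The sample record is rebuilt from the fields of atom 378 of 1AWW as quoted in the thread.

```python
# Hypothetical helpers for illustration only; they are not part of Bio.PDB.

def format_atom_line(serial, name, resname, chain, resseq, x, y, z,
                     occupancy, bfactor, element, record="ATOM"):
    """Build a fixed-width PDB coordinate record covering columns 1-78."""
    fields = [
        "{:<6}".format(record),                       # columns 1-6: record name
        "{:>5}".format(serial),                       # columns 7-11: atom serial number
        " ",                                          # column 12: blank
        "{:<4}".format(name),                         # columns 13-16: atom name
        " ",                                          # column 17: alternate location (blank here)
        "{:>3}".format(resname),                      # columns 18-20: residue name
        " ",                                          # column 21: blank
        "{:1}".format(chain),                         # column 22: chain identifier
        "{:>4}".format(resseq),                       # columns 23-26: residue sequence number
        " " * 4,                                      # columns 27-30: insertion code and blanks
        "{:8.3f}{:8.3f}{:8.3f}".format(x, y, z),      # columns 31-54: coordinates
        "{:6.2f}{:6.2f}".format(occupancy, bfactor),  # columns 55-66: occupancy, B factor
        " " * 10,                                     # columns 67-76: blank
        "{:>2}".format(element),                      # columns 77-78: element, right justified
    ]
    return "".join(fields)


def element_from_record(line):
    """Return the element symbol from columns 77-78, or '' if it is absent."""
    if line[0:6].strip() not in ("ATOM", "HETATM"):
        return ""
    # Python slices are 0-based and half-open, so columns 77-78 are line[76:78].
    return line[76:78].strip()


# Atom 378 of 1AWW, using the field values quoted in the thread.
hydrogen = format_atom_line(378, "HD11", "LEU", "A", 25,
                            46.755, -3.858, 0.453, 1.00, 0.00, "H")
print(element_from_record(hydrogen))
```

A record that stops before column 77, as some older files do, simply yields an empty string here, which mirrors the element='' default suggested for StructureBuilder::init_atom.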