From mdehoon at c2b2.columbia.edu Thu Jun 1 20:57:35 2006
From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon)
Date: Thu, 01 Jun 2006 17:57:35 -0700
Subject: [BioPython] NCBIWWW.qblast with refseq by organism
In-Reply-To: <20060526165711.94194.qmail@web51708.mail.yahoo.com>
References: <20060526165711.94194.qmail@web51708.mail.yahoo.com>
Message-ID: <447F8CFF.9050204@c2b2.columbia.edu>

Denil Wickrama wrote:
> Hi, I would like to BLAST a list of proteins against the refseq
> database and retrieve the corresponding accession numbers of the
> exact hits. I get errors when I change from the nr database to the
> refseq database. Also I am trying to restrict the results by organism
> name, but that was not successful.
>
> result_handle = NCBIWWW.qblast("blastp", "nr", seq, entrez_query='"rattus norvegicus" [Organism]')
> result_handle = NCBIWWW.qblast("blastp", "refseq", seq, entrez_query='"rattus norvegicus" [Organism]')
>
> Is it possible to do refseq searches with NCBIWWW.qblast?

It turns out that the NCBI server actually wants "refseq_protein" instead of "refseq". (You can check this by saving NCBI's Protein-protein blast page in HTML, and looking at the source). So if you replace "refseq" by "refseq_protein", your code should run.

Restricting the results by organism worked fine for me with the entrez_query you have.

--Michiel.
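A minimal sketch of the fix described above, assuming a Biopython installation that provides Bio.Blast.NCBIWWW. The helper names build_qblast_args and run_refseq_blast are made up for illustration; only NCBIWWW.qblast, its entrez_query keyword, and the "refseq_protein" database name come from the message. The search itself needs network access, so it is kept in a separate function.

```python
def build_qblast_args(sequence, organism=None, database="refseq_protein"):
    """Collect the arguments for NCBIWWW.qblast (hypothetical helper)."""
    args = ["blastp", database, sequence]
    kwargs = {}
    if organism:
        # Same Entrez restriction syntax as in the original post.
        kwargs["entrez_query"] = '"%s" [Organism]' % organism
    return args, kwargs


def run_refseq_blast(sequence, organism=None):
    """Submit the search to NCBI and return the result handle."""
    # Imported here so the helper above works without Biopython installed.
    from Bio.Blast import NCBIWWW

    args, kwargs = build_qblast_args(sequence, organism)
    return NCBIWWW.qblast(*args, **kwargs)
```

Calling run_refseq_blast(seq, "rattus norvegicus") should then correspond to the second qblast call in the quoted message, with "refseq" swapped for "refseq_protein".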
-- 
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1130 St Nicholas Avenue
New York, NY 10032

From mdehoon at c2b2.columbia.edu Thu Jun 1 21:12:57 2006
From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon)
Date: Thu, 01 Jun 2006 18:12:57 -0700
Subject: [BioPython] NCBIWWW.qblast
In-Reply-To: <20060531114048.83077.qmail@web36813.mail.mud.yahoo.com>
References: <20060531114048.83077.qmail@web36813.mail.mud.yahoo.com>
Message-ID: <447F9099.1040800@c2b2.columbia.edu>

Try this instead:

from Bio import Fasta
file_for_blast = open('fasta', 'r')
f_iterator = Fasta.Iterator(file_for_blast)

from Bio.Blast import NCBIWWW
seqnum = 0
for f_record in f_iterator:
    # Run one BLAST search per record, inside the loop
    result_handle = NCBIWWW.qblast('blastp', 'nr', f_record)
    # Write each result to its own numbered output file
    save_file = open('my_blast'+str(seqnum)+'.out', 'w')
    blast_results = result_handle.read()
    save_file.write(blast_results)
    save_file.close()
    seqnum += 1

--Michiel.

alper soyler wrote:
> Dear All,
>
> I have a fasta file (called fasta) containing 20 proteins. I want to blast them in an order. How can I write the results of these 20 proteins in different output files. I tried to write the below script but the 'my_blast2.out' file turned empty. Can you help me please?
>
> regards,
> Alper
>
> #!usr/local/bin/python
>
> from Bio import Fasta
> file_for_blast = open('fasta', 'r')
> f_iterator = Fasta.Iterator(file_for_blast)
> f_record = f_iterator.next()
>
> from Bio.Blast import NCBIWWW
> result_handle = NCBIWWW.qblast('blastp', 'nr', f_record)
>
> seqnum = 0
>
> for f_record in f_iterator:
>     save_file = open('my_blast.out', 'w')
>     blast_results = result_handle.read()
>     save_file.write(blast_results)
>     save_file.close()
>     seqnum += 1
> save_file2 = open('my_blast2.out', 'w')
> blast_results = result_handle.read()
> save_file2.write(blast_results)
> save_file2.close()
>
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

-- 
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1130 St Nicholas Avenue
New York, NY 10032

From omid9dr18 at hotmail.com Thu Jun 1 18:39:34 2006
From: omid9dr18 at hotmail.com (Omid Khalouei)
Date: Thu, 1 Jun 2006 22:39:34 +0000
Subject: [BioPython] Synthesized or Clinical PDB sequence
Message-ID:

Hello,

Is there any way to find out if a sequence corresponding to a PDB structure was obtained clinically or was synthesized without having to read the primary citations?

Thanks for your help.
Omid K.

From boris.steipe at utoronto.ca Thu Jun 1 22:25:48 2006
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Thu, 1 Jun 2006 22:25:48 -0400
Subject: [BioPython] Synthesized or Clinical PDB sequence
In-Reply-To:
References:
Message-ID:

Since the PDB does not use a constrained vocabulary, this is a bit unreliable. But the information is supposed to be entered in the SOURCE record. cf.:
http://www.rcsb.org/pdb/file_formats/pdb/pdbguide2.2/part_20.html

HTH,
Boris

On 1 Jun 2006, at 18:39, Omid Khalouei wrote:
> Hello,
>
> Is there any way to find out if a sequence corresponding to a PDB
> structure was obtained clinically or was synthesized without having
> to read the primary citations?
>
> Thanks for your help.
> Omid K.
> _________________________________________________________________
> Express yourself instantly with MSN Messenger! Download today it's
> FREE!
> http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

From lee.byung-chul at kaist.ac.kr Fri Jun 2 05:45:09 2006
From: lee.byung-chul at kaist.ac.kr (Lee, Byung-chul)
Date: Fri, 02 Jun 2006 18:45:09 +0900
Subject: [BioPython] Drawing Ramanchandran plot
Message-ID: <448008A5.8090602@kaist.ac.kr>

Hi all,

While calculating the torsion angles of some atoms in PDB files, I want to draw the Ramachandran plot of those. However, I cannot find any modules or methods for doing that in Bio.PDB, so if anyone knows where it is or how to make it, please let me know.

Thanks,
Byung-chul.

-- 
--------------------------------------------------------
The important thing is not to stop questioning. : Albert Einstein

Byung chul Lee
a member of Protein BioInformatics Lab. (PBIL) at Dept. BioSystems
KAIST, Korea
Ph.D candidate
82-42-869-4357
--------------------------------------------------------

From biopython at maubp.freeserve.co.uk Fri Jun 2 08:15:25 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 02 Jun 2006 13:15:25 +0100
Subject: [BioPython] Drawing Ramanchandran plot
In-Reply-To: <448008A5.8090602@kaist.ac.kr>
References: <448008A5.8090602@kaist.ac.kr>
Message-ID: <44802BDD.6080703@maubp.freeserve.co.uk>

Lee, Byung-chul wrote:
> Hi all,
>
> While calculating the torsion angles of some atoms in PDB files, I want
> to draw the Ramachandran plot of those.
> However, I cannot find any modules or methods for doing that in Bio.PDB,
> so if anyone knows where it is or how to make it, please let me know.
>
> Thanks,
> Byung-chul.
>

A work in progress:
http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/ramachandran/

Short summary about calculating the angles:

* MMTK is great, providing it can load the PDB file. Very very easy to get the angles.
* BioPython's Bio.PDB will load most/all PDB files, but you have to work out the backbone and angles yourself.
* Python Macromolecular Library (mmLib) might also be worth looking at.

Once you have the angles, you will want to draw the plots - the link above suggests a package like Excel, R, or Peter Robinson's Java Program:
http://www.charite.de/ch/medgen/compgen/ramachandran/

Peter

From sbassi at gmail.com Wed Jun 7 15:25:44 2006
From: sbassi at gmail.com (Sebastian Bassi)
Date: Wed, 7 Jun 2006 16:25:44 -0300
Subject: [BioPython] From REF to sequence?
Message-ID:

Hello,

I have a list like this:

>ref|NP_918285.1|
>dbj|BAD88119.1|
>dbj|BAD88118.1|
>ref|XP_475495.1|
>emb|CAD37200.1|
>gb|AAM64572.1|

(the list is much bigger, but with this sample you could get the idea).

I would like to create a URL from each entry to retrieve the full NCBI information about these sequences. Is there a Biopython method for doing this? I read once about an NCBI syntax for building URLs, but I can't find it.

Best regards,
SB.

-- 
Bioinformatics news: http://www.bioinformatica.info
Lriser: http://www.linspire.com/lraiser_success.php?serial=318

From chris.lasher at gmail.com Thu Jun 8 17:32:26 2006
From: chris.lasher at gmail.com (Chris Lasher)
Date: Thu, 8 Jun 2006 17:32:26 -0400
Subject: [BioPython] Distance Matrix Parsers
Message-ID: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com>

Hi all,

Are there any modules in BioPython to parse distance matrices? My poking around the BioPython modules and Google searching does not turn up any signs indicating there are distance matrix parsers, currently. Two particularly useful parsers would be a parser for the output of DNADIST/PROTDIST/RESTDIST from PHYLIP (http://evolution.genetics.washington.edu/phylip.html), and a parser for the MEGA (http://www.megasoftware.net/mega.html) distance matrix format. If not, would there be any interest in creating parsers for these matrices, other than my own?
I think parsers for distance matrices could be very useful to the community.

Chris

From mcolosimo at mitre.org Fri Jun 9 08:16:02 2006
From: mcolosimo at mitre.org (Marc Colosimo)
Date: Fri, 9 Jun 2006 08:16:02 -0400
Subject: [BioPython] Distance Matrix Parsers
In-Reply-To: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com>
References: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com>
Message-ID: <9BE2CFC6-BACE-4D98-86A0-99E9CFBA228A@mitre.org>

Hi Chris,

I don't think there is a parser for those. I have in the past thought about writing them up. I was looking over the structure of BioPython to see where it would best fit [I'll save my rant on this for another time, maybe later today]. In the meantime, the folks at BioPerl have the Bio-Phylo CPAN module, which looks nice, but it does NOT have what you are looking for. However, I am planning on following that.

Marc

On Jun 8, 2006, at 5:32 PM, Chris Lasher wrote:
> Hi all,
> Are there any modules in BioPython to parse distance matrices? My
> poking around the BioPython modules and Google searching does not turn
> up any signs indicating there are distance matrix parsers, currently.
> Two particularly useful parsers would be a parser for the output of
> DNADIST/PROTDIST/RESTDIST from PHYLIP
> (http://evolution.genetics.washington.edu/phylip.html), and a parser
> for the MEGA (http://www.megasoftware.net/mega.html) distance matrix
> format. If not, would there be any interest in creating parsers for
> these matrices, other than my own? I think parsers for distance
> matrices could be very useful to the community.
>
> Chris
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

From chris.lasher at gmail.com Fri Jun 9 11:59:56 2006
From: chris.lasher at gmail.com (Chris Lasher)
Date: Fri, 9 Jun 2006 11:59:56 -0400
Subject: [BioPython] Distance Matrix Parsers
In-Reply-To: <9BE2CFC6-BACE-4D98-86A0-99E9CFBA228A@mitre.org>
References: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com> <9BE2CFC6-BACE-4D98-86A0-99E9CFBA228A@mitre.org>
Message-ID: <128a885f0606090859x608e733ela89fdb879e531dc8@mail.gmail.com>

Hi Marc,

Thanks for the reply. I had not seen the Bio::Phylo package before. Thanks for pointing that out. That seems to be a really useful library, though it's not exactly what I was thinking about when I originally posted. I was thinking more along the lines of the Bio::Matrix modules (http://bio.perl.org/wiki/Special:Search?search=matrix&go=Go).

I don't think writing parsers for these formats will be that difficult. I am unsure, however, about what type of data structure the matrix should be. The simplest solution is a nested list. Perhaps this is the proper solution, as the user can then convert this over to a NumPy multi-dimensional array, say, or some matrix object. I dunno. Thoughts, comments, suggestions?

Chris

On 6/9/06, Marc Colosimo wrote:
> Hi Chris,
>
> I don't think there is a parser for those. I have in the past thought
> about writing them up. I was looking over the structure of BioPython
> to see where it would best fit [I'll save my rant on this for another
> time, maybe later today]. In the mean time, the folks at BioPerl have
> Bio-Phylo CPAN module ,
> which looks nice, but it does NOT have what you are looking for.
> However, I am planning on following that.
>
> Marc
>
> On Jun 8, 2006, at 5:32 PM, Chris Lasher wrote:
>
> > Hi all,
> > Are there any modules in BioPython to parse distance matrices?
> > My poking around the BioPython modules and Google searching does not turn
> > up any signs indicating there are distance matrix parsers, currently.
> > Two particularly useful parsers would be a parser for the output of
> > DNADIST/PROTDIST/RESTDIST from PHYLIP
> > (http://evolution.genetics.washington.edu/phylip.html), and a parser
> > for the MEGA (http://www.megasoftware.net/mega.html) distance matrix
> > format. If not, would there be any interest in creating parsers for
> > these matrices, other than my own? I think parsers for distance
> > matrices could be very useful to the community.
> >
> > Chris
> > _______________________________________________
> > BioPython mailing list - BioPython at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biopython
>
>

From mcolosimo at mitre.org Fri Jun 9 14:41:29 2006
From: mcolosimo at mitre.org (Marc Colosimo)
Date: Fri, 9 Jun 2006 14:41:29 -0400
Subject: [BioPython] Distance Matrix Parsers
In-Reply-To: <128a885f0606090859x608e733ela89fdb879e531dc8@mail.gmail.com>
References: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com> <9BE2CFC6-BACE-4D98-86A0-99E9CFBA228A@mitre.org> <128a885f0606090859x608e733ela89fdb879e531dc8@mail.gmail.com>
Message-ID: <8AC5BAA2-BA47-4772-88C7-DF4B2061A8E2@mitre.org>

Chris,

I likewise didn't know about the Bio::Matrix::PhylipDist module. Personally, I would opt for a Matrix object (since Python is an OO language) and store the data internally as a nested list. That way you have the best of both worlds. The next question is the object hierarchy. Here I would opt for a top-level Matrix class (or module) and then subclass that under Phylo. So, something like this:

Bio.Matrix
Bio.Phylo.Matrix

and maybe things like the following (which isn't used/followed much here in BioPython)

Bio.Phylo.IO
Bio.Phylo.Parsers.PhylipDist
Bio.Phylo.Parsers.Newick
Bio.Phylo.Parsers.Nexus

And/or have
Bio.Phylo.Matrix.IO that uses the PhylipDist parser.
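
To make the "matrix object backed by a nested list" idea concrete, here is a rough sketch. Everything in it is an assumption for illustration: the class name DistanceMatrix and its methods are hypothetical and do not correspond to any existing Biopython module.

```python
class DistanceMatrix(object):
    """Hypothetical matrix object that stores its data as a nested list."""

    def __init__(self, names, rows):
        if len(names) != len(rows) or any(len(r) != len(names) for r in rows):
            raise ValueError("expected a square matrix with one row per name")
        self.names = list(names)
        # Copy the rows so later changes to the caller's lists do not leak in.
        self._rows = [list(r) for r in rows]
        self._index = dict((name, i) for i, name in enumerate(self.names))

    def __len__(self):
        return len(self.names)

    def __getitem__(self, key):
        # Look up a distance by a pair of taxon names: dm["A", "B"].
        a, b = key
        return self._rows[self._index[a]][self._index[b]]

    def to_lists(self):
        """Return a copy of the data as a plain nested list."""
        return [list(r) for r in self._rows]
```

With this shape, dm["A", "B"] gives the object-style access, while dm.to_lists() hands back the plain nested list for anyone who wants to convert it to an array type.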
The next big question is what should Bio.Phylo.IO return? For inspiration, we might want to look at Mesquite . Marc On Jun 9, 2006, at 11:59 AM, Chris Lasher wrote: > Hi Marc, > > Thanks for the reply. I had not seen the Bio::Phylo package before. > Thanks for pointing that out. That seems to have be a really useful > library, though it's not exactly what I was thinking about when I > originally posted. I was thinking more along the lines of the > Bio::Matrix modules > (http://bio.perl.org/wiki/Special:Search?search=matrix&go=Go). > > I don't think writing parsers for these formats will be that > difficult. I am unsure, however, about what type of data structure the > matrix should be. The simplest solution is a nested list. Perhaps this > is the proper solution, as the user can then convert this over to a > NumPy multi-dimensional array, say, or some matrix object. I dunno. > Thoughts, comments, suggestions? > > Chris > > On 6/9/06, Marc Colosimo wrote: >> Hi Chris, >> >> I don't think there is a parser for those. I have in the past thought >> about writing them up. I was looking over the structure of BioPython >> to see where it would best fit [I'll save my rant on this for another >> time, maybe later today]. In the mean time, the folks at BioPerl have >> Bio-Phylo CPAN module , >> which looks nice, but it does NOT have what you are looking for. >> However, I am planning on following that. >> >> Marc >> >> On Jun 8, 2006, at 5:32 PM, Chris Lasher wrote: >> >>> Hi all, >>> Are there any modules in BioPython to parse distance matrices? My >>> poking around the BioPython modules and Google searching does not >>> turn >>> up any signs indicating there are distance matrix parsers, >>> currently. >>> Two particularly useful parsers would be a parser for the output of >>> DNADIST/PROTDIST/RESTDIST from PHYLIP >>> (http://evolution.genetics.washington.edu/phylip.html), and a parser >>> for the MEGA (http://www.megasoftware.net/mega.html) distance matrix >>> format. 
If not, would there be any interest in creating parsers for >>> these matrices, other than my own? I think parsers for distance >>> matrices could be very useful to the community. >>> >>> Chris >>> _______________________________________________ >>> BioPython mailing list - BioPython at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biopython >> >> > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From chris.lasher at gmail.com Fri Jun 9 17:13:32 2006 From: chris.lasher at gmail.com (Chris Lasher) Date: Fri, 9 Jun 2006 17:13:32 -0400 Subject: [BioPython] Distance Matrix Parsers In-Reply-To: <8AC5BAA2-BA47-4772-88C7-DF4B2061A8E2@mitre.org> References: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com> <9BE2CFC6-BACE-4D98-86A0-99E9CFBA228A@mitre.org> <128a885f0606090859x608e733ela89fdb879e531dc8@mail.gmail.com> <8AC5BAA2-BA47-4772-88C7-DF4B2061A8E2@mitre.org> Message-ID: <128a885f0606091413o23088caesf4934a81f0cc0489@mail.gmail.com> > I likewise didn't know about the Bio::Matrix::PhylipDist module. > Personally, I would opt for a Matrix Object (since this is Python a > OO language) and store it internally as a nested list. That way you > have the best of both worlds. The next question is the object > hierarchy. Here I would opt for a top level Matrix class (or module) > and then subclass that under Phylo. So, something like this: > > Bio.Matrix > Bio.Phylo.Matrix So is this more appropriate than Bio.Matrix.Phylo? A phylogenetic matrix is a type of matrix, so that hierarchy is immediately appealing, however, a phylogenetic matrix is not of much use in and of itself, so I can see the argument that it should be placed in a phylogeny package (which we have yet to write but as mentioned earlier, could be very useful). 
> and maybe things like the following (which isn't used/followed much > here in BioPython) > > Bio.Phylo.IO > Bio.Phylo.Parsers.PhylipDist > Bio.Phylo.Parsers.Newick > Bio.Phylo.Parsers.Nexus > > And/or have > Bio.Phylo.Matrix.IO that uses the PhylipDist parser. This is very very good, in my opinion. Thanks for doing the heavy-lifting of the brainwork on this! =-) > The next big question is what should Bio.Phylo.IO return? For > inspiration, we might want to look at Mesquite mesquiteproject.org/mesquite/mesquite.html>. I must give a better look at this site before commenting, but once again, thanks for bringing this to my awareness! What a helpful past couple of emails. I will be out for the weekend but will think more about this. As a sidenote, should this discussion be moved to biopython-dev or is it fine here? Thanks again Marc, Chris From biopython at maubp.freeserve.co.uk Sat Jun 10 06:10:02 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 10 Jun 2006 11:10:02 +0100 Subject: [BioPython] Distance Matrix Parsers In-Reply-To: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com> References: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com> Message-ID: <448A9A7A.6050501@maubp.freeserve.co.uk> Chris Lasher wrote: > Hi all, Are there any modules in BioPython to parse distance > matrices? My poking around the BioPython modules and Google searching > does not turn up any signs indicating there are distance matrix > parsers, currently. Two particularly useful parsers would be a parser > for the output of DNADIST/PROTDIST/RESTDIST from PHYLIP > (http://evolution.genetics.washington.edu/phylip.html), I've done a very small amount of work with neighbour joining trees, using PHYLIP format distance matrices. 
The closest I could find to a file format definition was this page: http://evolution.genetics.washington.edu/phylip/doc/distance.html Points to be aware of: In my experience, most software tools usually write the distances as a full symmetric matrix. However, the "standard" explicitly discusses lower triangular form (missing out the diagonal distance zero entries) which has the significant advantage of using about half the disk space. This is significant once you get into thousands of taxa. So, make sure any parser can cope with both full symmetric, and lower triangular forms - ideally without the user having to care. This also raises the point about how to store the matrix in memory. Does Numeric/NumPy have an efficient way of storing symmetric matrices? This is less flexible than the suggested list of lists, but for large datasets would need much less memory. Second point - the "official" PHYLIP distance matrix file format truncates the taxa names at 10 characters. Some tools (e.g. clustalw) ignore this limitation and will use as many as needed for the full name. I personally find this much nicer - after all most gene identifiers (e.g. GI numbers) are eight characters to start with, and if you are dealing with multiple features in each gene 10 characters is tough going. So, I would make sure you test the parser on this format variant (with names longer than 10 characters). I can supply some examples if you like. For writing matrices to file, the issue of following the strict 10 character taxa limit might best be handled as an option (default to max 10, with a warning if any names are truncated, and an error if truncation renders names non-unique?). Likewise an option to save matrices as either fully symmetric or lower triangular. I would lean towards using fully symmetric as the default as it seems to be more common. > and a parser for the MEGA (http://www.megasoftware.net/mega.html) > distance matrix format. 
If not, would there be any interest in > creating parsers for these matrices, other than my own? I think > parsers for distance matrices could be very useful to the community. I suspect that for serious tree building pure python will not be competitive with existing C/C++ code on speed - but non-the-less could be useful. Peter From idoerg at burnham.org Sat Jun 10 11:08:43 2006 From: idoerg at burnham.org (Iddo Friedberg) Date: Sat, 10 Jun 2006 08:08:43 -0700 Subject: [BioPython] Distance Matrix Parsers References: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com> <448A9A7A.6050501@maubp.freeserve.co.uk> Message-ID: <1F97379A556D0946AAEFE3F63FD6F5744D468D@MAIL.burnham.org> Hi, Bio.SubsMat has a parser for substitution matrices, lower triangular and square. Feel free to recycle code. Best, Iddo -- Iddo Friedberg, PhD Burnham Institute for Medical Research 10901 N. Torrey Pines Rd. La Jolla, CA 92037 USA T: +1 858 646 3100 x3516 http://iddo-friedberg.org http://BioFunctionPrediction.org -----Original Message----- From: biopython-bounces at lists.open-bio.org on behalf of Peter Sent: Sat 6/10/2006 3:10 AM To: BioPython Mailing List Subject: Re: [BioPython] Distance Matrix Parsers Chris Lasher wrote: > Hi all, Are there any modules in BioPython to parse distance > matrices? My poking around the BioPython modules and Google searching > does not turn up any signs indicating there are distance matrix > parsers, currently. Two particularly useful parsers would be a parser > for the output of DNADIST/PROTDIST/RESTDIST from PHYLIP > (http://evolution.genetics.washington.edu/phylip.html), I've done a very small amount of work with neighbour joining trees, using PHYLIP format distance matrices. The closest I could find to a file format definition was this page: http://evolution.genetics.washington.edu/phylip/doc/distance.html Points to be aware of: In my experience, most software tools usually write the distances as a full symmetric matrix. 
However, the "standard" explicitly discusses lower triangular form (missing out the diagonal distance zero entries) which has the significant advantage of using about half the disk space. This is significant once you get into thousands of taxa. So, make sure any parser can cope with both full symmetric, and lower triangular forms - ideally without the user having to care. This also raises the point about how to store the matrix in memory. Does Numeric/NumPy have an efficient way of storing symmetric matrices? This is less flexible than the suggested list of lists, but for large datasets would need much less memory. Second point - the "official" PHYLIP distance matrix file format truncates the taxa names at 10 characters. Some tools (e.g. clustalw) ignore this limitation and will use as many as needed for the full name. I personally find this much nicer - after all most gene identifiers (e.g. GI numbers) are eight characters to start with, and if you are dealing with multiple features in each gene 10 characters is tough going. So, I would make sure you test the parser on this format variant (with names longer than 10 characters). I can supply some examples if you like. For writing matrices to file, the issue of following the strict 10 character taxa limit might best be handled as an option (default to max 10, with a warning if any names are truncated, and an error if truncation renders names non-unique?). Likewise an option to save matrices as either fully symmetric or lower triangular. I would lean towards using fully symmetric as the default as it seems to be more common. > and a parser for the MEGA (http://www.megasoftware.net/mega.html) > distance matrix format. If not, would there be any interest in > creating parsers for these matrices, other than my own? I think > parsers for distance matrices could be very useful to the community. 
I suspect that for serious tree building pure python will not be competitive with existing C/C++ code on speed - but non-the-less could be useful. Peter _______________________________________________ BioPython mailing list - BioPython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython
From mcolosimo at mitre.org Mon Jun 12 08:38:18 2006 From: mcolosimo at mitre.org (Marc Colosimo) Date: Mon, 12 Jun 2006 08:38:18 -0400 Subject: [BioPython] Distance Matrix Parsers In-Reply-To: <128a885f0606091413o23088caesf4934a81f0cc0489@mail.gmail.com> References: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com> <9BE2CFC6-BACE-4D98-86A0-99E9CFBA228A@mitre.org> <128a885f0606090859x608e733ela89fdb879e531dc8@mail.gmail.com> <8AC5BAA2-BA47-4772-88C7-DF4B2061A8E2@mitre.org> <128a885f0606091413o23088caesf4934a81f0cc0489@mail.gmail.com> Message-ID: <65DF4A7E-B365-4E61-93D4-156A36F6ED54@mitre.org> [cross-posting to biopython-dev] Chris, Oops, didn't notice this was on the general biopython mailing list. I think many of the developers also subscribe to this list, but just in case I'm cross posting this. Iddo pointed out the Bio.SubsMat, which I didn't know what that module did. One problem with names like that, but the API Docs are helpful only when you look at them (Kuddos for those who add documentation). Given Bio.SubsMat and the BioPerl Module, I would strongly consider combining the Bio.SubsMat and the PhylipDist into a new Bio.Matrix module. From a Phylo module, a function/class can always call the Bio.Matrix classes. Marc On Jun 9, 2006, at 5:13 PM, Chris Lasher wrote: >> I likewise didn't know about the Bio::Matrix::PhylipDist module.
>> Personally, I would opt for a Matrix Object (since this is Python a >> OO language) and store it internally as a nested list. That way you >> have the best of both worlds. The next question is the object >> hierarchy. Here I would opt for a top level Matrix class (or module) >> and then subclass that under Phylo. So, something like this: >> >> Bio.Matrix >> Bio.Phylo.Matrix > > So is this more appropriate than Bio.Matrix.Phylo? A phylogenetic > matrix is a type of matrix, so that hierarchy is immediately > appealing, however, a phylogenetic matrix is not of much use in and of > itself, so I can see the argument that it should be placed in a > phylogeny package (which we have yet to write but as mentioned > earlier, could be very useful). > >> and maybe things like the following (which isn't used/followed much >> here in BioPython) >> >> Bio.Phylo.IO >> Bio.Phylo.Parsers.PhylipDist >> Bio.Phylo.Parsers.Newick >> Bio.Phylo.Parsers.Nexus >> >> And/or have >> Bio.Phylo.Matrix.IO that uses the PhylipDist parser. > > This is very very good, in my opinion. Thanks for doing the > heavy-lifting of the brainwork on this! =-) > >> The next big question is what should Bio.Phylo.IO return? For >> inspiration, we might want to look at Mesquite > mesquiteproject.org/mesquite/mesquite.html>. > > I must give a better look at this site before commenting, but once > again, thanks for bringing this to my awareness! What a helpful past > couple of emails. I will be out for the weekend but will think more > about this. > > As a sidenote, should this discussion be moved to biopython-dev or is > it fine here? 
>
> Thanks again Marc,
> Chris

From mcolosimo at mitre.org Mon Jun 12 09:18:41 2006
From: mcolosimo at mitre.org (Marc Colosimo)
Date: Mon, 12 Jun 2006 09:18:41 -0400
Subject: [BioPython] Distance Matrix Parsers
In-Reply-To: <448A9A7A.6050501@maubp.freeserve.co.uk>
References: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com> <448A9A7A.6050501@maubp.freeserve.co.uk>
Message-ID:

[cross post]

On Jun 10, 2006, at 6:10 AM, Peter wrote:

> Chris Lasher wrote:
>> Hi all, Are there any modules in BioPython to parse distance matrices? My poking around the BioPython modules and Google searching does not turn up any signs indicating there are distance matrix parsers, currently. Two particularly useful parsers would be a parser for the output of DNADIST/PROTDIST/RESTDIST from PHYLIP (http://evolution.genetics.washington.edu/phylip.html),
>
> I've done a very small amount of work with neighbour joining trees, using PHYLIP format distance matrices. The closest I could find to a file format definition was this page:
>
> http://evolution.genetics.washington.edu/phylip/doc/distance.html
>
> Points to be aware of:
>
> In my experience, most software tools usually write the distances as a full symmetric matrix. However, the "standard" explicitly discusses lower triangular form (missing out the diagonal distance zero entries) which has the significant advantage of using about half the disk space. This is significant once you get into thousands of taxa.

This is still small potatoes compared to the input needed to generate the distance matrices (especially with DNA/RNA sequences of any decently sized gene).

> So, make sure any parser can cope with both full symmetric, and lower triangular forms - ideally without the user having to care.
Phylip does ask you which form to read or write; this is a pain at times. So, having a parser figure this out would be nice. However, the user should know about the choices.

> This also raises the point about how to store the matrix in memory. Does Numeric/NumPy have an efficient way of storing symmetric matrices? This is less flexible than the suggested list of lists, but for large datasets would need much less memory.

I believe that SciPy (Numeric/NumPy/etc.) is more efficient at storing these things. But you lose that when you want to do pythonish things to it (like write it back out).

> Second point - the "official" PHYLIP distance matrix file format truncates the taxa names at 10 characters. Some tools (e.g. clustalw) ignore this limitation and will use as many as needed for the full name.

ClustalW does the CORRECT thing: it truncates the name to 10 characters for Phylip output (alignments). And it does the CORRECT thing for its distance matrix file. In ClustalW's trees.c file:

void distance_matrix_output(FILE *ofile)
    fprintf(ofile,"\n%-*s ",max_names,names[i]);
    /* left justify to the maximum length of names in current alignment file and use a space as a sep */

Spaces in names are bad in this case, but Phylip is okay with them, since the first 10 characters are the taxon name.

> I personally find this much nicer - after all most gene identifiers (e.g. GI numbers) are eight characters to start with, and if you are dealing with multiple features in each gene 10 characters is tough going.
>
> So, I would make sure you test the parser on this format variant (with names longer than 10 characters). I can supply some examples if you like.

By definition this isn't a variant of Phylip, but another format. So, one would need two parsers: PhylipDist and Dist (or ClustalDist).
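
As a rough illustration of the point above about coping with both layouts, here is a minimal sketch (modern Python 3, standard library only) of a reader that accepts either a square or a lower-triangular distance matrix without being told which. The function name read_phylip_dist is hypothetical and not part of Biopython. It assumes one matrix row per line and names without embedded spaces; strict PHYLIP uses a fixed 10-character name field, which would need column slicing instead of whitespace splitting.

```python
def read_phylip_dist(text):
    """Parse a PHYLIP-style distance matrix given as a string.

    Hypothetical helper for illustration only. Returns (names, matrix),
    where matrix is a full symmetric nested list regardless of whether
    the input was square or lower-triangular.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n = int(lines[0].split()[0])            # first line: number of taxa
    names, rows = [], []
    for ln in lines[1:n + 1]:
        fields = ln.split()                 # assumes names contain no spaces
        names.append(fields[0])
        rows.append([float(x) for x in fields[1:]])

    if all(len(r) == n for r in rows):
        return names, rows                  # already a square matrix
    if all(len(r) == i for i, r in enumerate(rows)):
        # Lower triangle without the diagonal: row i carries i distances.
        full = [[0.0] * n for _ in range(n)]
        for i, r in enumerate(rows):
            for j, d in enumerate(r):
                full[i][j] = full[j][i] = d
        return names, full
    raise ValueError("neither a square nor a lower-triangular matrix")


square = "3\nA 0.0 1.0 2.0\nB 1.0 0.0 3.0\nC 2.0 3.0 0.0\n"
lower = "3\nA\nB 1.0\nC 2.0 3.0\n"
print(read_phylip_dist(square) == read_phylip_dist(lower))
```

Returning a nested list keeps the result easy to write back out; a NumPy array could be substituted where memory matters.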
> For writing matrices to file, the issue of following the strict 10 character taxa limit might best be handled as an option (default to max 10, with a warning if any names are truncated, and an error if truncation renders names non-unique?).

DON'T give an option of 10 or more. That is NOT the definition of the Phylip file Matrix structure, so why give the option? Make another class that outputs the whole name (ClustalDist). I am pretty sure that Phylip doesn't care about non-unique names so why error out? However, the class should have a means for the user to ask this question.

> Likewise an option to save matrices as either fully symmetric or lower triangular. I would lean towards using fully symmetric as the default as it seems to be more common.

Phylip's default seems to be a "Square" distance matrix, i.e. fully symmetric. Keep this in mind when naming or documentation.

>> and a parser for the MEGA (http://www.megasoftware.net/mega.html) distance matrix format. If not, would there be any interest in creating parsers for these matrices, other than my own? I think parsers for distance matrices could be very useful to the community.
>
> I suspect that for serious tree building pure python will not be competitive with existing C/C++ code on speed - but non-the-less could be useful.

Well, we do have things like SciPy and PyClustal, which make things more even.

Marc

From asmund.skjaveland at usit.uio.no Mon Jun 12 11:45:26 2006
From: asmund.skjaveland at usit.uio.no (=?ISO-8859-1?Q?=C5smund_Skj=E6veland?=)
Date: Mon, 12 Jun 2006 17:45:26 +0200
Subject: [BioPython] Generating Nexus file from Genbank file
Message-ID: <448D8C16.6050204@fys.uio.no>

I have a file of Genbank records, and want to extract some of them and save to a Nexus file. As far as I can tell from the API, this should work:

#!/site/compython/Linux/bin/python
import Bio, sys, time
from Bio.GenBank import Iterator
from Bio.Nexus.Nexus import Nexus

gbfile='results/sequences-txid34828.genbank'
fp = Bio.GenBank.FeatureParser()
gb = open(gbfile, 'r')
it = Bio.GenBank.Iterator(gb, fp)
nex = Nexus()
nr = 0;
rec = it.next()
while rec:
    # A string to identify the sequence with
    nexusname=rec.features[0].qualifiers['db_xref'][0] + '--' + rec.name
    nex.add_sequence(nexusname, rec.seq)
    rec = it.next()
print "\n\n%d records, %d gene names" % (nr, len(genenames))
nex.write_nexus_data('results/genegrab.nex', mrbayes=True)

But it doesn't. When I run it:

Traceback (most recent call last):
  File "py_nexustest.py", line 39, in ?
    nex.add_sequence(nexusname, rec.seq)
  File "/site/compython/Linux/lib/python2.4/site-packages/Bio/Nexus/Nexus.py", line 1412, in add_sequence
    self.matrix[name]=Seq(sequence,self.alphabet)
AttributeError: 'Nexus' object has no attribute 'alphabet'

What am I doing wrong? I don't really know the Nexus format, I just want to send certain sequences to MrBayes.
--
Åsmund Skjæveland { Scientific Computing Group, UiO; }

From rohini.damle at gmail.com Tue Jun 13 15:09:21 2006
From: rohini.damle at gmail.com (Rohini Damle)
Date: Tue, 13 Jun 2006 12:09:21 -0700
Subject: [BioPython] (no subject)
Message-ID:

Hi,
I am new to Biopython and am trying to use the NCBIStandalone parser to parse my BLAST output, which is in plain text format. The parser works well for older BLAST outputs but breaks down for newer BLAST outputs. Can someone suggest a way to overcome this problem with the BLAST parser?
Thanks

From winter at biotec.tu-dresden.de Wed Jun 14 04:00:20 2006
From: winter at biotec.tu-dresden.de (Christof Winter)
Date: Wed, 14 Jun 2006 10:00:20 +0200
Subject: [BioPython] (no subject)
In-Reply-To:
References:
Message-ID: <448FC214.20805@biotec.tu-dresden.de>

Hi Rohini,

can you provide a minimal example of your Python code along with two BLAST reports (working/not working)?

Cheers,
Christof

Rohini Damle wrote:
> Hi,
> I am new to Biopython and am trying to use the NCBIStandalone parser to parse my BLAST output, which is in plain text format. The parser works well for older BLAST outputs but breaks down for newer BLAST outputs. Can someone suggest a way to overcome this problem with the BLAST parser?
> Thanks

From biopython at maubp.freeserve.co.uk Wed Jun 14 05:09:48 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 14 Jun 2006 10:09:48 +0100
Subject: [BioPython] plain txt blast output - xml instead
In-Reply-To:
References:
Message-ID: <448FD25C.20101@maubp.freeserve.co.uk>

Rohini Damle wrote:
> Hi,
> I am new to Biopython and am trying to use the NCBIStandalone parser to parse my BLAST output, which is in plain text format. The parser works well for older BLAST outputs but breaks down for newer BLAST outputs.
The NCBI standalone blast and web blast plain text output keeps changing slightly, and as a result, the parser isn't always up to date.

> Can someone suggest a way to overcome this problem with the BLAST parser?

We recommend you use the XML output instead (this is possible with both online blast and the standalone tools).

For the standalone tools, repeat your searches with the command line option -m 7 to get XML output.

If you are using the Bio.Blast.NCBIStandalone.blastall() command, use the argument align_view to set this.

You still use NCBIStandalone.Iterator (if you have multiple queries) but now use NCBIXML.BlastParser instead of NCBIStandalone.BlastParser, e.g.
http://bugzilla.open-bio.org/attachment.cgi?id=293&action=view

Peter

From rohini.damle at gmail.com Wed Jun 14 14:22:59 2006
From: rohini.damle at gmail.com (Rohini Damle)
Date: Wed, 14 Jun 2006 11:22:59 -0700
Subject: [BioPython] plain txt blast output - xml instead
In-Reply-To: <448FD25C.20101@maubp.freeserve.co.uk>
References: <448FD25C.20101@maubp.freeserve.co.uk>
Message-ID:

Thank you very much for your help.
I have 55-56 proteins & I am using Blast to find short, nearly exact matches. The XML parser works fine for the first record, but even when I use the iterator, I CAN NOT ITERATE through the records. I have used the same code as you have given; what might be wrong?
Rohini.

On 6/14/06, Peter wrote:
>
> Rohini Damle wrote:
> > Hi,
> > I am new to Biopython and am trying to use the NCBIStandalone parser to parse my BLAST output, which is in plain text format. The parser works well for older BLAST outputs but breaks down for newer BLAST outputs.
>
> The NCBI standalone blast and web blast plain text output keeps changing slightly, and as a result, the parser isn't always up to date.
>
> > Can someone suggest a way to overcome this problem with the BLAST parser?
>
> We recommend you use the XML output instead (this is possible with both online blast and the standalone tools).
>
> For the standalone tools, repeat your searches with the command line option -m 7 to get XML output.
>
> If you are using the Bio.NCBIStandalone.blastall() command, use argument align_view to set this.
>
> You still use NCBIStandalone.Iterator (if you have multiple queries) but now use NCBIXML.BlastParser instead of NCBIStandalone.BlastParser
>
> e.g.
> http://bugzilla.open-bio.org/attachment.cgi?id=293&action=view
>
> Peter

From manickam.muthuraman at wur.nl Wed Jun 14 16:22:56 2006
From: manickam.muthuraman at wur.nl (Muthuraman, Manickam)
Date: Wed, 14 Jun 2006 22:22:56 +0200
Subject: [BioPython] parsing the blastoutput and printing the alingment
Message-ID: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl>

I am new to Python.

I am getting an error when parsing BLAST output. The same problem was addressed by Michiel de Hoon, but I could not resolve it. Here is the error I am getting.

First I got an error when I typed

b_record=b_parser.parse(blast_out)

As Michiel suggested, I changed it to

b_record=b_parser.parse(blast_out)

Traceback (most recent call last):
  File "", line 1, in ?
  File "/usr/lib/python2.4/site-packages/Bio/Blast/NCBIXML.py", line 112, in parse
    self._parser.parse(handler)
  File "/usr/lib/python2.4/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.4/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.4/xml/sax/expatreader.py", line 211, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib/python2.4/xml/sax/handler.py", line 38, in fatalError
    raise exception
SAXParseException: my_blast.out:1:4: not well-formed (invalid token)

blast_out=open('my_blast.out','r')
from Bio.Blast import NCBIStandalone
from Bio.Blast import NCBIXML
b_parser=NCBIXML.BlastParser()
b_iterator1=NCBIStandalone.Iterator(blast_out,b_parser)
for alignment in b_iterator1.alignments:
    for hsp in alignment.hsps:
        print 'seq:',alignment.title

Traceback (most recent call last):
  File "", line 1, in ?
AttributeError: Iterator instance has no attribute 'alignments'

How do I print the title, the alignment and so on from the BLAST output file?

Thanks in advance
--
Manickam(melaimanik)

From biopython at maubp.freeserve.co.uk Wed Jun 14 17:54:53 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 14 Jun 2006 22:54:53 +0100
Subject: [BioPython] plain txt blast output - xml instead
In-Reply-To:
References: <448FD25C.20101@maubp.freeserve.co.uk>
Message-ID: <449085AD.7010801@maubp.freeserve.co.uk>

Rohini Damle wrote:
> Thank you very much for your help.
> I have 55-56 proteins & I am using Blast to find out short, nearly exact matches. The xml parser works fine for first record but even if I used the iterator, I CAN NOT ITERATE through the records, I have used the same code as u have given, what might be wrong?
> Rohini.

If you send us a short bit of example code, and the error message, that would help. Also, what version of BioPython are you using, and do you have Windows or Linux or MacOS...
One guess is that you will need to update the NCBIStandalone.py file to include a recent fix for iterating XML files.

Assuming you are using BioPython 1.41 on Windows, click on this link and pick "download" near the top of the page to get the latest version:

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIStandalone.py?cvsroot=biopython

Save it here:

c:\python24\lib\site-packages\Bio\Blast\NCBIStandalone.py

(Make a copy of the old file first, just in case)

Peter

From mdehoon at c2b2.columbia.edu Wed Jun 14 17:55:17 2006
From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon)
Date: Wed, 14 Jun 2006 17:55:17 -0400
Subject: [BioPython] parsing the blastoutput and printing the alingment
In-Reply-To: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl>
References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl>
Message-ID: <449085C5.4020101@c2b2.columbia.edu>

Muthuraman, Manickam wrote:
> b_parser=NCBIXML.BlastParser()
> b_iterator1=NCBIStandalone.Iterator(blast_out,b_parser)
> for alignment in b_iterator1.alignments:
>     for hsp in alignment.hsps:
>         print 'seq:',alignment.title
>
> Traceback (most recent call last):
>   File "", line 1, in ?
> AttributeError: Iterator instance has no attribute 'alignments'

Use:

b_record = b_iterator1.next()
for alignment in b_record.alignments:
    ...

Just like the example in the tutorial.

--Michiel.
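
The nesting described above (the iterator yields one record per query, each record holds alignments, each alignment holds HSPs) mirrors the layout of BLAST XML itself. The following is a minimal standard-library sketch in modern Python 3, not the Biopython parser. It assumes the NCBI BLAST XML element names Iteration, Hit, Hit_def, Hit_accession, Hsp and Hsp_evalue; the sample document, its accessions and the function name hits_per_record are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Tiny made-up document using NCBI BLAST XML element names (assumed layout).
SAMPLE = """<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_def>made-up protein A</Hit_def>
          <Hit_accession>ACC0001</Hit_accession>
          <Hit_hsps>
            <Hsp><Hsp_evalue>1e-30</Hsp_evalue></Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_def>made-up protein B</Hit_def>
          <Hit_accession>ACC0002</Hit_accession>
          <Hit_hsps>
            <Hsp><Hsp_evalue>2e-10</Hsp_evalue></Hsp>
            <Hsp><Hsp_evalue>0.5</Hsp_evalue></Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def hits_per_record(xml_text):
    """Return one list per record; each entry is (accession, title, e-values)."""
    root = ET.fromstring(xml_text)
    records = []
    for iteration in root.iter("Iteration"):        # one "record" per query
        hits = []
        for hit in iteration.iter("Hit"):           # alignments in that record
            evalues = [float(hsp.findtext("Hsp_evalue"))
                       for hsp in hit.iter("Hsp")]  # HSPs in that alignment
            hits.append((hit.findtext("Hit_accession"),
                         hit.findtext("Hit_def"), evalues))
        records.append(hits)
    return records


for number, hits in enumerate(hits_per_record(SAMPLE), start=1):
    for accession, title, evalues in hits:
        print("record", number, "seq:", accession, title, evalues)
```

The attribute lookup that failed in the original post corresponds to asking the outermost level for something that only exists one level down.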
--
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1130 St Nicholas Avenue
New York, NY 10032

From biopython at maubp.freeserve.co.uk Wed Jun 14 17:48:20 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 14 Jun 2006 22:48:20 +0100
Subject: [BioPython] parsing the blastoutput and printing the alingment
In-Reply-To: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl>
References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl>
Message-ID: <44908424.2070407@maubp.freeserve.co.uk>

Muthuraman, Manickam wrote:
> I am new to python
>
> I am getting error in parsing blastoutput more over the same problem was been addressed by Michiel De Hoon but i could not clear...
>
> blast_out=open('my_blast.out','r')
> from Bio.Blast import NCBIStandalone
> from Bio.Blast import NCBIXML
> b_parser=NCBIXML.BlastParser()
> b_iterator1=NCBIStandalone.Iterator(blast_out,b_parser)
> for alignment in b_iterator1.alignments:
>     for hsp in alignment.hsps:
>         print 'seq:',alignment.title

Your example code is wrong. The iterator object will return blast record objects (which have an alignments property). Try something like this:

blast_out=open('my_blast.out','r')
from Bio.Blast import NCBIStandalone
from Bio.Blast import NCBIXML
b_parser=NCBIXML.BlastParser()
b_iterator=NCBIStandalone.Iterator(blast_out,b_parser)
for b_record in b_iterator:
    for alignment in b_record.alignments:
        for hsp in alignment.hsps:
            print 'seq:',alignment.title

Or for a full and tested example, try this:
http://bugzilla.open-bio.org/attachment.cgi?id=293&action=view

Peter

From manickam.muthuraman at wur.nl Thu Jun 15 07:47:34 2006
From: manickam.muthuraman at wur.nl (Muthuraman, Manickam)
Date: Thu, 15 Jun 2006 13:47:34 +0200
Subject: [BioPython] parsing the blastoutput and printing the alingment
References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl>
Message-ID: <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl>

Still i am getting the same error or error. I tried as Peter suggested but it fails.
I have attached the error and the code

[manickam at bioinfo python]$ cat blas.py
from Bio import Fasta
file_for_blast=open('/home/manickam/Documents/m_cold.fasta','r')
f_iterator=Fasta.Iterator(file_for_blast)
f_record=f_iterator.next()
from Bio.Blast import NCBIWWW
result_handle=NCBIWWW.qblast('blastp','nr',f_record)
save_file=open('/home/manickam/my_blast.out','w')
blast_results=result_handle.read()
save_file.write(blast_results)
save_file.close()
blast_out=open('/home/manickam/my_blast.out','r')
from Bio.Blast import NCBIXML
from Bio.Blast import NCBIStandalone
b_parser=NCBIXML.BlastParser()
b_iterator=NCBIStandalone.Iterator(blast_out,b_parser)
for b_record in b_iterator:
    print "inside (3)outer loop"
    for alignment in b_record.alignments:
        print "inside 2 loop"
        for hsp in alignment.hsps:
            print "inside 1 loop"
            print 'seq:',alignment.title
blast_out.close()
[manickam at bioinfo python]$
[manickam at bioinfo python]$ python blas.py
/usr/lib/python2.4/site-packages/Bio/Blast/NCBIWWW.py:1064: UserWarning: qblast works only with blastn and blastp for now.
  warnings.warn("qblast works only with blastn and blastp for now.")
Traceback (most recent call last):
  File "blas.py", line 16, in ?
    for b_record in b_iterator:
  File "/usr/lib/python2.4/site-packages/Bio/Blast/NCBIStandalone.py", line 1385, in next
    return self._parser.parse(File.StringHandle(data))
  File "/usr/lib/python2.4/site-packages/Bio/Blast/NCBIXML.py", line 112, in parse
    self._parser.parse(handler)
  File "/usr/lib/python2.4/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.4/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.4/xml/sax/expatreader.py", line 211, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib/python2.4/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: :1:4: not well-formed (invalid token)
[manickam at bioinfo python]$

From manickam.muthuraman at wur.nl Thu Jun 15 07:51:36 2006
From: manickam.muthuraman at wur.nl (Muthuraman, Manickam)
Date: Thu, 15 Jun 2006 13:51:36 +0200
Subject: [BioPython] parsing the blastoutput and printing the alingment
References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl>
Message-ID: <4CDD243B32D07748944828EA7A29E4A3E2AF9E@salte0008.wurnet.nl>

Dear Michiel

I tried your suggestion as well but I am still getting an error. I could not even understand where I am making a mistake.
[manickam at bioinfo python]$ cat blas.py
from Bio import Fasta
file_for_blast=open('/home/manickam/Documents/m_cold.fasta','r')
f_iterator=Fasta.Iterator(file_for_blast)
f_record=f_iterator.next()
from Bio.Blast import NCBIWWW
result_handle=NCBIWWW.qblast('blastp','nr',f_record)
save_file=open('/home/manickam/my_blast.out','w')
blast_results=result_handle.read()
save_file.write(blast_results)
save_file.close()
blast_out=open('/home/manickam/my_blast.out','r')
from Bio.Blast import NCBIXML
from Bio.Blast import NCBIStandalone
b_parser=NCBIXML.BlastParser()
b_iterator=NCBIStandalone.Iterator(blast_out,b_parser)
b_record = b_iterator.next()
for alignment in b_record.alignments:
    print "inside 2 loop"
    for hsp in alignment.hsps:
        print "inside 1 loop"
        print 'seq:',alignment.title
blast_out.close()
[manickam at bioinfo python]$ python blas.py
/usr/lib/python2.4/site-packages/Bio/Blast/NCBIWWW.py:1064: UserWarning: qblast works only with blastn and blastp for now.
  warnings.warn("qblast works only with blastn and blastp for now.")
Traceback (most recent call last):
  File "blas.py", line 16, in ?
    b_record = b_iterator.next()
  File "/usr/lib/python2.4/site-packages/Bio/Blast/NCBIStandalone.py", line 1385, in next
    return self._parser.parse(File.StringHandle(data))
  File "/usr/lib/python2.4/site-packages/Bio/Blast/NCBIXML.py", line 112, in parse
    self._parser.parse(handler)
  File "/usr/lib/python2.4/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.4/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.4/xml/sax/expatreader.py", line 211, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib/python2.4/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: :1:4: not well-formed (invalid token)
[manickam at bioinfo python]$

From biopython at maubp.freeserve.co.uk Thu Jun 15 08:25:06 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 15 Jun 2006 13:25:06 +0100
Subject: [BioPython] parsing the blastoutput and printing the alingment
In-Reply-To: <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl>
References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl>
Message-ID: <449151A2.1040602@maubp.freeserve.co.uk>

Muthuraman, Manickam wrote:
> Still i am getting the same error or error. I tried as Peter suggested but it fails.
> ...

I couldn't see anything clearly wrong just from reading your code.

Which version of BioPython do you have? Since BioPython 1.41 NCBIWWW.qblast uses XML as the default output format, but you can force this by:

result_handle=NCBIWWW.qblast('blastp','nr',f_record, format_type="XML")

Try opening your output file /home/manickam/my_blast.out in a text editor to double check it really is XML - i.e. does it start with an XML declaration such as <?xml version="1.0"?>

If it is XML, then BioPython doesn't like it for some reason. Maybe you could email the file to me and Michiel to take a look?
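
The "is it really XML?" check suggested above can also be scripted. This is a minimal sketch in modern Python 3 using only the standard library; check_xml_text is a hypothetical helper name, not part of Biopython, and the sample strings are made up. It first looks for an XML declaration at the start of the text and then asks the standard parser whether the text is well-formed.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def check_xml_text(text):
    """Return (ok, message) describing whether text looks like XML.

    Hypothetical helper for illustration only.
    """
    stripped = text.lstrip()
    if not stripped.startswith("<?xml"):
        first_line = stripped.splitlines()[0] if stripped else ""
        return False, "does not start with an XML declaration: %r" % first_line
    try:
        xml.dom.minidom.parseString(stripped)
    except ExpatError as err:
        return False, "not well-formed: %s" % err
    return True, "looks like well-formed XML"


good = '<?xml version="1.0"?>\n<BlastOutput><x>1</x></BlastOutput>'
bad = 'HTTP/1.1 200 OK\nServer: Nde\n\n<?xml version="1.0"?><a/>'
print(check_xml_text(good))
print(check_xml_text(bad))
```

Anything ahead of the declaration, such as leftover response headers, makes a strict XML parser fail on the very first line, so reporting that first line is usually enough to spot the problem.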

Peter

From manickam.muthuraman at wur.nl Thu Jun 15 10:13:17 2006
From: manickam.muthuraman at wur.nl (Muthuraman, Manickam)
Date: Thu, 15 Jun 2006 16:13:17 +0200
Subject: [BioPython] parsing the blastoutput and printing the alingment
References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9E@salte0008.wurnet.nl>
Message-ID: <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl>

Dear Peter

Here are the code, my_blast.out and the error. My need is to get all the BLAST hit sequences in FASTA format. By parsing I can extract the accession numbers from it.

Code

from Bio import Fasta
file_for_blast=open('/home/manickam/Documents/m_cold.fasta','r')
f_iterator=Fasta.Iterator(file_for_blast)
f_record=f_iterator.next()
from Bio.Blast import NCBIWWW
result_handle=NCBIWWW.qblast('blastp','nr',f_record, format_type="XML")
save_file=open('/home/manickam/my_blast.out','w')
blast_results=result_handle.read()
save_file.write(blast_results)
save_file.close()
blast_out=open('/home/manickam/my_blast.out','r')
from Bio.Blast import NCBIXML
from Bio.Blast import NCBIStandalone
b_parser=NCBIXML.BlastParser()
b_iterator=NCBIStandalone.Iterator(blast_out,b_parser)
b_record = b_iterator.next()
for alignment in b_record.alignments:
    print "inside 2 loop"
    for hsp in alignment.hsps:
        print "inside 1 loop"
        print 'seq:',alignment.title
blast_out.close()

Error

[root at bioinfo python]# python blas.py
/usr/lib/python2.4/site-packages/Bio/Blast/NCBIWWW.py:1064: UserWarning: qblast works only with blastn and blastp for now.
  warnings.warn("qblast works only with blastn and blastp for now.")
Traceback (most recent call last):
  File "blas.py", line 16, in ?
    b_record = b_iterator.next()
  File "/usr/lib/python2.4/site-packages/Bio/Blast/NCBIStandalone.py", line 1410, in next
    return self._parser.parse(File.StringHandle(data))
  File "/usr/lib/python2.4/site-packages/Bio/Blast/NCBIXML.py", line 112, in parse
    self._parser.parse(handler)
  File "/usr/lib/python2.4/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.4/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.4/xml/sax/expatreader.py", line 211, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib/python2.4/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: :1:4: not well-formed (invalid token)
[root at bioinfo python]#

my_blast.out

HTTP/1.1 200 OK
Date: Thu, 15 Jun 2006 13:57:19 GMT
Server: Nde
Content-Type: application/xml
Connection: close

blastp
BLASTP 2.2.14 [May-07-2006]
Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.
nr 1_13944 1BK0 331
. . . . . . . .
76 128 27 295
VPKIDVSPLFGD-DQAAKMRVAQQIDAASRDTGFFYAVNHGIN---VQRLSQKTKEFHMSITPEEKWDLAIRAYNKEHQDQVRAGYYLSIPGKKAVESFCYLNP--NFTPDHPRIQAKTPTHEVNVWPDETKHPGFQDFAEQYYWDVFGLSSALLKGYALALGKEENFFARHFKPDDTLASVVLIRYP-YLDPYPEAAIKTAADGTKLSFEWHEDVSLITVLYQSNVQNLQVETAAGYQDIEADDTGYLINCGSYMAHLTNNYYKAPIHRV--KWVNAERQSLPFFVNLGYDSVI
LPVIDLSLLDGSPESAAKFR--DDLLCATHDVGFFYLVGHGVDESLMDDLLAASREFFD--LPEDQKFAVENVKSPQFRGYTRVGGELT-EGKTDWREQIDVGPERDVIDNAPGLADYWRLEGPNLWPDAV--PQLRGLVNEWNDKLSAVSLRLLRAWAHALGAPEDVFDNAFA-DKPFPQLKIVRYPGESNPEPKQGVGAHRDGGVLTL----------LMVEPGKGGLQVDYNGEWVDVPPKPGAFVVNIGEMLELATEGYLKATLHRVISPLIGDDRISIPFFFNPALDTVM
+P ID+S L G + AAK R + A+ D GFFY V HG++ + L ++EF PE++ + + + R G L+ GK + P + + P + N+WPD P + ++ + +S LL+ +A ALG E+ F F D + ++RYP +P P+ + DG L+ ++ + LQV+ + D+ +++N G + T Y KA +HRV + +R S+PFF N D+V+
3695564 1269795892 0 0 0.041 0.267 0.14

From biopython at maubp.freeserve.co.uk Thu Jun 15 11:01:42 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 15 Jun 2006 16:01:42 +0100
Subject: [BioPython] parsing the blastoutput and printing the alingment
In-Reply-To: <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl>
References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9E@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl>
Message-ID: <44917656.6090602@maubp.freeserve.co.uk>

Muthuraman, Manickam wrote:
> Dear peter
>
> here is the code my_blast.out and the error. My need is to get all the blast hit sequences in fasta format. By parsing and i can extract accession number from it.
I made an example fasta file containing just this one sequence twice: >example1 VPKIDVSPLFGDDQAAKMRVAQQIDAASRDTGFFYAVNHGINVQRLSQKTKEFHMSITP EEKWDLAIRAYNKEHQDQVRAGYYLSIPGKKAVESFCYLNPNFTPDHPRIQAKTPTHEV NVWPDETKHPGFQDFAEQYYWDVFGLSSALLKGYALALGKEENFFARHFKPDDTLASVV LIRYPYLDPYPEAAIKTAADGTKLSFEWHEDVSLITVLYQSNVQNLQVETAAGYQDIEA DDTGYLINCGSYMAHLTNNYYKAPIHRVKWVNAERQSLPFFVNLGYDSVI >example2 VPKIDVSPLFGDDQAAKMRVAQQIDAASRDTGFFYAVNHGINVQRLSQKTKEFHMSITP EEKWDLAIRAYNKEHQDQVRAGYYLSIPGKKAVESFCYLNPNFTPDHPRIQAKTPTHEV NVWPDETKHPGFQDFAEQYYWDVFGLSSALLKGYALALGKEENFFARHFKPDDTLASVV LIRYPYLDPYPEAAIKTAADGTKLSFEWHEDVSLITVLYQSNVQNLQVETAAGYQDIEA DDTGYLINCGSYMAHLTNNYYKAPIHRVKWVNAERQSLPFFVNLGYDSVI I then edited the filenames in your example, and ran the code. It worked for me using a fresh install of BioPython 1.41 on Linux with Python 2.4.2 So the good news is your code seems fine. Maybe there is something "funny" with your fasta file? Accented characters for example - which would then be in the output XML file? Could you send me the fasta file and the XML file (in full, as attachments), off the mailing list to avoid clogging up everyone's inboxes. Thanks Peter From biopython at maubp.freeserve.co.uk Thu Jun 15 11:08:32 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 15 Jun 2006 16:08:32 +0100 Subject: [BioPython] Abuse of the new Wiki Homepage Message-ID: <449177F0.1010209@maubp.freeserve.co.uk> I've noticed someone has created an account "Ceas" on the wiki and has been inserting junk/spam links. For example, look at the history of the main page: http://biopython.org/wiki/Biopython Who is in charge of the Wiki? Can we (a) block this account (short term action) (b) tighten up rules for creating new accounts? 
Peter From arareko at campus.iztacala.unam.mx Thu Jun 15 12:13:50 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Thu, 15 Jun 2006 11:13:50 -0500 Subject: [BioPython] Abuse of the new Wiki Homepage In-Reply-To: <449177F0.1010209@maubp.freeserve.co.uk> References: <449177F0.1010209@maubp.freeserve.co.uk> Message-ID: <4491873E.50509@campus.iztacala.unam.mx> Hi Peter, We started to have the same problem in the BioPerl wiki some months ago. The way we usually solve this is by blocking the user account and rolling back to the previous version of the affected document. We have a list of wiki administrators who are constantly (and independently) monitoring the recent changes in the site. This way we can keep track of the changes and revert damages to the content: http://bioperl.org/wiki/BioPerl:Administrators http://bioperl.org/wiki/Special:Recentchanges You can also keep track of the changes by using the RSS or Atom feeds provided by the Recentchanges page: http://bioperl.org/w/index.php?title=Special:Recentchanges&feed=rss http://bioperl.org/w/index.php?title=Special:Recentchanges&feed=atom The wiki system has memory of the blocked users and IP's, you can have a look here: http://bioperl.org/wiki/Special:Ipblocklist There also exists a Blacklist, which is a complement to the main Wikimedia's one and helps detect spam content before it goes into a document: http://bioperl.org/wiki/Help:Blacklist http://meta.wikimedia.org/wiki/Spam_blacklist I don't know who's in charge of BioPython's wiki but I hope this info can be helpful to you. Regards, Mauricio. Peter wrote: > I've noticed someone has created an account "Ceas" on the wiki and has > been inserting junk/spam links. For example, look at the history of the > main page: > > http://biopython.org/wiki/Biopython > > Who is in charge of the Wiki? Can we > (a) block this account (short term action) > (b) tighten up rules for creating new accounts? 
> > Peter > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From dag at sonsorol.org Thu Jun 15 12:31:01 2006 From: dag at sonsorol.org (Chris Dagdigian) Date: Thu, 15 Jun 2006 12:31:01 -0400 Subject: [BioPython] Abuse of the new Wiki Homepage In-Reply-To: <4491873E.50509@campus.iztacala.unam.mx> References: <449177F0.1010209@maubp.freeserve.co.uk> <4491873E.50509@campus.iztacala.unam.mx> Message-ID: I deal with a number of wiki sites, all of which are subjected to a constant stream of automated spam posters. The single best defense is volunteers who monitor the "Recent Changes" feed and take instant action to roll back the spam changes: http://biopython.org/wiki/Special:Recentchanges People can monitor that page (in web or RSS form) and roll back spam shortly after it happens. It really is the best way. Anyone can roll back changes. If you find yourself doing it often, ask to become a wiki administrator and then you'll be able to blocklist people and IP addresses as well. Behind the scenes we do other things to block spam, including regular expression tests on content, blacklists etc. but it is a constant arms race with the wiki spammers and we are always a bit behind. My $.02 -Chris On Jun 15, 2006, at 12:13 PM, Mauricio Herrera Cuadra wrote: > Hi Peter, > > We started to have the same problem in the BioPerl wiki some months > ago. The way we usually solve this is by blocking the user account > and rolling back to the previous version of the affected document. > > We have a list of wiki administrators who are constantly (and > independently) monitoring the recent changes in the site. 
This way > we can keep track of the changes and revert damages to the content: > > http://bioperl.org/wiki/BioPerl:Administrators > http://bioperl.org/wiki/Special:Recentchanges > > You can also keep track of the changes by using the RSS or Atom > feeds provided by the Recentchanges page: > > http://bioperl.org/w/index.php?title=Special:Recentchanges&feed=rss > http://bioperl.org/w/index.php?title=Special:Recentchanges&feed=atom > > The wiki system has memory of the blocked users and IP's, you can > have a look here: > > http://bioperl.org/wiki/Special:Ipblocklist > > There also exists a Blacklist, which is a complement to the main > Wikimedia's one and helps detect spam content before it goes into a > document: > > http://bioperl.org/wiki/Help:Blacklist > http://meta.wikimedia.org/wiki/Spam_blacklist > > I don't know who's in charge of BioPython's wiki but I hope this > info can be helpful to you. > > Regards, > Mauricio. > > Peter wrote: >> I've noticed someone has created an account "Ceas" on the wiki and >> has been inserting junk/spam links. For example, look at the >> history of the main page: >> http://biopython.org/wiki/Biopython >> Who is in charge of the Wiki? Can we >> (a) block this account (short term action) >> (b) tighten up rules for creating new accounts? 
>> Peter >> _______________________________________________ >> BioPython mailing list - BioPython at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython > > -- > MAURICIO HERRERA CUADRA > arareko at campus.iztacala.unam.mx > Laboratorio de Genética > Unidad de Morfofisiología y Función > Facultad de Estudios Superiores Iztacala, UNAM From jason.stajich at duke.edu Thu Jun 15 12:45:50 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu, 15 Jun 2006 12:45:50 -0400 Subject: [BioPython] Fwd: Abuse of the new Wiki Homepage References: Message-ID: <29F97001-146E-414A-8E5D-330AEDAB3392@duke.edu> Begin forwarded message: > From: Jason Stajich > Date: June 15, 2006 12:40:13 PM EDT > To: Mauricio Herrera Cuadra > Cc: biopython at biopython.org, Chris Dagdigian , > Chris Fields > Subject: Re: [BioPython] Abuse of the new Wiki Homepage > > I'm not convinced the blacklist is working - but we need to make > sure it is enabled in the conf file on the server. I've locked the > blacklist page as well so that only sysops can edit it. Iddo and > Michiel are the main site admins right now, other people can be > promoted by them or one of the main site admins if we know who you > are. > > I've blocked the previous spammer's account. You can easily revert > changes by using the rollback button on the diff page. > > The biopython community will have to decide how it wants to handle > new accounts to the wiki site. Whether there is patrolling or if > you want to lock the site down. I would encourage all legitimate > users to add something to their User page so that we can have an > easier time distinguishing random account creation from real people. > > -jason > On Jun 15, 2006, at 12:13 PM, Mauricio Herrera Cuadra wrote: > >> Hi Peter, >> >> We started to have the same problem in the BioPerl wiki some >> months ago. The way we usually solve this is by blocking the user >> account and rolling back to the previous version of the affected >> document. 
>> >> We have a list of wiki administrators who are constantly (and >> independently) monitoring the recent changes in the site. This way >> we can keep track of the changes and revert damages to the content: >> >> http://bioperl.org/wiki/BioPerl:Administrators >> http://bioperl.org/wiki/Special:Recentchanges >> >> You can also keep track of the changes by using the RSS or Atom >> feeds provided by the Recentchanges page: >> >> http://bioperl.org/w/index.php?title=Special:Recentchanges&feed=rss >> http://bioperl.org/w/index.php?title=Special:Recentchanges&feed=atom >> >> The wiki system has memory of the blocked users and IP's, you can >> have a look here: >> >> http://bioperl.org/wiki/Special:Ipblocklist >> >> There also exists a Blacklist, which is a complement to the main >> Wikimedia's one and helps detect spam content before it goes into >> a document: >> >> http://bioperl.org/wiki/Help:Blacklist >> http://meta.wikimedia.org/wiki/Spam_blacklist >> >> I don't know who's in charge of BioPython's wiki but I hope this >> info can be helpful to you. >> >> Regards, >> Mauricio. >> >> Peter wrote: >>> I've noticed someone has created an account "Ceas" on the wiki >>> and has been inserting junk/spam links. For example, look at the >>> history of the main page: >>> http://biopython.org/wiki/Biopython >>> Who is in charge of the Wiki? Can we >>> (a) block this account (short term action) >>> (b) tighten up rules for creating new accounts? 
>>> Peter >>> _______________________________________________ >>> BioPython mailing list - BioPython at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biopython >> >> -- >> MAURICIO HERRERA CUADRA >> arareko at campus.iztacala.unam.mx >> Laboratorio de Genética >> Unidad de Morfofisiología y Función >> Facultad de Estudios Superiores Iztacala, UNAM >> > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > -- Jason Stajich Duke University http://www.duke.edu/~jes12

From rohini.damle at gmail.com Thu Jun 15 12:36:27 2006 From: rohini.damle at gmail.com (Rohini Damle) Date: Thu, 15 Jun 2006 09:36:27 -0700 Subject: [BioPython] plain txt blast output - xml instead In-Reply-To: <449085AD.7010801@maubp.freeserve.co.uk> References: <448FD25C.20101@maubp.freeserve.co.uk> <449085AD.7010801@maubp.freeserve.co.uk> Message-ID:

Hi, I am using BioPython 1.41 on Windows. I have also updated NCBIStandalone.py from the link you gave. Here is my code:

from Bio.Blast import NCBIStandalone
from Bio.Blast import NCBIXML
blast_out = open("4proteinblast.xml","r")
b_iterator = NCBIStandalone.Iterator(blast_out, NCBIXML.BlastParser())
for b_record in b_iterator :
    query_name = b_record.query
    print query_name
    for alignment in b_record.alignments:
        print '****Alignment****'
        print 'sequence:', alignment.title

This code gives the "sequences producing significant alignments" for all 4 proteins, but prints the query name as P1. I mean, I am getting all the information I want, but I have 4 protein queries and this code reports only P1 as the query (not P2, P3, P4, although it gives information about them). I am attaching the XML file of the 4-protein BLAST results. Thank you for your help.

On 6/14/06, Peter wrote: > > Rohini Damle wrote: > > Thank you very much for your help. > > I have 55-56 proteins & I am using Blast to find out short, nearly exact > > matches. 
The xml parser works fine for first record but even if I used > the > > iterator, I CAN NOT ITERATE through the records, I have used the same > code > > as u have given, what might be wrong? > > Rohini. > > If you you send us a short be of example code, and the error message > that would help. Also, what version of BioPython are you using, and do > you have Windows or Linux or MacOS... > > One guess is that you will need to update the NCBIStandalone.py file to > include a recent fix for iterating XML files. > > Assuming you are using BioPython 1.41 on Windows, the click on this link > and pick "download" near the top of the page to get the latest verion: > > > http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIStandalone.py?cvsroot=biopython > > Save it here: > > c:\python24\lib\site-packages\Bio\Blast\NCBIStandalone.py > > (Make a copy of the old file first, just in case) > > Peter > > -------------- next part -------------- A non-text attachment was scrubbed... Name: 4proteinblast.xml Type: text/xml Size: 98271 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/biopython/attachments/20060615/722b8845/attachment-0001.xml From cjfields at uiuc.edu Thu Jun 15 12:41:05 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Jun 2006 11:41:05 -0500 Subject: [BioPython] Abuse of the new Wiki Homepage In-Reply-To: <4491873E.50509@campus.iztacala.unam.mx> Message-ID: <000601c6909a$7f4ec5b0$15327e82@pyrimidine> Looks like Jason's doing some work on the BioPython wiki to get it up to speed. I added Help:Blacklist as a start. Like Mauricio said, probably need to get a small group of sysadmins together to keep an eye on things and block potential spammers. 
Chris > -----Original Message----- > From: Mauricio Herrera Cuadra [mailto:arareko at campus.iztacala.unam.mx] > Sent: Thursday, June 15, 2006 11:14 AM > To: biopython at lists.open-bio.org > Cc: biopython at biopython.org; Jason Stajich; Chris Dagdigian; Chris Fields > Subject: Re: [BioPython] Abuse of the new Wiki Homepage > > Hi Peter, > > We started to have the same problem in the BioPerl wiki some months ago. > The way we usually solve this is by blocking the user account and > rolling back to the previous version of the affected document. > > We have a list of wiki administrators who are constantly (and > independently) monitoring the recent changes in the site. This way we > can keep track of the changes and revert damages to the content: > > http://bioperl.org/wiki/BioPerl:Administrators > http://bioperl.org/wiki/Special:Recentchanges > > You can also keep track of the changes by using the RSS or Atom feeds > provided by the Recentchanges page: > > http://bioperl.org/w/index.php?title=Special:Recentchanges&feed=rss > http://bioperl.org/w/index.php?title=Special:Recentchanges&feed=atom > > The wiki system has memory of the blocked users and IP's, you can have a > look here: > > http://bioperl.org/wiki/Special:Ipblocklist > > There also exists a Blacklist, which is a complement to the main > Wikimedia's one and helps detect spam content before it goes into a > document: > > http://bioperl.org/wiki/Help:Blacklist > http://meta.wikimedia.org/wiki/Spam_blacklist > > I don't know who's in charge of BioPython's wiki but I hope this info > can be helpful to you. > > Regards, > Mauricio. > > Peter wrote: > > I've noticed someone has created an account "Ceas" on the wiki and has > > been inserting junk/spam links. For example, look at the history of the > > main page: > > > > http://biopython.org/wiki/Biopython > > > > Who is in charge of the Wiki? Can we > > (a) block this account (short term action) > > (b) tighten up rules for creating new accounts? 
> > > > Peter > > > > _______________________________________________ > > BioPython mailing list - BioPython at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython > > > > -- > MAURICIO HERRERA CUADRA > arareko at campus.iztacala.unam.mx > Laboratorio de Genética > Unidad de Morfofisiología y Función > Facultad de Estudios Superiores Iztacala, UNAM

From biopython at maubp.freeserve.co.uk Thu Jun 15 13:30:18 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 15 Jun 2006 18:30:18 +0100 Subject: [BioPython] plain txt blast output - xml instead In-Reply-To: References: <448FD25C.20101@maubp.freeserve.co.uk> <449085AD.7010801@maubp.freeserve.co.uk> Message-ID: <4491992A.5040301@maubp.freeserve.co.uk>

Rohini Damle wrote:
> Hi,
> I am using BioPython 1.41 on windows I have also updated
> NcbIstandalone.pyfor the link u gave. here is my code.
>
> from Bio.Blast import NCBIStandalone
> from Bio.Blast import NCBIXML
> blast_out = open("4proteinblast.xml","r")
> b_iterator = NCBIStandalone.Iterator(blast_out, NCBIXML.BlastParser())
>
> for b_record in b_iterator :
>     query_name = b_record.query
>     print query_name
>     for alignment in b_record.alignments:
>         print '****Alignment****'
>         print 'sequence:', alignment.title
>
> This code gives "sequences producing significant alignments for all the 4
> proteins but printing querry name as P1

This code does the same thing, but prints less on screen so it's easier to read:

from Bio.Blast import NCBIStandalone
from Bio.Blast import NCBIXML
blast_out = open("4proteinblast.xml","r")
b_iterator = NCBIStandalone.Iterator(blast_out, NCBIXML.BlastParser())
for b_record in b_iterator :
    query_name = b_record.query
    print query_name
    for alignment in b_record.alignments:
        print query_name, alignment.title.split()[0]

> I mean I am getting all the information I want but I have 4 protein > querries and this code is giving only P1 as a query (not P2, P3, P4 > but giving information about them) I ma attachin the xml 
file of > 4 protein blast results. thank you for your help. Looking at the raw XML file by hand, I could only see references to P1, the first protein. If the file had results for all four proteins I would expect to see: ... results for P1 ... ... results for P2 ... ... results for P3 ... ... results for P4 ... Are you sure you gave Blast all four input sequences - and not just the first sequence? Peter From mdehoon at c2b2.columbia.edu Thu Jun 15 13:43:51 2006 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Thu, 15 Jun 2006 13:43:51 -0400 Subject: [BioPython] plain txt blast output - xml instead In-Reply-To: <4491992A.5040301@maubp.freeserve.co.uk> References: <448FD25C.20101@maubp.freeserve.co.uk> <449085AD.7010801@maubp.freeserve.co.uk> <4491992A.5040301@maubp.freeserve.co.uk> Message-ID: <44919C57.7030204@c2b2.columbia.edu> Peter wrote: > > Looking at the raw XML file by hand, I could only see references to P1, > the first protein. > > If the file had results for all four proteins I would expect to see: > > > ... results for P1 ... > > ... results for P2 ... > > ... results for P3 ... > > ... results for P4 ... > There are results for all four proteins in the XML file, but they look like this: 2 2_20304 p2 ... and so on. Could you let us know how this XML file was generated? 
--Michiel -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From biopython at maubp.freeserve.co.uk Thu Jun 15 13:53:53 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 15 Jun 2006 18:53:53 +0100 Subject: [BioPython] parsing the blastoutput and printing the alingment In-Reply-To: <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl> References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9E@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl> Message-ID: <44919EB1.1080805@maubp.freeserve.co.uk> I know you haven't got the XML parsing working yet - but I thought I should point something else out... Muthuraman, Manickam wrote: > from Bio import Fasta > file_for_blast=open('/home/manickam/Documents/m_cold.fasta','r') > f_iterator=Fasta.Iterator(file_for_blast) > f_record=f_iterator.next() f_record will contain a single fasta record (only the first entry in the file m_cold.fasta). > from Bio.Blast import NCBIWWW > result_handle=NCBIWWW.qblast('blastp','nr',f_record, format_type="XML") This will only run blast on that one record (i.e. the first fasta entry in m_cold.fasta), so the resulting XML file will only have blast results for this protein. I'm not sure if you can use the online NCBI blast (i.e. NCBIWWW.qblast) to submit multiple queries... You might want to install standalone blast on your own machine - as this will accept multiple inputs. You just tell it to read m_cold.fasta as its input file, and the resulting XML file will contain the results for each sequence in the fasta file. Note that if you know in advance that the XML blast output is from a single input query, you don't need the NCBI iterator. 
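On the parsing error itself, one thing that may be worth checking: the my_blast.out listing earlier in this thread begins with HTTP response header lines (HTTP/1.1 200 OK, Date:, Server:, ...) before any XML, and the parser complains at line 1, column 4. That is only an inference from the listing, not something confirmed in the thread. A minimal sketch of dropping everything before the XML declaration, using only the Python standard library (not Biopython) on a current Python 3; the helper name strip_leading_non_xml and the sample text are made up for illustration:

```python
import xml.etree.ElementTree as ET


def strip_leading_non_xml(text):
    """Return text starting at the XML declaration (or first '<' if none)."""
    start = text.find("<?xml")
    if start == -1:
        start = text.find("<")
    if start == -1:
        raise ValueError("no XML content found")
    return text[start:]


# Toy stand-in for a saved file that starts with HTTP headers (hypothetical data).
saved = (
    "HTTP/1.1 200 OK\n"
    "Content-Type: application/xml\n"
    "Connection: close\n"
    "\n"
    '<?xml version="1.0"?>\n'
    "<BlastOutput><BlastOutput_program>blastp</BlastOutput_program></BlastOutput>\n"
)

# Parsing the raw text fails because of the header lines.
try:
    ET.fromstring(saved)
    parsed_raw = True
except ET.ParseError:
    parsed_raw = False

# Parsing the cleaned text succeeds.
root = ET.fromstring(strip_leading_non_xml(saved))
print(parsed_raw, root.tag, root.findtext("BlastOutput_program"))
```

If the cleaned text parses, the same cleaned text could be written back to my_blast.out before handing that file to the BLAST XML parser.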
Peter From rohini.damle at gmail.com Thu Jun 15 14:24:38 2006 From: rohini.damle at gmail.com (Rohini Damle) Date: Thu, 15 Jun 2006 11:24:38 -0700 Subject: [BioPython] (no subject) Message-ID: > I opened the 'search for short nearly exact match' blast tool, then > entered these protein sequences > >p1 > FILGIIITV > >p2 > GLFDFVNFV > >p3 > FLIVSLCPT > >p4 > RVYEALYYV > > > I set parameters like evalue and organism and chose XML as the output format. > The output does not contain references to all 4 proteins at the > start, but in the block (one block for each protein). > Is there any other way to generate the XML formatted output? > -Rohini. From biopython at maubp.freeserve.co.uk Thu Jun 15 14:38:54 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 15 Jun 2006 19:38:54 +0100 Subject: [BioPython] plain txt blast output - xml instead In-Reply-To: <44919C57.7030204@c2b2.columbia.edu> References: <448FD25C.20101@maubp.freeserve.co.uk> <449085AD.7010801@maubp.freeserve.co.uk> <4491992A.5040301@maubp.freeserve.co.uk> <44919C57.7030204@c2b2.columbia.edu> Message-ID: <4491A93E.2020306@maubp.freeserve.co.uk> Michiel Jan Laurens de Hoon wrote: > Peter wrote: > >>Looking at the raw XML file by hand, I could only see references to P1, >>the first protein. >> >>If the file had results for all four proteins I would expect to see: >> >> >>... results for P1 ... >> >>... results for P2 ... >> >>... results for P3 ... >> >>... results for P4 ... >> > > There are results for all four proteins in the XML file, but they look > like this: > > > 2 > 2_20304 > p2 > ... > > > and so on. Oh yeah. I should have seen that, sorry. According to the XML file, it is from BLASTP 2.2.14 [May-07-2006], maybe they changed the XML format without telling anyone? 
I couldn't see anything obvious on this page: http://www.ncbi.nlm.nih.gov/blast/blast_whatsnew.shtml This looks like the source code here: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/ncbi.tar.gz And you can view their CVS here: http://www.ncbi.nlm.nih.gov/cvsweb/index.cgi/ncbi/algo/blast/ There is nothing in the check-in comments that leaps out at me regarding XML iterations... > > Could you let us know how this XML file was generated? > e.g. Standalone or online? Peter From cariaso at yahoo.com Thu Jun 15 14:39:21 2006 From: cariaso at yahoo.com (Mike Cariaso) Date: Thu, 15 Jun 2006 11:39:21 -0700 (PDT) Subject: [BioPython] Fwd: Abuse of the new Wiki Homepage In-Reply-To: <29F97001-146E-414A-8E5D-330AEDAB3392@duke.edu> Message-ID: <20060615183921.27494.qmail@web52711.mail.yahoo.com> > The biopython community will have to decide how it wants to handle > new accounts to the wiki site. Whether there is patrolling or if > you want to lock the site down. I would encourage all legitimate > users to add something to their User page so that we can have an > easier time distinguishing random account creation from real people. Consider this my vote against any sort of lockdown against new users. It can be a real deterrent to new contributors, and new contributors are something we sorely need. I'd rather roll back the useless spam than risk deterring valuable new contributions. Thank you to Maubp for already removing all of Ceas's garbage. 
_______________________________________________ BioPython mailing list - BioPython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From rohini.damle at gmail.com Thu Jun 15 14:44:38 2006 From: rohini.damle at gmail.com (Rohini Damle) Date: Thu, 15 Jun 2006 11:44:38 -0700 Subject: [BioPython] plain txt blast output - xml instead In-Reply-To: <4491A93E.2020306@maubp.freeserve.co.uk> References: <448FD25C.20101@maubp.freeserve.co.uk> <449085AD.7010801@maubp.freeserve.co.uk> <4491992A.5040301@maubp.freeserve.co.uk> <44919C57.7030204@c2b2.columbia.edu> <4491A93E.2020306@maubp.freeserve.co.uk> Message-ID: I used online ncbi blast to generate the xml output Rohini On 6/15/06, Peter wrote: > > Michiel Jan Laurens de Hoon wrote: > > Peter wrote: > > > >>Looking at the raw XML file by hand, I could only see references to P1, > >>the first protein. > >> > >>If the file had results for all four proteins I would expect to see: > >> > >> > >>... results for P1 ... > >> > >>... results for P2 ... > >> > >>... results for P3 ... > >> > >>... results for P4 ... > >> > > > > There are results for all four proteins in the XML file, but they look > > like this: > > > > > > 2 > > 2_20304 > > p2 > > ... > > > > > > and so on. > > Oh yeah. I should have seen that, sorry. > > According to the XML file, it is from BLASTP 2.2.14 [May-07-2006], maybe > they changed the XML format without telling anyone? > > I couldn't see anything obvious on this page: > > http://www.ncbi.nlm.nih.gov/blast/blast_whatsnew.shtml > > This looks like the source code here: > > ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/ncbi.tar.gz > > And you can view their CVS here: > > http://www.ncbi.nlm.nih.gov/cvsweb/index.cgi/ncbi/algo/blast/ > > There is nothing in the check-in comments that leaps out at me regarding > XML iterations... > > > > > Could you let us know how this XML file was generated? > > > > e.g. Standalone or online? 
> > Peter > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From cjfields at uiuc.edu Thu Jun 15 12:55:40 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Jun 2006 11:55:40 -0500 Subject: [BioPython] Abuse of the new Wiki Homepage In-Reply-To: Message-ID: <000701c6909c$88bca480$15327e82@pyrimidine> > I'm not convinced the blacklist is working - but we need to make sure > it is enabled in the conf file on the server. I've locked the > blacklist page as well so that only sysops can edit it. Iddo and > Michiel are the main site admins right now, other people can be > promoted by them or one of the main site admins if we know who you are. Agreed. I actually added the page as 'Help:BlackList' then redirected it to 'Help:Blacklist'; someone with admin privies can delete that redirect link if they want. My oops. Like Jason says, probably doesn't make much of a difference (the wiki version of the raindance, to ward off evil spammers). > I've blocked the previous spammer's account. You can easily revert > changes by using the rollback button on the diff page. > > The biopython community will have to decide how it wants to handle > new accounts to the wiki site. Whether there is patrolling or if you > want to lock the site down. I would encourage all legitimate users > to add something to their User page so that we can have an easier > time distinguishing random account creation from real people. > > -jason > On Jun 15, 2006, at 12:13 PM, Mauricio Herrera Cuadra wrote: > > > Hi Peter, > > > > We started to have the same problem in the BioPerl wiki some months > > ago. The way we usually solve this is by blocking the user account > > and rolling back to the previous version of the affected document. > > > > We have a list of wiki administrators who are constantly (and > > independently) monitoring the recent changes in the site. 
This way > > we can keep track of the changes and revert damages to the content: > > > > http://bioperl.org/wiki/BioPerl:Administrators > > http://bioperl.org/wiki/Special:Recentchanges > > > > You can also keep track of the changes by using the RSS or Atom > > feeds provided by the Recentchanges page: > > > > http://bioperl.org/w/index.php?title=Special:Recentchanges&feed=rss > > http://bioperl.org/w/index.php?title=Special:Recentchanges&feed=atom > > > > The wiki system has memory of the blocked users and IP's, you can > > have a look here: > > > > http://bioperl.org/wiki/Special:Ipblocklist > > > > There also exists a Blacklist, which is a complement to the main > > Wikimedia's one and helps detect spam content before it goes into a > > document: > > > > http://bioperl.org/wiki/Help:Blacklist > > http://meta.wikimedia.org/wiki/Spam_blacklist > > > > I don't know who's in charge of BioPython's wiki but I hope this > > info can be helpful to you. > > > > Regards, > > Mauricio. > > > > Peter wrote: > >> I've noticed someone has created an account "Ceas" on the wiki and > >> has been inserting junk/spam links. For example, look at the > >> history of the main page: > >> http://biopython.org/wiki/Biopython > >> Who is in charge of the Wiki? Can we > >> (a) block this account (short term action) > >> (b) tighten up rules for creating new accounts? 
> >> Peter > >> _______________________________________________ > >> BioPython mailing list - BioPython at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/biopython > > > > -- > > MAURICIO HERRERA CUADRA > > arareko at campus.iztacala.unam.mx > > Laboratorio de Genética > > Unidad de Morfofisiología y Función > > Facultad de Estudios Superiores Iztacala, UNAM > > > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 From manickam.muthuraman at wur.nl Thu Jun 15 17:29:51 2006 From: manickam.muthuraman at wur.nl (Muthuraman, Manickam) Date: Thu, 15 Jun 2006 23:29:51 +0200 Subject: [BioPython] parsing the blastoutput and printing the alingment References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9E@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl> <44917656.6090602@maubp.freeserve.co.uk> Message-ID: <4CDD243B32D07748944828EA7A29E4A3E2AFA5@salte0008.wurnet.nl> Dear Peter, In this mail I am attaching three files: the sequence file, the Python script, and the BLAST output. I am using Python 2.4.1 (#2, Aug 25 2005, 18:20:57) and Biopython 1.40. I spent almost the whole evening trying to upgrade Python and Biopython on Mandriva Linux, but I failed. Please let me know whether the versions of Python and Biopython matter here. Thanks for helping me out with this. From Manickam -----Original Message----- From: Peter [mailto:biopython at maubp.freeserve.co.uk] Sent: Thu 6/15/2006 5:01 PM To: Muthuraman, Manickam Cc: biopython at lists.open-bio.org Subject: Re: [BioPython] parsing the blastoutput and printing the alingment Muthuraman, Manickam wrote: > Dear peter > > here is the code my_blast.out and the error. My need is to get all the > blast hit sequences in fasta format. By parsing and i can extract > accession number from it. 
I made an example fasta file containing just this one sequence twice: >example1 VPKIDVSPLFGDDQAAKMRVAQQIDAASRDTGFFYAVNHGINVQRLSQKTKEFHMSITP EEKWDLAIRAYNKEHQDQVRAGYYLSIPGKKAVESFCYLNPNFTPDHPRIQAKTPTHEV NVWPDETKHPGFQDFAEQYYWDVFGLSSALLKGYALALGKEENFFARHFKPDDTLASVV LIRYPYLDPYPEAAIKTAADGTKLSFEWHEDVSLITVLYQSNVQNLQVETAAGYQDIEA DDTGYLINCGSYMAHLTNNYYKAPIHRVKWVNAERQSLPFFVNLGYDSVI >example2 VPKIDVSPLFGDDQAAKMRVAQQIDAASRDTGFFYAVNHGINVQRLSQKTKEFHMSITP EEKWDLAIRAYNKEHQDQVRAGYYLSIPGKKAVESFCYLNPNFTPDHPRIQAKTPTHEV NVWPDETKHPGFQDFAEQYYWDVFGLSSALLKGYALALGKEENFFARHFKPDDTLASVV LIRYPYLDPYPEAAIKTAADGTKLSFEWHEDVSLITVLYQSNVQNLQVETAAGYQDIEA DDTGYLINCGSYMAHLTNNYYKAPIHRVKWVNAERQSLPFFVNLGYDSVI I then edited the filenames in your example, and ran the code. It worked for me using a fresh install of BioPython 1.41 on Linux with Python 2.4.2 So the good news is your code seems fine. Maybe there is something "funny" with your fasta file? Accented characters for example - which would then be in the output XML file? Could you send me the fasta file and the XML file (in full, as attachments), off the mailing list to avoid clogging up everyone's inboxes. Thanks Peter -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/ms-tnef Size: 164714 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/biopython/attachments/20060615/2f04b211/attachment-0001.bin From mdehoon at c2b2.columbia.edu Thu Jun 15 18:37:18 2006 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Thu, 15 Jun 2006 18:37:18 -0400 Subject: [BioPython] plain txt blast output - xml instead In-Reply-To: <4491A93E.2020306@maubp.freeserve.co.uk> References: <448FD25C.20101@maubp.freeserve.co.uk> <449085AD.7010801@maubp.freeserve.co.uk> <4491992A.5040301@maubp.freeserve.co.uk> <44919C57.7030204@c2b2.columbia.edu> <4491A93E.2020306@maubp.freeserve.co.uk> Message-ID: <4491E11E.5020705@c2b2.columbia.edu> Peter wrote: > According to the XML file, it is from BLASTP 2.2.14 [May-07-2006], maybe > they changed the XML format without telling anyone? > It appears that the XML format did change. With Blastp 2.2.14, multiple searches generate multiple ... blocks, one for each search. With an older Blastp, multiple searches effectively generate multiple XML files (each with one ... block). These files are then concatenated into one output file. Biopython then parses this file by looking for the beginning of each XML file in this output file. The new output is in a sense better because the output file is a valid XML file. It may be that Biopython's XML parser ignores the tags, since in the old format there was only one block anyway, and therefore fails with the new format. --Michiel. 
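A minimal sketch of the older approach described above (finding the beginning of each XML file inside one concatenated output file), assuming every document starts with an "<?xml" declaration. The function name and the toy "query" element are made up for illustration; this is not Biopython's actual parser code, and it is written for a current Python rather than the 2.4 discussed in this thread.

```python
def split_concatenated_xml(text, marker="<?xml"):
    """Split a string holding several XML documents into single documents.

    Assumption: each document begins with the XML declaration `marker`.
    """
    starts = []
    pos = text.find(marker)
    while pos != -1:
        starts.append(pos)
        pos = text.find(marker, pos + len(marker))
    # Each document runs from its own start to the next start (or the end).
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]


# Toy stand-in for old-style output: two complete documents glued together.
old_style = (
    '<?xml version="1.0"?>\n<BlastOutput><query>first</query></BlastOutput>\n'
    '<?xml version="1.0"?>\n<BlastOutput><query>second</query></BlastOutput>\n'
)
documents = split_concatenated_xml(old_style)
print(len(documents))
```

With new-style output there is only one declaration, so a splitter like this would hand the whole file to the parser as a single document, which would fit the failure being discussed.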
-- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From biopython at maubp.freeserve.co.uk Thu Jun 15 18:31:59 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 15 Jun 2006 23:31:59 +0100 Subject: [BioPython] parsing the blastoutput and printing the alingment In-Reply-To: <4CDD243B32D07748944828EA7A29E4A3E2AFA5@salte0008.wurnet.nl> References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9E@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl> <44917656.6090602@maubp.freeserve.co.uk> <4CDD243B32D07748944828EA7A29E4A3E2AFA5@salte0008.wurnet.nl> Message-ID: <4491DFDF.9070506@maubp.freeserve.co.uk> Muthuraman, Manickam wrote: > Dear Peter > > In this mail i am attaching three files :seq file,python script file > and the blast output. I am using python Python 2.4.1 (#2, Aug 25 > 2005, 18:20:57)and biopython 1.40 Your attachment came as a weird winmail.dat file - something Outlook and the Microsoft Exchange Client sometimes does. There is a Linux tool to "unzip" the file called tnef, which I installed on Ubuntu with a simple "apt-get install tnef" Anyway, the problem is simply that your XML file has this little HTTP header at the start: HTTP/1.1 200 OK Date: Thu, 15 Jun 2006 21:23:08 GMT Server: Nde Content-Type: application/xml Connection: close If you edit the file to remove this, the BioPython can read the file fine. Looking over my old email, Michiel de Hoon checked in a fix from Alexander Morgan for this in March. 
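As a stopgap for a file that has already been saved with the header, something like the following could strip it by hand. This is only a sketch under the assumption that the real XML starts at the "<?xml" declaration; the function name is made up, and it is not the fix that was checked in to NCBIWWW.py.

```python
def strip_http_header(text, marker="<?xml"):
    """Drop everything before the XML declaration, if there is one."""
    start = text.find(marker)
    if start == -1:
        # No declaration found: leave the text untouched rather than guess.
        return text
    return text[start:]


# Toy saved result, using the header quoted in this thread.
saved = (
    "HTTP/1.1 200 OK\n"
    "Date: Thu, 15 Jun 2006 21:23:08 GMT\n"
    "Server: Nde\n"
    "Content-Type: application/xml\n"
    "Connection: close\n"
    "\n"
    '<?xml version="1.0"?>\n<BlastOutput></BlastOutput>\n'
)
cleaned = strip_http_header(saved)
print(cleaned.splitlines()[0])
```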
You need to update this file: /usr/lib/python2.4/site-packages/Bio/Blast/NCBIWWW.py Latest code is available here: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIWWW.py?cvsroot=biopython It also gets rid of this annoying message: UserWarning: qblast works only with blastn and blastp for now. Peter From manickam.muthuraman at wur.nl Fri Jun 16 10:27:00 2006 From: manickam.muthuraman at wur.nl (Muthuraman, Manickam) Date: Fri, 16 Jun 2006 16:27:00 +0200 Subject: [BioPython] Running Blast locally References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9E@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl> <44917656.6090602@maubp.freeserve.co.uk> <4CDD243B32D07748944828EA7A29E4A3E2AFA5@salte0008.wurnet.nl> <4491DFDF.9070506@maubp.freeserve.co.uk> <4CDD243B32D07748944828EA7A29E4A3E2AFA9@salte0008.wurnet.nl> Message-ID: <4CDD243B32D07748944828EA7A29E4A3E2AFAB@salte0008.wurnet.nl> Dear peter In the last mail i said that b_record is none , so i tried to run the blastall in my local computer and it works right now. here is the command : ./blastall -d db/swissprot -i /home/manickam/Documents/m_cold.fasta -p blastp and i am getting the result. so let me know if i need to put this command in string and pass this string (example:my_blast_exe). Still i want to know how to pass the input file(my_blast_file). 
i think i confuse myself let me know your view for this from manickam From winter at biotec.tu-dresden.de Fri Jun 16 10:35:56 2006 From: winter at biotec.tu-dresden.de (Christof Winter) Date: Fri, 16 Jun 2006 16:35:56 +0200 Subject: [BioPython] Running Blast locally In-Reply-To: <4CDD243B32D07748944828EA7A29E4A3E2AFAB@salte0008.wurnet.nl> References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9E@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl> <44917656.6090602@maubp.freeserve.co.uk> <4CDD243B32D07748944828EA7A29E4A3E2AFA5@salte0008.wurnet.nl> <4491DFDF.9070506@maubp.freeserve.co.uk> <4CDD243B32D07748944828EA7A29E4A3E2AFA9@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AFAB@salte0008.wurnet.nl> Message-ID: <4492C1CC.4020607@biotec.tu-dresden.de> Dear Manickam, Can you try blastall -V T -d db/swissprot -i /home/manickam/Documents/m_cold.fasta -p blastp instead? Christof Muthuraman, Manickam wrote: > Dear peter > > In the last mail i said that b_record is none , so i tried to run the blastall in my local computer and it works right now. > > here is the command : > ./blastall -d db/swissprot -i /home/manickam/Documents/m_cold.fasta -p blastp > and i am getting the result. so let me know if i need to put this command in string and pass this string (example:my_blast_exe). Still i want to know how to pass the input file(my_blast_file). 
> > i think i confuse myself > let me know your view for this > from > manickam > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython -- Christof Winter Bioinformatics Group TU Dresden Tatzberg 47-51 01307 Dresden, Germany From manickam.muthuraman at wur.nl Fri Jun 16 10:52:15 2006 From: manickam.muthuraman at wur.nl (Muthuraman, Manickam) Date: Fri, 16 Jun 2006 16:52:15 +0200 Subject: [BioPython] Running Blast locally References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9E@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl> <44917656.6090602@maubp.freeserve.co.uk> <4CDD243B32D07748944828EA7A29E4A3E2AFA5@salte0008.wurnet.nl> <4491DFDF.9070506@maubp.freeserve.co.uk> <4CDD243B32D07748944828EA7A29E4A3E2AFA9@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AFAB@salte0008.wurnet.nl> <4492C1CC.4020607@biotec.tu-dresden.de> Message-ID: <4CDD243B32D07748944828EA7A29E4A3E2AFAC@salte0008.wurnet.nl> Dear Christof Your command also works separately but my question was how to intergrate blast in biopython script. in biopython tutorial and cookbook they have the follwoing code where i need to provide the path to database ,file to blast and blast_exe. I am not clear how to set the path for seq_file,db and exe. 
import os
my_blast_db=os.path.join(os.getcwd(),'at-est','a-cds-10-7.fasta')
my_blast_file=os.path.join(os.getcwd(),'at-est','test_blast','sorghum_est-test.fasta')
my_blast_exe=os.path.join(os.getcwd(),'blast','/home/manickam/blast/blastall')

here is the whole script

import os
my_blast_db=os.path.join(os.getcwd(),'at-est','a-cds-10-7.fasta')
my_blast_file=os.path.join(os.getcwd(),'at-est','test_blast','sorghum_est-test.fasta')
my_blast_exe=os.path.join(os.getcwd(),'blast','/home/manickam/blast/blastall')
from Bio.Blast import NCBIStandalone
blast_out,error_info=NCBIStandalone.blastall(my_blast_exe,'blastp',my_blast_db,my_blast_file)
b_parser=NCBIStandalone.BlastParser()
b_iterator=NCBIStandalone.Iterator(blast_out,b_parser)
b_record=b_iterator.next()
while 1:
    b_record=b_iterator.next()
    if b_record is None:
        break
    for alignment in b_record.alignments:
        print "inside 2 loop"
        for hsp in alignment.hsps:
            print "inside 1 loop"
            print 'seq:',alignment.title

it runs but b_record is None so it comes out of the while loop at first time itself. so it mean i am not getting out put of the blast.

from
manickam

From manickam.muthuraman at wur.nl Fri Jun 16 04:42:08 2006
From: manickam.muthuraman at wur.nl (Muthuraman, Manickam)
Date: Fri, 16 Jun 2006 10:42:08 +0200
Subject: [BioPython] parsing the blastoutput and printing the alingment
References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl>
 <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl>
 <4CDD243B32D07748944828EA7A29E4A3E2AF9E@salte0008.wurnet.nl>
 <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl>
 <44917656.6090602@maubp.freeserve.co.uk>
 <4CDD243B32D07748944828EA7A29E4A3E2AFA5@salte0008.wurnet.nl>
 <4491DFDF.9070506@maubp.freeserve.co.uk>
Message-ID: <4CDD243B32D07748944828EA7A29E4A3E2AFA7@salte0008.wurnet.nl>

Thanks peter

After overwriting the NCBIWWW.py header file my script works.
once again i would like to thank from manickam -----Original Message----- From: Peter [mailto:biopython at maubp.freeserve.co.uk] Sent: Fri 6/16/2006 12:31 AM To: Muthuraman, Manickam Cc: biopython at lists.open-bio.org Subject: Re: [BioPython] parsing the blastoutput and printing the alingment Muthuraman, Manickam wrote: > Dear Peter > > In this mail i am attaching three files :seq file,python script file > and the blast output. I am using python Python 2.4.1 (#2, Aug 25 > 2005, 18:20:57)and biopython 1.40 Your attachment came as a weird winmail.dat file - something Outlook and the Microsoft Exchange Client sometimes does. There is a Linux tool to "unzip" the file called tnef, which I installed on Ubuntu with a simple "apt-get install tnef" Anyway, the problem is simply that your XML file has this little HTTP header at the start: HTTP/1.1 200 OK Date: Thu, 15 Jun 2006 21:23:08 GMT Server: Nde Content-Type: application/xml Connection: close If you edit the file to remove this, the BioPython can read the file fine. Looking over my old email, Michiel de Hoon checked in a fix from Alexander Morgan for this in March. You need to update this file: /usr/lib/python2.4/site-packages/Bio/Blast/NCBIWWW.py Latest code is available here: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIWWW.py?cvsroot=biopython It also gets rid of this annoying message: UserWarning: qblast works only with blastn and blastp for now. Peter -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/ms-tnef Size: 3991 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/biopython/attachments/20060616/490186bf/attachment-0001.bin From manickam.muthuraman at wur.nl Fri Jun 16 09:12:08 2006 From: manickam.muthuraman at wur.nl (Muthuraman, Manickam) Date: Fri, 16 Jun 2006 15:12:08 +0200 Subject: [BioPython] Running Blast locally References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9E@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl> <44917656.6090602@maubp.freeserve.co.uk> <4CDD243B32D07748944828EA7A29E4A3E2AFA5@salte0008.wurnet.nl> <4491DFDF.9070506@maubp.freeserve.co.uk> Message-ID: <4CDD243B32D07748944828EA7A29E4A3E2AFA9@salte0008.wurnet.nl> Dear Peter i am not clear about the subtopic running blast locally let me explain in detail i have blast executable files in my home directory i.e /home/manickam/blast/blastall i have my database files of nr,swissprot,pdb in /usr/junk/ the files which i can see under /usr/junk/ folder are nr.00.phr nr.00.ppi nr.01.phr nr.01.ppi nr.pal pdbaa.00.msk lot in there and there extenstions are *.phr , ppi ,pal,msk,psq i am not clear from the manual where do i need to provide the input sequences and how to i store the out put after running the local blast. below is the following code which i tried and it works but b_record is none. 
import os
my_blast_db=os.path.join(os.getcwd(),'at-est','a-cds-10-7.fasta')
my_blast_file=os.path.join(os.getcwd(),'at-est','test_blast','sorghum_est-test.fasta')
my_blast_exe=os.path.join(os.getcwd(),'blast','/home/manickam/blast/blastall')
from Bio.Blast import NCBIStandalone
blast_out,error_info=NCBIStandalone.blastall(my_blast_exe,'blastp',my_blast_db,my_blast_file)
b_parser=NCBIStandalone.BlastParser()
b_iterator=NCBIStandalone.Iterator(blast_out,b_parser)
b_record=b_iterator.next()
while 1:
    b_record=b_iterator.next()
    if b_record is None:
        break
    for alignment in b_record.alignments:
        for hsp in alignment.hsps:
            print 'seq:',alignment.title

from
manickam
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/ms-tnef
Size: 3446 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/biopython/attachments/20060616/2de84992/attachment-0001.bin

From biopython at maubp.freeserve.co.uk Fri Jun 16 11:53:31 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 16 Jun 2006 16:53:31 +0100
Subject: [BioPython] Running Blast locally
In-Reply-To: <4CDD243B32D07748944828EA7A29E4A3E2AFAC@salte0008.wurnet.nl>
References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl>
 <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl>
 <4CDD243B32D07748944828EA7A29E4A3E2AF9E@salte0008.wurnet.nl>
 <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl>
 <44917656.6090602@maubp.freeserve.co.uk>
 <4CDD243B32D07748944828EA7A29E4A3E2AFA5@salte0008.wurnet.nl>
 <4491DFDF.9070506@maubp.freeserve.co.uk>
 <4CDD243B32D07748944828EA7A29E4A3E2AFA9@salte0008.wurnet.nl>
 <4CDD243B32D07748944828EA7A29E4A3E2AFAB@salte0008.wurnet.nl>
 <4492C1CC.4020607@biotec.tu-dresden.de>
 <4CDD243B32D07748944828EA7A29E4A3E2AFAC@salte0008.wurnet.nl>
Message-ID: <4492D3FB.1040706@maubp.freeserve.co.uk>

Muthuraman, Manickam wrote:
> Dear Christof
>
> Your command also works separately but my question was how to intergrate blast in
biopython script.
>
> in biopython tutorial and cookbook they have the follwoing code where i need to provide the path to database ,file to blast and blast_exe.
>
> I am not clear how to set the path for seq_file,db and exe.
>
> import os
> my_blast_db=os.path.join(os.getcwd(),'at-est','a-cds-10-7.fasta')
> my_blast_file=os.path.join(os.getcwd(),'at-est','test_blast','sorghum_est-test.fasta')
> my_blast_exe=os.path.join(os.getcwd(),'blast','/home/manickam/blast/blastall')

Try typing this at the python prompt:

import os
help(os.path.join)

Are you familiar with relative paths etc? You might find something like this easier to understand:

my_blast_db = '/home/manickam/db/at-est/a-cds-10-7.fasta'
my_blast_file = '/home/manickam/sorghum_est-test.fasta'
my_blast_exe = '/home/manickam/blast/blastall'

Or, based on your previous email, you were using:

> here is the command :
> ./blastall -d db/swissprot -i /home/manickam/Documents/m_cold.fasta
> -p blastp

Maybe something like this:

my_blast_db = '/home/manickam/blast/db/swissprot'
my_blast_file = '/home/manickam/Documents/m_cold.fasta'
my_blast_exe = '/home/manickam/blast/blastall'

It all depends on where you installed the blast program, where you put the blast databases, and where you are going to have your input file.

> here is the whole script

01> import os
02> my_blast_db=os.path.join(os.getcwd(),'at-est','a-cds-10-7.fasta')
03> my_blast_file=os.path.join(os.getcwd(),'at-est','test_blast','sorghum_est-test.fasta')
04> my_blast_exe=os.path.join(os.getcwd(),'blast','/home/manickam/blast/blastall')
05> from Bio.Blast import NCBIStandalone
06> blast_out,error_info=NCBIStandalone.blastall(my_blast_exe,'blastp',my_blast_db,my_blast_file)

At this point, some example scripts will save the output to a file, and then reload it and carry on. This is very helpful if you have problems because you can open the file by hand and look at it.
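A minimal sketch of that save-then-reload step, assuming the result handle behaves like an ordinary file object with a read() method. The helper name is made up, and an in-memory handle stands in for blast_out so the snippet runs without BLAST installed; it is written for a current Python rather than 2.4.

```python
import io
import os
import tempfile


def save_then_reopen(result_handle, path):
    """Write everything from result_handle to path, then reopen it for reading."""
    with open(path, "w") as save_file:
        save_file.write(result_handle.read())
    # The reopened handle can be given to a parser; the file stays on disk
    # so it can also be opened by hand when something goes wrong.
    return open(path)


# Hypothetical stand-in for the handle returned by the BLAST call.
fake_blast_out = io.StringIO("pretend BLAST report\n")
out_path = os.path.join(tempfile.mkdtemp(), "my_blast.out")

reopened = save_then_reopen(fake_blast_out, out_path)
saved_text = reopened.read()
reopened.close()
print(saved_text)
```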
07> b_parser=NCBIStandalone.BlastParser()
08> b_iterator=NCBIStandalone.Iterator(blast_out,b_parser)
09> b_record=b_iterator.next()
10> while 1:
11>     b_record=b_iterator.next()
12>     if b_record is None:
13>         break
14>     for alignment in b_record.alignments:
15>         print "inside 2 loop"
16>         for hsp in alignment.hsps:
17>             print "inside 1 loop"
18>             print 'seq:',alignment.title

>
> it runs but b_record is None so it comes out of the while loop at first time itself. so it mean i am not getting out put of the blast.

Notice that at line 9, you set b_record to the first set of results (i.e. from the first sequence in your FASTA file). Then, inside the loop, at line 11 you set b_record to the SECOND set of results and try to look at it.

I suggest you comment out line 9, and it should work better.

Finally, this code is using the "plain text" blast output, which can sometimes cause BioPython trouble. I would recommend the XML parser but as you might know from the mailing list, it looks like they have changed the file format for multiple results in XML output...

Peter

From biopython at maubp.freeserve.co.uk Fri Jun 16 12:06:14 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 16 Jun 2006 17:06:14 +0100
Subject: [BioPython] Running Blast locally
In-Reply-To: <4CDD243B32D07748944828EA7A29E4A3E2AFA9@salte0008.wurnet.nl>
References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl>
 <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl>
 <4CDD243B32D07748944828EA7A29E4A3E2AF9E@salte0008.wurnet.nl>
 <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl>
 <44917656.6090602@maubp.freeserve.co.uk>
 <4CDD243B32D07748944828EA7A29E4A3E2AFA5@salte0008.wurnet.nl>
 <4491DFDF.9070506@maubp.freeserve.co.uk>
 <4CDD243B32D07748944828EA7A29E4A3E2AFA9@salte0008.wurnet.nl>
Message-ID: <4492D6F6.5060100@maubp.freeserve.co.uk>

I didn't see this email - they arrived out of order at my computer. Please also read my longer reply...
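The point in that longer reply, about calling next() once before the loop and again inside it, can be reproduced with a plain Python iterator. This is only an illustration: the helper name is made up, and next(iterator, None) stands in for the old NCBIStandalone iterator, which the script treats as returning None when there are no more records.

```python
def records_seen_by_loop(records, prefetch):
    """Return the records the while-loop body would actually get to look at.

    prefetch=True mimics line 09 of the script (an extra next() before the loop).
    """
    iterator = iter(records)
    if prefetch:
        next(iterator, None)  # like line 09: the first record is fetched and lost
    seen = []
    while True:
        record = next(iterator, None)  # like line 11
        if record is None:
            break
        seen.append(record)
    return seen


# With a single query sequence there is a single record to find.
print(records_seen_by_loop(["first record"], prefetch=True))
print(records_seen_by_loop(["first record"], prefetch=False))
```

With the extra call in place, a one-sequence input leaves nothing for the loop, which would match b_record being None on the first pass.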
Muthuraman, Manickam wrote:
> i have blast executable files in my home directory i.e
> /home/manickam/blast/blastall

Then use this:

my_blast_exe='/home/manickam/blast/blastall'

> i have my database files of nr,swissprot,pdb in /usr/junk/
>
> the files which i can see under /usr/junk/ folder are
> nr.00.phr
> nr.00.ppi
> nr.01.phr
> nr.01.ppi nr.pal
> pdbaa.00.msk
>
> lot in there and there extenstions are *.phr , ppi ,pal,msk,psq

I think you should use one of these, but I haven't checked this:

my_blast_db='/usr/junk/nr'
my_blast_db='/usr/junk/swissprot'
my_blast_db='/usr/junk/pdb'

> i am not clear from the manual where do i need to provide the input sequences

The input fasta file can be anywhere - you just have to tell Blast where it is, e.g.

my_blast_file='/home/manickam/Documents/m_cold.fasta'

> and how to i store the out put after running the local blast.

If you run blast "by hand" at the command prompt, use the option -o outputfilename (that is a lower case letter o, not zero, not uppercase).

You can also use python to write the results to a file.

> below is the following code which i tried and it works but b_record is none.

See my other email

Peter

From gvwilson at cs.utoronto.ca Sun Jun 18 14:15:04 2006
From: gvwilson at cs.utoronto.ca (Greg Wilson)
Date: Sun, 18 Jun 2006 14:15:04 -0400
Subject: [BioPython] ann: open source course on basic software development skills
Message-ID:

http://www.third-bit.com/swc is an open source course on basic software development skills, aimed primarily at people with backgrounds in science, engineering, and medicine who have little formal training in programming, but find themselves doing a lot of it. The course was developed in part through support from the Python Software Foundation; all of the material can be used and modified free of charge (but with attribution).
If you have questions, would like to contribute material, or have a success story you'd like to share, please contact Greg Wilson (gvwilson at cs.utoronto.ca). Thanks, Greg From rohini.damle at gmail.com Mon Jun 19 19:36:36 2006 From: rohini.damle at gmail.com (Rohini Damle) Date: Mon, 19 Jun 2006 16:36:36 -0700 Subject: [BioPython] plain txt blast output - xml instead In-Reply-To: <4491E11E.5020705@c2b2.columbia.edu> References: <448FD25C.20101@maubp.freeserve.co.uk> <449085AD.7010801@maubp.freeserve.co.uk> <4491992A.5040301@maubp.freeserve.co.uk> <44919C57.7030204@c2b2.columbia.edu> <4491A93E.2020306@maubp.freeserve.co.uk> <4491E11E.5020705@c2b2.columbia.edu> Message-ID: So what do one need to do to make biopython working? Make changes in the XML parser so that it will consider one iteration for one result out put? -Rohini On 6/15/06, Michiel Jan Laurens de Hoon wrote: > > Peter wrote: > > According to the XML file, it is from BLASTP 2.2.14 [May-07-2006], maybe > > they changed the XML format without telling anyone? > > > It appears that the XML format did change. > With Blastp 2.2.14, multiple searches generate multiple > ... blocks, one for each search. > With an older Blastp, multiple searches effectively generate multiple > XML files (each with one ... block). These files > are then concatenated into one output file. Biopython then parses this > file by looking for the beginning of each XML file in this output file. > > The new output is in a sense better because the output file is a valid > XML file. It may be that Biopython's XML parser ignores the > tags, since in the old format there was only one block > anyway, and therefore fails with the new format. > > --Michiel. 
> > -- > Michiel de Hoon > Center for Computational Biology and Bioinformatics > Columbia University > 1130 St Nicholas Avenue > New York, NY 10032 > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From biopython at maubp.freeserve.co.uk Tue Jun 20 09:52:48 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 20 Jun 2006 14:52:48 +0100 Subject: [BioPython] plain txt blast output - xml instead In-Reply-To: References: <448FD25C.20101@maubp.freeserve.co.uk> <449085AD.7010801@maubp.freeserve.co.uk> <4491992A.5040301@maubp.freeserve.co.uk> <44919C57.7030204@c2b2.columbia.edu> <4491A93E.2020306@maubp.freeserve.co.uk> <4491E11E.5020705@c2b2.columbia.edu> Message-ID: <4497FDB0.1000903@maubp.freeserve.co.uk> Peter wrote: >>> According to the XML file, it is from BLASTP 2.2.14 [May-07-2006], >>> maybe they changed the XML format without telling anyone? Michiel wrote: >>It appears that the XML format did change. >>With Blastp 2.2.14, multiple searches generate multiple >>... blocks, one for each search. >>With an older Blastp, multiple searches effectively generate multiple >>XML files (each with one ... block). These files >>are then concatenated into one output file. Biopython then parses this >>file by looking for the beginning of each XML file in this output file. >> >>The new output is in a sense better because the output file is a valid >>XML file. It may be that Biopython's XML parser ignores the >>tags, since in the old format there was only one block >>anyway, and therefore fails with the new format. Rohini Damle wrote: > So what do one need to do to make biopython working? Make changes in > the XML parser so that it will consider one iteration for one result > output? Basically, yes, we need to change the BioPython NCBI Blast XML code somehow - this might be best moved to the development mailing list. 
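For what "one iteration per result" could look like, here is a rough sketch that walks a new-style file one block at a time. It assumes the per-search block is the "Iteration" element with "Iteration_iter-num" and "Hit_def" children, as in NCBI's BLAST XML output, and it uses xml.etree.ElementTree.iterparse, which is in the standard library of later Pythons but not of the 2.4 mentioned in this thread. The function name and the toy XML are made up; this is not how Biopython's parser is written.

```python
import io
import xml.etree.ElementTree as ET


def iterations_with_hits(handle):
    """Yield (iteration number, list of hit definitions) per Iteration block."""
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == "Iteration":
            number = int(elem.findtext("Iteration_iter-num"))
            hits = [hit.findtext("Hit_def") for hit in elem.iter("Hit")]
            yield number, hits
            elem.clear()  # drop the finished block to keep memory use down


# Toy new-style output: one document, two searches.
toy_xml = b"""<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_hits>
        <Hit><Hit_def>toy hit A</Hit_def></Hit>
      </Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>2</Iteration_iter-num>
      <Iteration_hits>
        <Hit><Hit_def>toy hit B</Hit_def></Hit>
        <Hit><Hit_def>toy hit C</Hit_def></Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""
results = list(iterations_with_hits(io.BytesIO(toy_xml)))
print(results)
```

Because each block is handled as soon as its closing tag is read, this style would not need the whole file in memory at once, at least for the toy case shown.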
Some relevant but probably slightly out of date documentation:

ftp://ftp.ncbi.nlm.nih.gov/blast/documents/xml/README.blxml

Notice this appears to describe the ... block as follows:

BlastOutput_iter-num: the psi-blast iteration number (optional)

So whatever we do, we should have a look at the psi-blast output as well...

One idea I was thinking about is to modify the existing Blast XML parser to specify WHICH iteration number it should parse (ignoring the rest). An invalid iteration number would raise a new exception.

Then, a new Blast XML iterator would call the parser repeatedly, incrementing the iteration number until the "invalid iteration number" error was raised, which would signal the end.

Note that with the "old style concatenated XML entries" we could parse each entry one by one, without having to load the entire XML file into memory at once. I don't think that will be possible with the new style XML files.

Peter

From biopython at maubp.freeserve.co.uk Wed Jun 21 10:27:06 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 21 Jun 2006 15:27:06 +0100
Subject: [BioPython] docs have moved on the website
Message-ID: <4499573A.5060409@maubp.freeserve.co.uk>

I don't know if anyone has noticed this, but for example this:

http://www.biopython.org/docs/cookbook/genbank_to_fasta.html

Has moved to here:

http://www.biopython.org/DIST/docs/cookbook/genbank_to_fasta.html

Is it too late to revert to the old position?

If it is, to preserve any old links from external sites (and also to save google and other search engines having to update their indexes) maybe the website could automatically forward queries for:

http://www.biopython.org/docs/*

to:

http://www.biopython.org/DIST/docs/*

Good idea? Bad idea?
Peter From rohini.damle at gmail.com Wed Jun 21 15:06:29 2006 From: rohini.damle at gmail.com (Rohini Damle) Date: Wed, 21 Jun 2006 12:06:29 -0700 Subject: [BioPython] Biopython's XMl parser fails with NCBI blast changed XML output format Message-ID: Hi, I am trying to parse the blast output (XML formatted, using online NCBI's blast) I got as a result for 'short nearly exact matches' for my 50-55 short protein sequences. It looks like the XML format has changed and biopython's XML parser fails to parse the blast records. can somebody show a way to fix this thing? Thank you Rohini Damle From biopython at maubp.freeserve.co.uk Sun Jun 25 17:37:53 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 25 Jun 2006 22:37:53 +0100 Subject: [BioPython] Distance Matrix Parsers In-Reply-To: References: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com> <448A9A7A.6050501@maubp.freeserve.co.uk> Message-ID: <449F0231.2050308@maubp.freeserve.co.uk> [Off topic, but recently has anyone else get valid messages bounced due to a "suspicious header"?] Hello List, I recently wanted to load a "PHYLIP distance matrix file" created by clustalw for my own research... As discussed earlier, clustalw bends the official PHYLIP specification by not truncating long names to 10 characters. For my dataset I need the long names to avoid ambiguity. The attached code implements a fairly simple distance matrix class and associated code to read (parse) and write PHYLIP style distance matrices. There are options to control strict 10 character name truncation, and the separator character(s) when writing files. Internally, I store the distances as a list of lists (of different lengths) to mimic a lower triangular matrix. For example, this matrix: [[0.0, 0.1, 0.2], [0.1, 0.0, 0.5], [0.2, 0.5, 0.0]] Is stored as this: [[], [0.1], [0.2, 0.5]] This may not be the best way to do this in terms of speed and memory usage. 
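A small sketch of that lower-triangular storage, reusing the example matrix above. The function names are made up for illustration and are not taken from the attached phylip_dst.py; the lookup assumes the matrix is symmetric with zeros on the diagonal.

```python
def to_lower_triangle(full):
    """Keep only the entries strictly below the diagonal, row by row."""
    return [list(row[:i]) for i, row in enumerate(full)]


def get_distance(lower, i, j):
    """Look up the distance between items i and j in the triangular storage."""
    if i == j:
        return 0.0  # assumed: an item is at distance zero from itself
    if i < j:
        i, j = j, i  # symmetric matrix: only the lower half is stored
    return lower[i][j]


full = [[0.0, 0.1, 0.2],
        [0.1, 0.0, 0.5],
        [0.2, 0.5, 0.0]]
lower = to_lower_triangle(full)
print(lower)
print(get_distance(lower, 0, 2), get_distance(lower, 2, 0))
```

For n names this keeps n*(n-1)/2 numbers instead of n*n, at the cost of the index swap on every lookup.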
There are some simple test cases included, but I have not pushed the code very far and there may be problems. Anyway - in case anyone is interested either in the short term, or for ideas for how BioPython could support these files - here it is.

I'm sure someone more familiar with arrays (Numeric and NumPy) would be able to make the class act more like an array - but the basics are there.

As far as I could see, neither Numeric nor NumPy has a specific symmetric matrix / symmetric array class, which would be ideal.

Members of the list are welcome to use the code, but please contact me before re-distributing it to anyone else.

Peter
-------------- next part --------------
A non-text attachment was scrubbed...
Name: phylip_dst.py
Type: text/x-python
Size: 16528 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/biopython/attachments/20060625/8d20b314/attachment-0001.py

From chris.lasher at gmail.com Tue Jun 27 17:34:37 2006
From: chris.lasher at gmail.com (Chris Lasher)
Date: Tue, 27 Jun 2006 17:34:37 -0400
Subject: [BioPython] Distance Matrix Parsers
In-Reply-To: <449F0231.2050308@maubp.freeserve.co.uk>
References: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com>
 <448A9A7A.6050501@maubp.freeserve.co.uk>
 <449F0231.2050308@maubp.freeserve.co.uk>
Message-ID: <128a885f0606271434v4d5a40e9x1ceb0037d750f6a1@mail.gmail.com>

Hi Peter,

Would you be up for licensing your code under the BioPython license? If not, I shouldn't look at it, as I've started coding my own module for the project. From your description, your module sounds very good. =-)

Chris

On 6/25/06, Peter wrote:
> [Off topic, but recently has anyone else get valid messages bounced due
> to a "suspicious header"?]
>
> Hello List,
>
> I recently wanted to load a "PHYLIP distance matrix file" created by
> clustalw for my own research...
>
> As discussed earlier, clustalw bends the official PHYLIP specification
> by not truncating long names to 10 characters.
For my dataset I need > the long names to avoid ambiguity. > > The attached code implements a fairly simple distance matrix class and > associated code to read (parse) and write PHYLIP style distance matrices. > > There are options to control strict 10 character name truncation, and > the separator character(s) when writing files. > > Internally, I store the distances as a list of lists (of different > lengths) to mimic a lower triangular matrix. > > For example, this matrix: > > [[0.0, 0.1, 0.2], > [0.1, 0.0, 0.5], > [0.2, 0.5, 0.0]] > > Is stored as this: > > [[], [0.1], [0.2, 0.5]] > > This may not be the best way to do this in terms of speed and memory usage. > > There are some simple test cases included, but I have pushed the code > very far and there may be problems. Anyway - in case anyone is > interested either in the short term, or for ideas for how BioPython > could support these files - here it is. > > I'm sure someone more familiar with arrays (Numeric and NumPy) would be > able to make the class act more like an array - but the basics are there. > > As far as I could see, neither Numeric or NumPy have a specific > symmetric matrix / symmetric array class which would be ideal. > > Members of the list are welcome to use the code, but please contact me > before re-distributing it to anyone else. 
> > Peter > > > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > > > > From biopython at maubp.freeserve.co.uk Tue Jun 27 18:33:34 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 27 Jun 2006 23:33:34 +0100 Subject: [BioPython] Distance Matrix Parsers In-Reply-To: <128a885f0606271434v4d5a40e9x1ceb0037d750f6a1@mail.gmail.com> References: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com> <448A9A7A.6050501@maubp.freeserve.co.uk> <449F0231.2050308@maubp.freeserve.co.uk> <128a885f0606271434v4d5a40e9x1ceb0037d750f6a1@mail.gmail.com> Message-ID: <44A1B23E.5080007@maubp.freeserve.co.uk> Chris Lasher wrote: > Hi Peter, > > Would you be up for licensing your code under the BioPython license? > If not, I shouldn't look at it, as I've started coding my own module > for the project. From your description, your module sounds very good. > =-) > > Chris I am quite happy to contribute the code to BioPython under the appropriate license, so please go ahead. I've filled a bug on adding PHYLIP distance parsers to BioPython and attached a slightly revised version of the code (added "fuzzy" equality testing of matrices - mainly for testing): http://bugzilla.open-bio.org/show_bug.cgi?id=2034 If anyone else really wants the code under some other license (GPL maybe) I could probably be persuaded. 
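For readers wondering what "fuzzy" equality testing of matrices might look like, here is a rough sketch. It is not the code attached to the bug report; the function name matrices_almost_equal and the absolute tolerance of 1e-6 are assumptions chosen for illustration. The idea is that distances written to a text file and read back may differ in the last printed digits, so an exact == comparison is too strict for round-trip tests.

```python
import math


def matrices_almost_equal(a, b, abs_tol=1e-6):
    """Compare two nested-list matrices entry by entry within a tolerance.

    Hypothetical helper; the tolerance is an assumed default.
    """
    if len(a) != len(b):
        return False
    for row_a, row_b in zip(a, b):
        if len(row_a) != len(row_b):
            return False
        for x, y in zip(row_a, row_b):
            if not math.isclose(x, y, rel_tol=0.0, abs_tol=abs_tol):
                return False
    return True


written = [[], [0.1], [0.2, 0.5]]
reloaded = [[], [0.1000001], [0.2, 0.4999999]]
print(matrices_almost_equal(written, reloaded))                 # True
print(matrices_almost_equal(written, [[], [0.1], [0.2, 0.6]]))  # False
```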
Peter From chris.lasher at gmail.com Tue Jun 27 19:32:12 2006 From: chris.lasher at gmail.com (Chris Lasher) Date: Tue, 27 Jun 2006 19:32:12 -0400 Subject: [BioPython] Distance Matrix Parsers In-Reply-To: <44A1B23E.5080007@maubp.freeserve.co.uk> References: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com> <448A9A7A.6050501@maubp.freeserve.co.uk> <449F0231.2050308@maubp.freeserve.co.uk> <128a885f0606271434v4d5a40e9x1ceb0037d750f6a1@mail.gmail.com> <44A1B23E.5080007@maubp.freeserve.co.uk> Message-ID: <128a885f0606271632q2988f2d7y543dd441535f9808@mail.gmail.com> [Oops! I didn't realize I was posting to the user list! Reverting it back to BP-Dev] This code looks very good, Peter! As far as licensing, I'm new to the game, but my guess is the BioPython license (http://www.biopython.org/DIST/LICENSE ) is highly prefered for BioPython. You still retain copyright with the license, but the code is more "free" than under any version of the GPL. Chris On 6/27/06, Peter wrote: > Chris Lasher wrote: > > Hi Peter, > > > > Would you be up for licensing your code under the BioPython license? > > If not, I shouldn't look at it, as I've started coding my own module > > for the project. From your description, your module sounds very good. > > =-) > > > > Chris > > I am quite happy to contribute the code to BioPython under the > appropriate license, so please go ahead. > > I've filled a bug on adding PHYLIP distance parsers to BioPython and > attached a slightly revised version of the code (added "fuzzy" equality > testing of matrices - mainly for testing): > > http://bugzilla.open-bio.org/show_bug.cgi?id=2034 > > If anyone else really wants the code under some other license (GPL > maybe) I could probably be persuaded. 
> > Peter > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From cjfields at uiuc.edu Wed Jun 28 14:30:44 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 28 Jun 2006 13:30:44 -0500 Subject: [BioPython] Wiki spammed Message-ID: <005201c69ae0$f78c59c0$15327e82@pyrimidine> Guys, Just wanted to let whoever's in charge know that you need to roll back changes to this page: http://biopython.org/wiki/Biopython The spammers have struck again! Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign
From omid9dr18 at hotmail.com Thu Jun 1 22:39:34 2006 From: omid9dr18 at hotmail.com (Omid Khalouei) Date: Thu, 1 Jun 2006 22:39:34 +0000 Subject: [BioPython] Synthesized or Clinical PDB sequence Message-ID: Hello, Is there any way to find out if a sequence corresponding to a PDB structure was obtained clinically or was synthesized without having to read the primary citations? Thanks for your help. Omid K. _________________________________________________________________ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ From boris.steipe at utoronto.ca Fri Jun 2 02:25:48 2006 From: boris.steipe at utoronto.ca (Boris Steipe) Date: Thu, 1 Jun 2006 22:25:48 -0400 Subject: [BioPython] Synthesized or Clinical PDB sequence In-Reply-To: References: Message-ID: Since the PDB does not use a constrained vocabulary, this is a bit unreliable. But the information is supposed to be entered in the SOURCE record. cf.: http://www.rcsb.org/pdb/file_formats/pdb/pdbguide2.2/part_20.html HTH, Boris On 1 Jun 2006, at 18:39, Omid Khalouei wrote: > Hello, > > Is there any way to find out if a sequence corresponding to a PDB > structure was obtained clinically or was synthesized without having > to read the primary citations? > > Thanks for your help. > Omid K. > _________________________________________________________________ > Express yourself instantly with MSN Messenger! Download today it's > FREE!
> http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From lee.byung-chul at kaist.ac.kr Fri Jun 2 09:45:09 2006 From: lee.byung-chul at kaist.ac.kr (Lee, Byung-chul) Date: Fri, 02 Jun 2006 18:45:09 +0900 Subject: [BioPython] Drawing Ramanchandran plot Message-ID: <448008A5.8090602@kaist.ac.kr> Hi all, While calculating the torsion angles of some atoms in PDB files, I want to draw the Ramachandran plot of those. However, I cannot find any modules or methods for doing that in Bio.PDB, so if anyone knows where it is or how to make it, please inform me. Thanks, Byung-chul. -- -------------------------------------------------------- The important thing is not to stop questioning. : Albert Einstein Byung chul Lee a member of Protein BioInformatics Lab. (PBIL) at Dept. BioSystems KAIST, Korea Ph.D candidate 82-42-869-4357 -------------------------------------------------------- From biopython at maubp.freeserve.co.uk Fri Jun 2 12:15:25 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 02 Jun 2006 13:15:25 +0100 Subject: [BioPython] Drawing Ramanchandran plot In-Reply-To: <448008A5.8090602@kaist.ac.kr> References: <448008A5.8090602@kaist.ac.kr> Message-ID: <44802BDD.6080703@maubp.freeserve.co.uk> Lee, Byung-chul wrote: > Hi all, > > While calculating the torsion angles of some atoms in PDB files, I want > to draw the Ramachandran plot of those. > However, I cannot find any modules or methods for doing that in Bio.PDB, > so if anyone knows where it is or how to make it, please inform me. > > Thanks, > Byung-chul. > A work in progress: http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/ramachandran/ Short summary about calculating the angles: * MMTK is great, providing it can load the PDB file.
Very, very easy to get the angles. * BioPython's Bio.PDB will load most/all PDB files, but you have to work out the backbone and angles yourself. * Python Macromolecular Library (mmLib) might also be worth looking at. Once you have the angles, you will want to draw the plots - the link above suggests a package like Excel, R, or Peter Robinson's Java Program: http://www.charite.de/ch/medgen/compgen/ramachandran/ Peter From sbassi at gmail.com Wed Jun 7 19:25:44 2006 From: sbassi at gmail.com (Sebastian Bassi) Date: Wed, 7 Jun 2006 16:25:44 -0300 Subject: [BioPython] From REF to sequence? Message-ID: Hello, I have a list like this: >ref|NP_918285.1| >dbj|BAD88119.1| >dbj|BAD88118.1| >ref|XP_475495.1| >emb|CAD37200.1| >gb|AAM64572.1| (the list is much bigger, but with this sample you get the idea). I would like to create a URL from each entry to retrieve the full NCBI information about these sequences. Is there a Biopython method for doing this? I once read about an NCBI syntax for building URLs, but I can't find it. Best regards, SB. -- Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From chris.lasher at gmail.com Thu Jun 8 21:32:26 2006 From: chris.lasher at gmail.com (Chris Lasher) Date: Thu, 8 Jun 2006 17:32:26 -0400 Subject: [BioPython] Distance Matrix Parsers Message-ID: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com> Hi all, Are there any modules in BioPython to parse distance matrices? My poking around the BioPython modules and Google searching does not turn up any signs indicating there are distance matrix parsers, currently. Two particularly useful parsers would be a parser for the output of DNADIST/PROTDIST/RESTDIST from PHYLIP (http://evolution.genetics.washington.edu/phylip.html), and a parser for the MEGA (http://www.megasoftware.net/mega.html) distance matrix format. If not, would there be any interest in creating parsers for these matrices, other than my own?
I think parsers for distance matrices could be very useful to the community. Chris From mcolosimo at mitre.org Fri Jun 9 12:16:02 2006 From: mcolosimo at mitre.org (Marc Colosimo) Date: Fri, 9 Jun 2006 08:16:02 -0400 Subject: [BioPython] Distance Matrix Parsers In-Reply-To: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com> References: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com> Message-ID: <9BE2CFC6-BACE-4D98-86A0-99E9CFBA228A@mitre.org> Hi Chris, I don't think there is a parser for those. I have in the past thought about writing them up. I was looking over the structure of BioPython to see where it would best fit [I'll save my rant on this for another time, maybe later today]. In the mean time, the folks at BioPerl have Bio-Phylo CPAN module , which looks nice, but it does NOT have what you are looking for. However, I am planning on following that. Marc On Jun 8, 2006, at 5:32 PM, Chris Lasher wrote: > Hi all, > Are there any modules in BioPython to parse distance matrices? My > poking around the BioPython modules and Google searching does not turn > up any signs indicating there are distance matrix parsers, currently. > Two particularly useful parsers would be a parser for the output of > DNADIST/PROTDIST/RESTDIST from PHYLIP > (http://evolution.genetics.washington.edu/phylip.html), and a parser > for the MEGA (http://www.megasoftware.net/mega.html) distance matrix > format. If not, would there be any interest in creating parsers for > these matrices, other than my own? I think parsers for distance > matrices could be very useful to the community. 
> > Chris > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From chris.lasher at gmail.com Fri Jun 9 15:59:56 2006 From: chris.lasher at gmail.com (Chris Lasher) Date: Fri, 9 Jun 2006 11:59:56 -0400 Subject: [BioPython] Distance Matrix Parsers In-Reply-To: <9BE2CFC6-BACE-4D98-86A0-99E9CFBA228A@mitre.org> References: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com> <9BE2CFC6-BACE-4D98-86A0-99E9CFBA228A@mitre.org> Message-ID: <128a885f0606090859x608e733ela89fdb879e531dc8@mail.gmail.com> Hi Marc, Thanks for the reply. I had not seen the Bio::Phylo package before. Thanks for pointing that out. That seems to have be a really useful library, though it's not exactly what I was thinking about when I originally posted. I was thinking more along the lines of the Bio::Matrix modules (http://bio.perl.org/wiki/Special:Search?search=matrix&go=Go). I don't think writing parsers for these formats will be that difficult. I am unsure, however, about what type of data structure the matrix should be. The simplest solution is a nested list. Perhaps this is the proper solution, as the user can then convert this over to a NumPy multi-dimensional array, say, or some matrix object. I dunno. Thoughts, comments, suggestions? Chris On 6/9/06, Marc Colosimo wrote: > Hi Chris, > > I don't think there is a parser for those. I have in the past thought > about writing them up. I was looking over the structure of BioPython > to see where it would best fit [I'll save my rant on this for another > time, maybe later today]. In the mean time, the folks at BioPerl have > Bio-Phylo CPAN module , > which looks nice, but it does NOT have what you are looking for. > However, I am planning on following that. > > Marc > > On Jun 8, 2006, at 5:32 PM, Chris Lasher wrote: > > > Hi all, > > Are there any modules in BioPython to parse distance matrices? 
My > > poking around the BioPython modules and Google searching does not turn > > up any signs indicating there are distance matrix parsers, currently. > > Two particularly useful parsers would be a parser for the output of > > DNADIST/PROTDIST/RESTDIST from PHYLIP > > (http://evolution.genetics.washington.edu/phylip.html), and a parser > > for the MEGA (http://www.megasoftware.net/mega.html) distance matrix > > format. If not, would there be any interest in creating parsers for > > these matrices, other than my own? I think parsers for distance > > matrices could be very useful to the community. > > > > Chris > > _______________________________________________ > > BioPython mailing list - BioPython at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython > > From mcolosimo at mitre.org Fri Jun 9 18:41:29 2006 From: mcolosimo at mitre.org (Marc Colosimo) Date: Fri, 9 Jun 2006 14:41:29 -0400 Subject: [BioPython] Distance Matrix Parsers In-Reply-To: <128a885f0606090859x608e733ela89fdb879e531dc8@mail.gmail.com> References: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com> <9BE2CFC6-BACE-4D98-86A0-99E9CFBA228A@mitre.org> <128a885f0606090859x608e733ela89fdb879e531dc8@mail.gmail.com> Message-ID: <8AC5BAA2-BA47-4772-88C7-DF4B2061A8E2@mitre.org> Chris, I likewise didn't know about the Bio::Matrix::PhylipDist module. Personally, I would opt for a Matrix Object (since this is Python a OO language) and store it internally as a nested list. That way you have the best of both worlds. The next question is the object hierarchy. Here I would opt for a top level Matrix class (or module) and then subclass that under Phylo. So, something like this: Bio.Matrix Bio.Phylo.Matrix and maybe things like the following (which isn't used/followed much here in BioPython) Bio.Phylo.IO Bio.Phylo.Parsers.PhylipDist Bio.Phylo.Parsers.Newick Bio.Phylo.Parsers.Nexus And/or have Bio.Phylo.Matrix.IO that uses the PhylipDist parser. 
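To make the proposed hierarchy a little more concrete, here is a rough sketch of a generic matrix object backed by a nested list, with a phylogenetic subclass on top. The class names Matrix and DistanceMatrix only stand in for the suggested Bio.Matrix and Bio.Phylo.Matrix; neither module exists at the time of this thread, and every method shown is an assumption for illustration.

```python
class Matrix:
    """Generic labelled square matrix kept internally as a nested list.

    Stand-in for the proposed Bio.Matrix; purely hypothetical.
    """

    def __init__(self, names, rows):
        if len(names) != len(rows) or any(len(r) != len(names) for r in rows):
            raise ValueError("expected a square matrix with one name per row")
        self.names = list(names)
        self._rows = [list(map(float, r)) for r in rows]

    def __getitem__(self, key):
        a, b = key  # look up by a pair of taxon names
        return self._rows[self.names.index(a)][self.names.index(b)]

    def as_lists(self):
        """Return a copy as plain nested lists, e.g. to hand to an array library."""
        return [list(r) for r in self._rows]


class DistanceMatrix(Matrix):
    """Phylogenetic distance matrix; stand-in for the proposed Bio.Phylo.Matrix."""

    def __init__(self, names, rows):
        super().__init__(names, rows)
        n = len(self.names)
        for i in range(n):
            if self._rows[i][i] != 0.0:
                raise ValueError("distance from a taxon to itself must be zero")
            for j in range(i):
                if self._rows[i][j] != self._rows[j][i]:
                    raise ValueError("distance matrix must be symmetric")

    def closest_pair(self):
        """Return (name_a, name_b, distance) for the smallest off-diagonal entry."""
        n = len(self.names)
        pairs = [(self._rows[i][j], self.names[j], self.names[i])
                 for i in range(n) for j in range(i)]
        dist, a, b = min(pairs)  # raises ValueError if there are fewer than two taxa
        return a, b, dist


dm = DistanceMatrix(["A", "B", "C"],
                    [[0.0, 0.1, 0.2],
                     [0.1, 0.0, 0.5],
                     [0.2, 0.5, 0.0]])
print(dm["A", "C"])       # 0.2
print(dm.closest_pair())  # ('A', 'B', 0.1)
```

Keeping the nested list private and exposing as_lists() leaves room to swap the internal storage later without changing callers.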
The next big question is what should Bio.Phylo.IO return? For inspiration, we might want to look at Mesquite . Marc On Jun 9, 2006, at 11:59 AM, Chris Lasher wrote: > Hi Marc, > > Thanks for the reply. I had not seen the Bio::Phylo package before. > Thanks for pointing that out. That seems to have be a really useful > library, though it's not exactly what I was thinking about when I > originally posted. I was thinking more along the lines of the > Bio::Matrix modules > (http://bio.perl.org/wiki/Special:Search?search=matrix&go=Go). > > I don't think writing parsers for these formats will be that > difficult. I am unsure, however, about what type of data structure the > matrix should be. The simplest solution is a nested list. Perhaps this > is the proper solution, as the user can then convert this over to a > NumPy multi-dimensional array, say, or some matrix object. I dunno. > Thoughts, comments, suggestions? > > Chris > > On 6/9/06, Marc Colosimo wrote: >> Hi Chris, >> >> I don't think there is a parser for those. I have in the past thought >> about writing them up. I was looking over the structure of BioPython >> to see where it would best fit [I'll save my rant on this for another >> time, maybe later today]. In the mean time, the folks at BioPerl have >> Bio-Phylo CPAN module , >> which looks nice, but it does NOT have what you are looking for. >> However, I am planning on following that. >> >> Marc >> >> On Jun 8, 2006, at 5:32 PM, Chris Lasher wrote: >> >>> Hi all, >>> Are there any modules in BioPython to parse distance matrices? My >>> poking around the BioPython modules and Google searching does not >>> turn >>> up any signs indicating there are distance matrix parsers, >>> currently. >>> Two particularly useful parsers would be a parser for the output of >>> DNADIST/PROTDIST/RESTDIST from PHYLIP >>> (http://evolution.genetics.washington.edu/phylip.html), and a parser >>> for the MEGA (http://www.megasoftware.net/mega.html) distance matrix >>> format. 
If not, would there be any interest in creating parsers for >>> these matrices, other than my own? I think parsers for distance >>> matrices could be very useful to the community. >>> >>> Chris >>> _______________________________________________ >>> BioPython mailing list - BioPython at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biopython >> >> > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From chris.lasher at gmail.com Fri Jun 9 21:13:32 2006 From: chris.lasher at gmail.com (Chris Lasher) Date: Fri, 9 Jun 2006 17:13:32 -0400 Subject: [BioPython] Distance Matrix Parsers In-Reply-To: <8AC5BAA2-BA47-4772-88C7-DF4B2061A8E2@mitre.org> References: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com> <9BE2CFC6-BACE-4D98-86A0-99E9CFBA228A@mitre.org> <128a885f0606090859x608e733ela89fdb879e531dc8@mail.gmail.com> <8AC5BAA2-BA47-4772-88C7-DF4B2061A8E2@mitre.org> Message-ID: <128a885f0606091413o23088caesf4934a81f0cc0489@mail.gmail.com> > I likewise didn't know about the Bio::Matrix::PhylipDist module. > Personally, I would opt for a Matrix Object (since this is Python a > OO language) and store it internally as a nested list. That way you > have the best of both worlds. The next question is the object > hierarchy. Here I would opt for a top level Matrix class (or module) > and then subclass that under Phylo. So, something like this: > > Bio.Matrix > Bio.Phylo.Matrix So is this more appropriate than Bio.Matrix.Phylo? A phylogenetic matrix is a type of matrix, so that hierarchy is immediately appealing, however, a phylogenetic matrix is not of much use in and of itself, so I can see the argument that it should be placed in a phylogeny package (which we have yet to write but as mentioned earlier, could be very useful). 
> and maybe things like the following (which isn't used/followed much > here in BioPython) > > Bio.Phylo.IO > Bio.Phylo.Parsers.PhylipDist > Bio.Phylo.Parsers.Newick > Bio.Phylo.Parsers.Nexus > > And/or have > Bio.Phylo.Matrix.IO that uses the PhylipDist parser. This is very very good, in my opinion. Thanks for doing the heavy-lifting of the brainwork on this! =-) > The next big question is what should Bio.Phylo.IO return? For > inspiration, we might want to look at Mesquite mesquiteproject.org/mesquite/mesquite.html>. I must give a better look at this site before commenting, but once again, thanks for bringing this to my awareness! What a helpful past couple of emails. I will be out for the weekend but will think more about this. As a sidenote, should this discussion be moved to biopython-dev or is it fine here? Thanks again Marc, Chris From biopython at maubp.freeserve.co.uk Sat Jun 10 10:10:02 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 10 Jun 2006 11:10:02 +0100 Subject: [BioPython] Distance Matrix Parsers In-Reply-To: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com> References: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com> Message-ID: <448A9A7A.6050501@maubp.freeserve.co.uk> Chris Lasher wrote: > Hi all, Are there any modules in BioPython to parse distance > matrices? My poking around the BioPython modules and Google searching > does not turn up any signs indicating there are distance matrix > parsers, currently. Two particularly useful parsers would be a parser > for the output of DNADIST/PROTDIST/RESTDIST from PHYLIP > (http://evolution.genetics.washington.edu/phylip.html), I've done a very small amount of work with neighbour joining trees, using PHYLIP format distance matrices. 
The closest I could find to a file format definition was this page: http://evolution.genetics.washington.edu/phylip/doc/distance.html Points to be aware of: In my experience, most software tools usually write the distances as a full symmetric matrix. However, the "standard" explicitly discusses lower triangular form (missing out the diagonal distance zero entries) which has the significant advantage of using about half the disk space. This is significant once you get into thousands of taxa. So, make sure any parser can cope with both full symmetric, and lower triangular forms - ideally without the user having to care. This also raises the point about how to store the matrix in memory. Does Numeric/NumPy have an efficient way of storing symmetric matrices? This is less flexible than the suggested list of lists, but for large datasets would need much less memory. Second point - the "official" PHYLIP distance matrix file format truncates the taxa names at 10 characters. Some tools (e.g. clustalw) ignore this limitation and will use as many as needed for the full name. I personally find this much nicer - after all most gene identifiers (e.g. GI numbers) are eight characters to start with, and if you are dealing with multiple features in each gene 10 characters is tough going. So, I would make sure you test the parser on this format variant (with names longer than 10 characters). I can supply some examples if you like. For writing matrices to file, the issue of following the strict 10 character taxa limit might best be handled as an option (default to max 10, with a warning if any names are truncated, and an error if truncation renders names non-unique?). Likewise an option to save matrices as either fully symmetric or lower triangular. I would lean towards using fully symmetric as the default as it seems to be more common. > and a parser for the MEGA (http://www.megasoftware.net/mega.html) > distance matrix format. 
If not, would there be any interest in > creating parsers for these matrices, other than my own? I think > parsers for distance matrices could be very useful to the community. I suspect that for serious tree building pure python will not be competitive with existing C/C++ code on speed - but non-the-less could be useful. Peter From idoerg at burnham.org Sat Jun 10 15:08:43 2006 From: idoerg at burnham.org (Iddo Friedberg) Date: Sat, 10 Jun 2006 08:08:43 -0700 Subject: [BioPython] Distance Matrix Parsers References: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com> <448A9A7A.6050501@maubp.freeserve.co.uk> Message-ID: <1F97379A556D0946AAEFE3F63FD6F5744D468D@MAIL.burnham.org> Hi, Bio.SubsMat has a parser for substitution matrices, lower triangular and square. Feel free to recycle code. Best, Iddo -- Iddo Friedberg, PhD Burnham Institute for Medical Research 10901 N. Torrey Pines Rd. La Jolla, CA 92037 USA T: +1 858 646 3100 x3516 http://iddo-friedberg.org http://BioFunctionPrediction.org -----Original Message----- From: biopython-bounces at lists.open-bio.org on behalf of Peter Sent: Sat 6/10/2006 3:10 AM To: BioPython Mailing List Subject: Re: [BioPython] Distance Matrix Parsers Chris Lasher wrote: > Hi all, Are there any modules in BioPython to parse distance > matrices? My poking around the BioPython modules and Google searching > does not turn up any signs indicating there are distance matrix > parsers, currently. Two particularly useful parsers would be a parser > for the output of DNADIST/PROTDIST/RESTDIST from PHYLIP > (http://evolution.genetics.washington.edu/phylip.html), I've done a very small amount of work with neighbour joining trees, using PHYLIP format distance matrices. The closest I could find to a file format definition was this page: http://evolution.genetics.washington.edu/phylip/doc/distance.html Points to be aware of: In my experience, most software tools usually write the distances as a full symmetric matrix. 
However, the "standard" explicitly discusses lower triangular form (missing out the diagonal distance zero entries) which has the significant advantage of using about half the disk space. This is significant once you get into thousands of taxa. So, make sure any parser can cope with both full symmetric, and lower triangular forms - ideally without the user having to care. This also raises the point about how to store the matrix in memory. Does Numeric/NumPy have an efficient way of storing symmetric matrices? This is less flexible than the suggested list of lists, but for large datasets would need much less memory. Second point - the "official" PHYLIP distance matrix file format truncates the taxa names at 10 characters. Some tools (e.g. clustalw) ignore this limitation and will use as many as needed for the full name. I personally find this much nicer - after all most gene identifiers (e.g. GI numbers) are eight characters to start with, and if you are dealing with multiple features in each gene 10 characters is tough going. So, I would make sure you test the parser on this format variant (with names longer than 10 characters). I can supply some examples if you like. For writing matrices to file, the issue of following the strict 10 character taxa limit might best be handled as an option (default to max 10, with a warning if any names are truncated, and an error if truncation renders names non-unique?). Likewise an option to save matrices as either fully symmetric or lower triangular. I would lean towards using fully symmetric as the default as it seems to be more common. > and a parser for the MEGA (http://www.megasoftware.net/mega.html) > distance matrix format. If not, would there be any interest in > creating parsers for these matrices, other than my own? I think > parsers for distance matrices could be very useful to the community. 
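The points above (accept both full symmetric and lower triangular files, and cope with names longer than 10 characters) can be sketched roughly as follows. This is not an existing BioPython parser: the function name parse_phylip_distances and the strict_names flag are hypothetical, and the sketch assumes one taxon per line with no wrapped rows and, in relaxed mode, names without embedded whitespace.

```python
def parse_phylip_distances(text, strict_names=False):
    """Parse a PHYLIP-style distance matrix into (names, lower_triangular_rows).

    Hypothetical sketch. Assumes one taxon per line (no wrapped rows).
    strict_names=True reads the name from the first 10 columns; otherwise the
    name is the first whitespace-delimited token, as clustalw-style files need.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n = int(lines[0].split()[0])
    if len(lines) - 1 != n:
        raise ValueError("expected %d taxa, found %d rows" % (n, len(lines) - 1))
    names, rows = [], []
    for i, line in enumerate(lines[1:]):
        if strict_names:
            name, rest = line[:10].strip(), line[10:]
        else:
            parts = line.split(None, 1)
            name = parts[0]
            rest = parts[1] if len(parts) > 1 else ""
        values = [float(v) for v in rest.split()]
        if len(values) == n:
            # Full square row: keep only the entries below the diagonal.
            # Symmetry of the discarded upper half is not checked here.
            values = values[:i]
        elif len(values) != i:
            raise ValueError("row %d has %d values" % (i + 1, len(values)))
        names.append(name)
        rows.append(values)
    return names, rows


square = """3
Alpha_long_name 0.0 0.1 0.2
Beta            0.1 0.0 0.5
Gamma           0.2 0.5 0.0
"""
lower = """3
Alpha_long_name
Beta            0.1
Gamma           0.2 0.5
"""
print(parse_phylip_distances(square) == parse_phylip_distances(lower))  # True
```

Because row i has either n values (square) or i values (lower triangular), the two layouts can be told apart row by row without the caller saying which one to expect.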
I suspect that for serious tree building pure python will not be competitive with existing C/C++ code on speed - but non-the-less could be useful. Peter _______________________________________________ BioPython mailing list - BioPython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From idoerg at burnham.org Sat Jun 10 15:08:43 2006 From: idoerg at burnham.org (Iddo Friedberg) Date: Sat, 10 Jun 2006 08:08:43 -0700 Subject: [BioPython] Distance Matrix Parsers References: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com> <448A9A7A.6050501@maubp.freeserve.co.uk> Message-ID: <1F97379A556D0946AAEFE3F63FD6F5744D468D@MAIL.burnham.org> Hi, Bio.SubsMat has a parser for substitution matrices, lower triangular and square. Feel free to recycle code. Best, Iddo -- Iddo Friedberg, PhD Burnham Institute for Medical Research 10901 N. Torrey Pines Rd. La Jolla, CA 92037 USA T: +1 858 646 3100 x3516 http://iddo-friedberg.org http://BioFunctionPrediction.org -----Original Message----- From: biopython-bounces at lists.open-bio.org on behalf of Peter Sent: Sat 6/10/2006 3:10 AM To: BioPython Mailing List Subject: Re: [BioPython] Distance Matrix Parsers Chris Lasher wrote: > Hi all, Are there any modules in BioPython to parse distance > matrices? My poking around the BioPython modules and Google searching > does not turn up any signs indicating there are distance matrix > parsers, currently. Two particularly useful parsers would be a parser > for the output of DNADIST/PROTDIST/RESTDIST from PHYLIP > (http://evolution.genetics.washington.edu/phylip.html), I've done a very small amount of work with neighbour joining trees, using PHYLIP format distance matrices. The closest I could find to a file format definition was this page: http://evolution.genetics.washington.edu/phylip/doc/distance.html Points to be aware of: In my experience, most software tools usually write the distances as a full symmetric matrix. 
However, the "standard" explicitly discusses lower triangular form (missing out the diagonal distance zero entries) which has the significant advantage of using about half the disk space. This is significant once you get into thousands of taxa. So, make sure any parser can cope with both full symmetric, and lower triangular forms - ideally without the user having to care. This also raises the point about how to store the matrix in memory. Does Numeric/NumPy have an efficient way of storing symmetric matrices? This is less flexible than the suggested list of lists, but for large datasets would need much less memory. Second point - the "official" PHYLIP distance matrix file format truncates the taxa names at 10 characters. Some tools (e.g. clustalw) ignore this limitation and will use as many as needed for the full name. I personally find this much nicer - after all most gene identifiers (e.g. GI numbers) are eight characters to start with, and if you are dealing with multiple features in each gene 10 characters is tough going. So, I would make sure you test the parser on this format variant (with names longer than 10 characters). I can supply some examples if you like. For writing matrices to file, the issue of following the strict 10 character taxa limit might best be handled as an option (default to max 10, with a warning if any names are truncated, and an error if truncation renders names non-unique?). Likewise an option to save matrices as either fully symmetric or lower triangular. I would lean towards using fully symmetric as the default as it seems to be more common. > and a parser for the MEGA (http://www.megasoftware.net/mega.html) > distance matrix format. If not, would there be any interest in > creating parsers for these matrices, other than my own? I think > parsers for distance matrices could be very useful to the community. 
I suspect that for serious tree building pure python will not be
competitive with existing C/C++ code on speed - but non-the-less could
be useful.

Peter
_______________________________________________
BioPython mailing list - BioPython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython
-------------- next part --------------
A non-text attachment was scrubbed...
Name: winmail.dat
Type: application/ms-tnef
Size: 4656 bytes
Desc: not available
URL:

From mcolosimo at mitre.org Mon Jun 12 12:38:18 2006
From: mcolosimo at mitre.org (Marc Colosimo)
Date: Mon, 12 Jun 2006 08:38:18 -0400
Subject: [BioPython] Distance Matrix Parsers
In-Reply-To: <128a885f0606091413o23088caesf4934a81f0cc0489@mail.gmail.com>
References: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com> <9BE2CFC6-BACE-4D98-86A0-99E9CFBA228A@mitre.org> <128a885f0606090859x608e733ela89fdb879e531dc8@mail.gmail.com> <8AC5BAA2-BA47-4772-88C7-DF4B2061A8E2@mitre.org> <128a885f0606091413o23088caesf4934a81f0cc0489@mail.gmail.com>
Message-ID: <65DF4A7E-B365-4E61-93D4-156A36F6ED54@mitre.org>

[cross-posting to biopython-dev]

Chris,

Oops, didn't notice this was on the general biopython mailing list. I
think many of the developers also subscribe to this list, but just in
case I'm cross posting this.

Iddo pointed out Bio.SubsMat; I didn't know what that module did. That
is one problem with names like that, and the API docs are helpful only
when you look at them (kudos to those who add documentation).

Given Bio.SubsMat and the BioPerl module, I would strongly consider
combining Bio.SubsMat and the PhylipDist parser into a new Bio.Matrix
module. From a Phylo module, a function/class can always call the
Bio.Matrix classes.

Marc

On Jun 9, 2006, at 5:13 PM, Chris Lasher wrote:

>> I likewise didn't know about the Bio::Matrix::PhylipDist module.
>> Personally, I would opt for a Matrix Object (since this is Python a
>> OO language) and store it internally as a nested list.
>> That way you have the best of both worlds. The next question is the
>> object hierarchy. Here I would opt for a top level Matrix class (or
>> module) and then subclass that under Phylo. So, something like this:
>>
>> Bio.Matrix
>> Bio.Phylo.Matrix
>
> So is this more appropriate than Bio.Matrix.Phylo? A phylogenetic
> matrix is a type of matrix, so that hierarchy is immediately
> appealing, however, a phylogenetic matrix is not of much use in and of
> itself, so I can see the argument that it should be placed in a
> phylogeny package (which we have yet to write but as mentioned
> earlier, could be very useful).
>
>> and maybe things like the following (which isn't used/followed much
>> here in BioPython)
>>
>> Bio.Phylo.IO
>> Bio.Phylo.Parsers.PhylipDist
>> Bio.Phylo.Parsers.Newick
>> Bio.Phylo.Parsers.Nexus
>>
>> And/or have
>> Bio.Phylo.Matrix.IO that uses the PhylipDist parser.
>
> This is very very good, in my opinion. Thanks for doing the
> heavy-lifting of the brainwork on this! =-)
>
>> The next big question is what should Bio.Phylo.IO return? For
>> inspiration, we might want to look at Mesquite
>> <mesquiteproject.org/mesquite/mesquite.html>.
>
> I must give a better look at this site before commenting, but once
> again, thanks for bringing this to my awareness! What a helpful past
> couple of emails. I will be out for the weekend but will think more
> about this.
>
> As a sidenote, should this discussion be moved to biopython-dev or is
> it fine here?
>
> Thanks again Marc,
> Chris
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

From mcolosimo at mitre.org Mon Jun 12 13:18:41 2006
From: mcolosimo at mitre.org (Marc Colosimo)
Date: Mon, 12 Jun 2006 09:18:41 -0400
Subject: [BioPython] Distance Matrix Parsers
In-Reply-To: <448A9A7A.6050501@maubp.freeserve.co.uk>
References: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com> <448A9A7A.6050501@maubp.freeserve.co.uk>
Message-ID:

[cross post]

On Jun 10, 2006, at 6:10 AM, Peter wrote:

> Chris Lasher wrote:
>> Hi all, Are there any modules in BioPython to parse distance
>> matrices? My poking around the BioPython modules and Google searching
>> does not turn up any signs indicating there are distance matrix
>> parsers, currently. Two particularly useful parsers would be a parser
>> for the output of DNADIST/PROTDIST/RESTDIST from PHYLIP
>> (http://evolution.genetics.washington.edu/phylip.html),
>
> I've done a very small amount of work with neighbour joining trees,
> using PHYLIP format distance matrices. The closest I could find to a
> file format definition was this page:
>
> http://evolution.genetics.washington.edu/phylip/doc/distance.html
>
> Points to be aware of:
>
> In my experience, most software tools usually write the distances as a
> full symmetric matrix. However, the "standard" explicitly discusses
> lower triangular form (missing out the diagonal distance zero entries)
> which has the significant advantage of using about half the disk
> space.
> This is significant once you get into thousands of taxa.

This is still small potatoes compared to the input needed to generate
the distance matrices (especially with DNA/RNA sequences of any
decently sized gene).

>
> So, make sure any parser can cope with both full symmetric, and lower
> triangular forms - ideally without the user having to care.
Phylip does ask you which to either read or write; this is a pain at
times. So, having a parser figure this out would be nice. However, the
user should know about the choices.

>
> This also raises the point about how to store the matrix in memory.
> Does Numeric/NumPy have an efficient way of storing symmetric
> matrices?
> This is less flexible than the suggested list of lists, but for
> large
> datasets would need much less memory.

I believe that SciPy (Numeric/NumPy/etc..) is more efficient at storing
these things. But you lose that when you want to do pythonish things to
it (like write it back out).

>
> Second point - the "official" PHYLIP distance matrix file format
> truncates the taxa names at 10 characters. Some tools (e.g. clustalw)
> ignore this limitation and will use as many as needed for the full
> name.

ClustalW does the CORRECT thing, it truncates the name to 10 characters
for Phylip output (alignments). And it does the CORRECT thing for its
distance matrix file.

In Clustalw's trees.c file

void distance_matrix_output(FILE *ofile)
    fprintf(ofile,"\n%-*s ",max_names,names[i]);
    /* left justify to the maximum length of names in current alignment
       file and use a space as a sep */

spaces in names are bad in this case, but phylip is okay with them,
since the first 10 characters are the taxon name.

> I personally find this much nicer - after all most gene identifiers
> (e.g. GI numbers) are eight characters to start with, and if you are
> dealing with multiple features in each gene 10 characters is tough
> going.
>
> So, I would make sure you test the parser on this format variant (with
> names longer than 10 characters). I can supply some examples if
> you like.

By definition this isn't a variant of Phylip, but another format. So,
one would need two parsers: PhylipDist and Dist (or ClustalDist).
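To make the square versus lower-triangular point concrete, here is a
minimal sketch of a reader that works out the layout from the number of
values present, so the caller does not have to say which one it is. It
is not Biopython code: the name parse_distance_matrix is made up for
illustration, the result is a plain list of lists as suggested earlier
in the thread, and names are assumed to contain no whitespace (so names
longer than 10 characters are fine, but strict PHYLIP names with
embedded spaces are not handled).

```python
def parse_distance_matrix(text):
    """Parse a PHYLIP-style distance matrix from a string.

    Returns (names, matrix), where matrix is a full symmetric list of
    lists. Both the square layout and the lower-triangular layout
    (diagonal omitted) are accepted. Hypothetical helper, not part of
    any library.
    """
    tokens = text.split()
    if not tokens:
        raise ValueError("empty input")
    ntaxa = int(tokens[0])
    # Everything after the count is either a name or a distance.
    n_values = len(tokens) - 1 - ntaxa
    if n_values == ntaxa * ntaxa:
        square = True
    elif n_values == ntaxa * (ntaxa - 1) // 2:
        square = False
    else:
        raise ValueError("value count fits neither square nor lower-triangular")

    names = []
    matrix = [[0.0] * ntaxa for _ in range(ntaxa)]
    pos = 1
    for i in range(ntaxa):
        names.append(tokens[pos])
        pos += 1
        # Square rows hold ntaxa values; lower-triangular row i holds i.
        row_len = ntaxa if square else i
        for j in range(row_len):
            value = float(tokens[pos])
            pos += 1
            matrix[i][j] = value
            if not square:
                matrix[j][i] = value  # mirror into the upper triangle
    return names, matrix


square_text = "3\nA 0.0 1.0 2.0\nB 1.0 0.0 3.0\nC 2.0 3.0 0.0\n"
lower_text = "3\nA\nB 1.0\nC 2.0 3.0\n"
```

Because the tokens are read as one stream, rows wrapped over several
lines are handled the same way as rows on a single line.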
>
> For writing matrices to file, the issue of following the strict 10
> character taxa limit might best be handled as an option (default to
> max
> 10, with a warning if any names are truncated, and an error if
> truncation renders names non-unique?).

DON'T give an option of 10 or more. That is NOT the definition of the
Phylip file Matrix structure, so why give the option? Make another
class that outputs the whole name (ClustalDist). I am pretty sure that
Phylip doesn't care about non-unique names so why error out? However,
the class should have a means for the user to ask this question.

>
> Likewise an option to save matrices as either fully symmetric or lower
> triangular. I would lean towards using fully symmetric as the default
> as it seems to be more common.

Phylip's default seems to be a "Square" distance matrix, i.e. fully
symmetric. Keep this in mind when naming or documentation.

>
>> and a parser for the MEGA (http://www.megasoftware.net/mega.html)
>> distance matrix format. If not, would there be any interest in
>> creating parsers for these matrices, other than my own? I think
>> parsers for distance matrices could be very useful to the community.
>
> I suspect that for serious tree building pure python will not be
> competitive with existing C/C++ code on speed - but non-the-less could
> be useful.
>

Well, we do have things like SciPy and PyClustal, which make things
more even.

Marc

From asmund.skjaveland at usit.uio.no Mon Jun 12 15:45:26 2006
From: asmund.skjaveland at usit.uio.no (Åsmund Skjæveland)
Date: Mon, 12 Jun 2006 17:45:26 +0200
Subject: [BioPython] Generating Nexus file from Genbank file
Message-ID: <448D8C16.6050204@fys.uio.no>

I have a file of Genbank records, and want to extract some of them and
save to a Nexus file. As far as I can tell from the API, this should
work:

#!/site/compython/Linux/bin/python
import Bio, sys, time
from Bio.GenBank import Iterator
from Bio.Nexus.Nexus import Nexus

gbfile='results/sequences-txid34828.genbank'
fp = Bio.GenBank.FeatureParser()
gb = open(gbfile, 'r')
it = Bio.GenBank.Iterator(gb, fp)
nex = Nexus()
nr = 0;
rec = it.next()
while rec:
    # A string to identify the sequence with
    nexusname=rec.features[0].qualifiers['db_xref'][0] + '--' + rec.name
    nex.add_sequence(nexusname, rec.seq)
    rec = it.next()

print "\n\n%d records, %d gene names" % (nr, len(genenames))
nex.write_nexus_data('results/genegrab.nex', mrbayes=True)

But it doesn't. When I run it:

Traceback (most recent call last):
  File "py_nexustest.py", line 39, in ?
    nex.add_sequence(nexusname, rec.seq)
  File "/site/compython/Linux/lib/python2.4/site-packages/Bio/Nexus/Nexus.py", line 1412, in add_sequence
    self.matrix[name]=Seq(sequence,self.alphabet)
AttributeError: 'Nexus' object has no attribute 'alphabet'

What am I doing wrong? I don't really know the Nexus format, I just
want to send certain sequences to MrBayes.
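If the only goal is to hand sequences to MrBayes, one possible
workaround is to write the NEXUS data block directly, since it is plain
text. The sketch below is an illustration under stated assumptions, not
the Bio.Nexus API: nexus_data_block is a hypothetical helper, it
expects (name, sequence) pairs of plain strings that are already
aligned to equal length (GenBank records generally are not), and it
replaces every character outside letters, digits and underscore in the
names, which is a conservative choice made here rather than a
documented MrBayes rule.

```python
import re


def nexus_data_block(records, datatype="dna"):
    """Return a NEXUS data block as a string.

    records is a list of (name, sequence) pairs of plain strings, all
    sequences having the same length. Hypothetical helper for
    illustration only.
    """
    if not records:
        raise ValueError("no sequences given")
    nchar = len(records[0][1])
    names = []
    for name, seq in records:
        if len(seq) != nchar:
            raise ValueError("sequence %r has length %d, expected %d"
                             % (name, len(seq), nchar))
        # Keep only letters, digits and underscore in taxon names.
        names.append(re.sub(r"[^A-Za-z0-9_]", "_", name))
    if len(set(names)) != len(names):
        raise ValueError("names are not unique after sanitising")

    width = max(len(name) for name in names)
    lines = [
        "#NEXUS",
        "begin data;",
        "  dimensions ntax=%d nchar=%d;" % (len(names), nchar),
        "  format datatype=%s missing=? gap=-;" % datatype,
        "  matrix",
    ]
    for name, (_, seq) in zip(names, records):
        lines.append("    %s  %s" % (name.ljust(width), seq))
    lines.append("  ;")
    lines.append("end;")
    return "\n".join(lines) + "\n"


# Made-up names and sequences, only to show the call.
block = nexus_data_block([("taxon:1--AB1", "ACGT"),
                          ("taxon:1--AB22", "AC-T")])
```

The returned string can be written to a file with an ordinary
open(..., 'w') and write() call.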
--
Åsmund Skjæveland { Scientific Computing Group, UiO; }

From rohini.damle at gmail.com Tue Jun 13 19:09:21 2006
From: rohini.damle at gmail.com (Rohini Damle)
Date: Tue, 13 Jun 2006 12:09:21 -0700
Subject: [BioPython] (no subject)
Message-ID:

Hi,
I am new to bipyton trying to use ncbistandalone parser to parse my
blast out put which is in txt format.
the parser works well for older blast uptputs but breaks down for newer
blast outputs. Can someone suggest me a way to overcome this blast
parser's problem?
Thanks

From winter at biotec.tu-dresden.de Wed Jun 14 08:00:20 2006
From: winter at biotec.tu-dresden.de (Christof Winter)
Date: Wed, 14 Jun 2006 10:00:20 +0200
Subject: [BioPython] (no subject)
In-Reply-To:
References:
Message-ID: <448FC214.20805@biotec.tu-dresden.de>

Hi Rohini,

can you provide a minimal example of your python code along with two
blast reports (working/not working)?

Cheers,
Christof

Rohini Damle wrote:
> Hi,
> I am new to bipyton trying to use ncbistandalone parser to parse my blast
> out put which is in txt format.
> the parser works well for older blast uptputs but breaks down for newer
> blast outputs. Can someone suggest me a way to overcome this blast parser's
> problem?
> Thanks
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython

From biopython at maubp.freeserve.co.uk Wed Jun 14 09:09:48 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 14 Jun 2006 10:09:48 +0100
Subject: [BioPython] plain txt blast output - xml instead
In-Reply-To:
References:
Message-ID: <448FD25C.20101@maubp.freeserve.co.uk>

Rohini Damle wrote:
> Hi,
> I am new to bipyton trying to use ncbistandalone parser to parse my blast
> out put which is in txt format.
> the parser works well for older blast uptputs but breaks down for newer
> blast outputs.
The NCBI standalone blast and web blast plain text output keeps
changing slightly, and as a result, the parser isn't always up to date.

> Can someone suggest me a way to overcome this blast parser's
> problem?

We recommend you use the XML output instead (this is possible with both
online blast and the standalone tools).

For the stand alone tools, repeat your searches with the command line
option -m 7 to get XML output.

If you are using the Bio.NCBIStandalone.blastall() command, use
argument align_view to set this.

You still use NCBIStandalone.Iterator (if you have multiple queries)
but now use NCBIXML.BlastParser instead of NCBIStandalone.BlastParser

e.g.
http://bugzilla.open-bio.org/attachment.cgi?id=293&action=view

Peter

From rohini.damle at gmail.com Wed Jun 14 18:22:59 2006
From: rohini.damle at gmail.com (Rohini Damle)
Date: Wed, 14 Jun 2006 11:22:59 -0700
Subject: [BioPython] plain txt blast output - xml instead
In-Reply-To: <448FD25C.20101@maubp.freeserve.co.uk>
References: <448FD25C.20101@maubp.freeserve.co.uk>
Message-ID:

Thank you very much for your help.
I have 55-56 proteins & I am using Blast to find out short, nearly
exact matches. The xml parser works fine for first record but even if I
used the iterator, I CAN NOT ITERATE through the records, I have used
the same code as u have given, what might be wrong?
Rohini.

On 6/14/06, Peter wrote:
>
> Rohini Damle wrote:
> > Hi,
> > I am new to bipyton trying to use ncbistandalone parser to parse my
> blast
> > out put which is in txt format.
> > the parser works well for older blast uptputs but breaks down for newer
> > blast outputs.
>
> The NCBI standalone blast and web blast plain text output keeps changing
> slightly, and as a result, the parser isn't always up to date.
>
> > Can someone suggest me a way to overcome this blast parser's
> > problem?
>
> We recommend you use the XML output instead (this is possible with both
> online blast and the standalone tools).
>
> For the stand alone tools, repeat your searches with the command line
> option -m 7 to get XML output.
>
> If you are using the Bio.NCBIStandalone.blastall() command, use argument
> align_view to set this.
>
> You still use NCBIStandalone.Iterator (if you have multiple queries) but
> now use NCBIXML.BlastParser instead of NCBIStandalone.BlastParser
>
> e.g.
> http://bugzilla.open-bio.org/attachment.cgi?id=293&action=view
>
> Peter
>
>

From manickam.muthuraman at wur.nl Wed Jun 14 20:22:56 2006
From: manickam.muthuraman at wur.nl (Muthuraman, Manickam)
Date: Wed, 14 Jun 2006 22:22:56 +0200
Subject: [BioPython] parsing the blastoutput and printing the alingment
Message-ID: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl>

I am new to python

I am getting error in parsing blastoutput more over the same problem
was been addressed by Michiel De Hoon but i could not clear...here is
the error what i am getting.

first i got error when i typed

b_record=b_parser.parse(blast_out)

as michiel suggested i changed to

b_record=b_parser.parse(blast_out)

Traceback (most recent call last):
  File "", line 1, in ?
  File "/usr/lib/python2.4/site-packages/Bio/Blast/NCBIXML.py", line 112, in parse
    self._parser.parse(handler)
  File "/usr/lib/python2.4/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.4/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.4/xml/sax/expatreader.py", line 211, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib/python2.4/xml/sax/handler.py", line 38, in fatalError
    raise exception
SAXParseException: my_blast.out:1:4: not well-formed (invalid token)

blast_out=open('my_blast.out','r')
from Bio.Blast import NCBIStandalone
from Bio.Blast import NCBIXML
b_parser=NCBIXML.BlastParser()
b_iterator1=NCBIStandalone.Iterator(blast_out,b_parser)
for alignment in b_iterator1.alignments:
    for hsp in alignment.hsps:
        print 'seq:',alignment.title

Traceback (most recent call last):
  File "", line 1, in ?
AttributeError: Iterator instance has no attribute 'alignments'

how do i print the title.alignment and so on.....from the blast output
file

thanks in advance
--
Manickam(melaimanik)

From biopython at maubp.freeserve.co.uk Wed Jun 14 21:54:53 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 14 Jun 2006 22:54:53 +0100
Subject: [BioPython] plain txt blast output - xml instead
In-Reply-To:
References: <448FD25C.20101@maubp.freeserve.co.uk>
Message-ID: <449085AD.7010801@maubp.freeserve.co.uk>

Rohini Damle wrote:
> Thank you very much for your help.
> I have 55-56 proteins & I am using Blast to find out short, nearly exact
> matches. The xml parser works fine for first record but even if I used the
> iterator, I CAN NOT ITERATE through the records, I have used the same code
> as u have given, what might be wrong?
> Rohini.

If you send us a short bit of example code, and the error message, that
would help. Also, what version of BioPython are you using, and do you
have Windows or Linux or MacOS...
One guess is that you will need to update the NCBIStandalone.py file to
include a recent fix for iterating XML files.

Assuming you are using BioPython 1.41 on Windows, then click on this
link and pick "download" near the top of the page to get the latest
version:

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIStandalone.py?cvsroot=biopython

Save it here:

c:\python24\lib\site-packages\Bio\Blast\NCBIStandalone.py

(Make a copy of the old file first, just in case)

Peter

From mdehoon at c2b2.columbia.edu Wed Jun 14 21:55:17 2006
From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon)
Date: Wed, 14 Jun 2006 17:55:17 -0400
Subject: [BioPython] parsing the blastoutput and printing the alingment
In-Reply-To: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl>
References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl>
Message-ID: <449085C5.4020101@c2b2.columbia.edu>

Muthuraman, Manickam wrote:
> b_parser=NCBIXML.BlastParser()
> b_iterator1=NCBIStandalone.Iterator(blast_out,b_parser)
> for alignment in b_iterator1.alignments:
>     for hsp in alignment.hsps:
>         print 'seq:',alignment.title
>
> Traceback (most recent call last):
>   File "", line 1, in ?
> AttributeError: Iterator instance has no attribute 'alignments'
>

Use:

b_record = b_iterator1.next()
for alignment in b_record.alignments:
    ...

Just like the example in the tutorial.

--Michiel.
--
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1130 St Nicholas Avenue
New York, NY 10032

From biopython at maubp.freeserve.co.uk Wed Jun 14 21:48:20 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 14 Jun 2006 22:48:20 +0100
Subject: [BioPython] parsing the blastoutput and printing the alingment
In-Reply-To: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl>
References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl>
Message-ID: <44908424.2070407@maubp.freeserve.co.uk>

Muthuraman, Manickam wrote:
> I am new to python
>
> I am getting error in parsing blastoutput more over the same problem
> was been addressed by Michiel De Hoon but i could not clear...
>
> blast_out=open('my_blast.out','r')
> from Bio.Blast import NCBIStandalone
> from Bio.Blast import NCBIXML
> b_parser=NCBIXML.BlastParser()
> b_iterator1=NCBIStandalone.Iterator(blast_out,b_parser)
> for alignment in b_iterator1.alignments:
>     for hsp in alignment.hsps:
>         print 'seq:',alignment.title
>

Your example code is wrong. The iterator object will return blast
record objects (which have an alignments property).

Try something like this:

blast_out=open('my_blast.out','r')
from Bio.Blast import NCBIStandalone
from Bio.Blast import NCBIXML
b_parser=NCBIXML.BlastParser()
b_iterator=NCBIStandalone.Iterator(blast_out,b_parser)
for b_record in b_iterator:
    for alignment in b_record.alignments:
        for hsp in alignment.hsps:
            print 'seq:',alignment.title

Or for a full and tested example, try this :

http://bugzilla.open-bio.org/attachment.cgi?id=293&action=view

Peter

From rohini.damle at gmail.com Wed Jun 14 18:21:18 2006
From: rohini.damle at gmail.com (Rohini Damle)
Date: Wed, 14 Jun 2006 11:21:18 -0700
Subject: [BioPython] plain txt blast output - xml instead
In-Reply-To: <448FD25C.20101@maubp.freeserve.co.uk>
References: <448FD25C.20101@maubp.freeserve.co.uk>
Message-ID:

Thank you very much for your help.
I have 55-56 proteins & I am using Blast to find out short, nearly
exact matches. The xml parser works fine for first record but even if I
used the iterator, I

On 6/14/06, Peter wrote:
>
> Rohini Damle wrote:
> > Hi,
> > I am new to bipyton trying to use ncbistandalone parser to parse my
> blast
> > out put which is in txt format.
> > the parser works well for older blast uptputs but breaks down for newer
> > blast outputs.
>
> The NCBI standalone blast and web blast plain text output keeps changing
> slightly, and as a result, the parser isn't always up to date.
>
> > Can someone suggest me a way to overcome this blast parser's
> > problem?
>
> We recommend you use the XML output instead (this is possible with both
> online blast and the standalone tools).
>
> For the stand alone tools, repeat your searches with the command line
> option -m 7 to get XML output.
>
> If you are using the Bio.NCBIStandalone.blastall() command, use argument
> align_view to set this.
>
> You still use NCBIStandalone.Iterator (if you have multiple queries) but
> now use NCBIXML.BlastParser instead of NCBIStandalone.BlastParser
>
> e.g.
> http://bugzilla.open-bio.org/attachment.cgi?id=293&action=view
>
> Peter
>
>

From manickam.muthuraman at wur.nl Thu Jun 15 11:47:34 2006
From: manickam.muthuraman at wur.nl (Muthuraman, Manickam)
Date: Thu, 15 Jun 2006 13:47:34 +0200
Subject: [BioPython] parsing the blastoutput and printing the alingment
References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl>
Message-ID: <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl>

Still i am getting the same error or error. I tried as Peter suggested
but it fails.
I have attached the error and the code

[manickam at bioinfo python]$ cat blas.py
from Bio import Fasta
file_for_blast=open('/home/manickam/Documents/m_cold.fasta','r')
f_iterator=Fasta.Iterator(file_for_blast)
f_record=f_iterator.next()
from Bio.Blast import NCBIWWW
result_handle=NCBIWWW.qblast('blastp','nr',f_record)
save_file=open('/home/manickam/my_blast.out','w')
blast_results=result_handle.read()
save_file.write(blast_results)
save_file.close()
blast_out=open('/home/manickam/my_blast.out','r')
from Bio.Blast import NCBIXML
from Bio.Blast import NCBIStandalone
b_parser=NCBIXML.BlastParser()
b_iterator=NCBIStandalone.Iterator(blast_out,b_parser)
for b_record in b_iterator:
    print "inside (3)outer loop"
    for alignment in b_record.alignments:
        print "inside 2 loop"
        for hsp in alignment.hsps:
            print "inside 1 loop"
            print 'seq:',alignment.title
blast_out.close()
[manickam at bioinfo python]$

[manickam at bioinfo python]$ python blas.py
/usr/lib/python2.4/site-packages/Bio/Blast/NCBIWWW.py:1064: UserWarning: qblast works only with blastn and blastp for now.
  warnings.warn("qblast works only with blastn and blastp for now.")
Traceback (most recent call last):
  File "blas.py", line 16, in ?
    for b_record in b_iterator:
  File "/usr/lib/python2.4/site-packages/Bio/Blast/NCBIStandalone.py", line 1385, in next
    return self._parser.parse(File.StringHandle(data))
  File "/usr/lib/python2.4/site-packages/Bio/Blast/NCBIXML.py", line 112, in parse
    self._parser.parse(handler)
  File "/usr/lib/python2.4/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.4/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.4/xml/sax/expatreader.py", line 211, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib/python2.4/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: :1:4: not well-formed (invalid token)
[manickam at bioinfo python]$

From manickam.muthuraman at wur.nl Thu Jun 15 11:51:36 2006
From: manickam.muthuraman at wur.nl (Muthuraman, Manickam)
Date: Thu, 15 Jun 2006 13:51:36 +0200
Subject: [BioPython] parsing the blastoutput and printing the alingment
References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl>
Message-ID: <4CDD243B32D07748944828EA7A29E4A3E2AF9E@salte0008.wurnet.nl>

Dear Michiel

I tried your suggestion as well but i am getting error. I could even
understand where i am making mistake.
[manickam at bioinfo python]$ cat blas.py
from Bio import Fasta
file_for_blast=open('/home/manickam/Documents/m_cold.fasta','r')
f_iterator=Fasta.Iterator(file_for_blast)
f_record=f_iterator.next()
from Bio.Blast import NCBIWWW
result_handle=NCBIWWW.qblast('blastp','nr',f_record)
save_file=open('/home/manickam/my_blast.out','w')
blast_results=result_handle.read()
save_file.write(blast_results)
save_file.close()
blast_out=open('/home/manickam/my_blast.out','r')
from Bio.Blast import NCBIXML
from Bio.Blast import NCBIStandalone
b_parser=NCBIXML.BlastParser()
b_iterator=NCBIStandalone.Iterator(blast_out,b_parser)
b_record = b_iterator.next()
for alignment in b_record.alignments:
    print "inside 2 loop"
    for hsp in alignment.hsps:
        print "inside 1 loop"
        print 'seq:',alignment.title
blast_out.close()

[manickam at bioinfo python]$ python blas.py
/usr/lib/python2.4/site-packages/Bio/Blast/NCBIWWW.py:1064: UserWarning: qblast works only with blastn and blastp for now.
  warnings.warn("qblast works only with blastn and blastp for now.")
Traceback (most recent call last):
  File "blas.py", line 16, in ?
    b_record = b_iterator.next()
  File "/usr/lib/python2.4/site-packages/Bio/Blast/NCBIStandalone.py", line 1385, in next
    return self._parser.parse(File.StringHandle(data))
  File "/usr/lib/python2.4/site-packages/Bio/Blast/NCBIXML.py", line 112, in parse
    self._parser.parse(handler)
  File "/usr/lib/python2.4/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.4/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.4/xml/sax/expatreader.py", line 211, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib/python2.4/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: :1:4: not well-formed (invalid token)
[manickam at bioinfo python]$

From biopython at maubp.freeserve.co.uk Thu Jun 15 12:25:06 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 15 Jun 2006 13:25:06 +0100
Subject: [BioPython] parsing the blastoutput and printing the alingment
In-Reply-To: <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl>
References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl>
Message-ID: <449151A2.1040602@maubp.freeserve.co.uk>

Muthuraman, Manickam wrote:
> Still i am getting the same error or error. I tried as Peter suggested but it fails.
> ...

I couldn't see anything clearly wrong just from reading your code.

Which version of BioPython do you have? Since BioPython 1.41
NCBIWWW.qblast uses XML as the default output format, but you can force
this by:

result_handle=NCBIWWW.qblast('blastp','nr',f_record, format_type="XML")

Try opening your output file /home/manickam/my_blast.out in a text
editor to double check it really is XML - i.e. does it start with an
XML declaration (<?xml ...)?

If it is XML, then BioPython doesn't like it for some reason. Maybe you
could email the file to me and Michiel to take a look?
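The same check can be done in code rather than in a text editor. The
sketch below looks for the XML declaration and drops anything in front
of it before parsing. It is only an illustration: strip_to_xml is a
hypothetical helper, the sample text is invented, and
xml.etree.ElementTree (in the standard library from Python 2.5) is used
here only to confirm that what remains is well-formed.

```python
import xml.etree.ElementTree as ET


def strip_to_xml(text):
    """Return text starting at the XML declaration.

    Anything before '<?xml' is discarded. Raises ValueError when no
    declaration is present, which suggests the file is not XML at all.
    Hypothetical helper for illustration only.
    """
    start = text.find("<?xml")
    if start == -1:
        raise ValueError("no XML declaration found; this is not XML output")
    return text[start:]


# Invented sample: stray non-XML lines followed by a tiny XML document.
sample = (
    "some stray header line\n"
    "\n"
    '<?xml version="1.0"?>\n'
    "<BlastOutput>"
    "<BlastOutput_program>blastp</BlastOutput_program>"
    "</BlastOutput>\n"
)

cleaned = strip_to_xml(sample)
root = ET.fromstring(cleaned)  # raises if the remainder is not well-formed
program = root.find("BlastOutput_program").text
```

If the cleaned text parses, it can be written back to disk and passed
to the BLAST XML parser as before.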
Peter

From manickam.muthuraman at wur.nl  Thu Jun 15 14:13:17 2006
From: manickam.muthuraman at wur.nl (Muthuraman, Manickam)
Date: Thu, 15 Jun 2006 16:13:17 +0200
Subject: [BioPython] parsing the blastoutput and printing the alingment
References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9E@salte0008.wurnet.nl>
Message-ID: <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl>

Dear Peter,

Here are the code, my_blast.out, and the error. I need to get all the BLAST hit sequences in FASTA format; by parsing the output I can extract the accession numbers from it.

Code

from Bio import Fasta
file_for_blast=open('/home/manickam/Documents/m_cold.fasta','r')
f_iterator=Fasta.Iterator(file_for_blast)
f_record=f_iterator.next()
from Bio.Blast import NCBIWWW
result_handle=NCBIWWW.qblast('blastp','nr',f_record, format_type="XML")
save_file=open('/home/manickam/my_blast.out','w')
blast_results=result_handle.read()
save_file.write(blast_results)
save_file.close()
blast_out=open('/home/manickam/my_blast.out','r')
from Bio.Blast import NCBIXML
from Bio.Blast import NCBIStandalone
b_parser=NCBIXML.BlastParser()
b_iterator=NCBIStandalone.Iterator(blast_out,b_parser)
b_record = b_iterator.next()
for alignment in b_record.alignments:
    print "inside 2 loop"
    for hsp in alignment.hsps:
        print "inside 1 loop"
        print 'seq:',alignment.title
blast_out.close()

Error

[root at bioinfo python]# python blas.py
/usr/lib/python2.4/site-packages/Bio/Blast/NCBIWWW.py:1064: UserWarning: qblast works only with blastn and blastp for now.
  warnings.warn("qblast works only with blastn and blastp for now.")
Traceback (most recent call last):
  File "blas.py", line 16, in ?
    b_record = b_iterator.next()
  File "/usr/lib/python2.4/site-packages/Bio/Blast/NCBIStandalone.py", line 1410, in next
    return self._parser.parse(File.StringHandle(data))
  File "/usr/lib/python2.4/site-packages/Bio/Blast/NCBIXML.py", line 112, in parse
    self._parser.parse(handler)
  File "/usr/lib/python2.4/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.4/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.4/xml/sax/expatreader.py", line 211, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib/python2.4/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: :1:4: not well-formed (invalid token)
[root at bioinfo python]#

my_blast.out

HTTP/1.1 200 OK
Date: Thu, 15 Jun 2006 13:57:19 GMT
Server: Nde
Content-Type: application/xml
Connection: close

blastp
BLASTP 2.2.14 [May-07-2006]
Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.
nr
1_13944
1BK0
331
. . . . . . . .
76 128 27 295 VPKIDVSPLFGD-DQAAKMRVAQQIDAASRDTGFFYAVNHGIN---VQRLSQKTKEFHMSITPEEKWDLAIRAYNKEHQDQVRAGYYLSIPGKKAVESFCYLNP--NFTPDHPRIQAKTPTHEVNVWPDETKHPGFQDFAEQYYWDVFGLSSALLKGYALALGKEENFFARHFKPDDTLASVVLIRYP-YLDPYPEAAIKTAADGTKLSFEWHEDVSLITVLYQSNVQNLQVETAAGYQDIEADDTGYLINCGSYMAHLTNNYYKAPIHRV--KWVNAERQSLPFFVNLGYDSVI LPVIDLSLLDGSPESAAKFR--DDLLCATHDVGFFYLVGHGVDESLMDDLLAASREFFD--LPEDQKFAVENVKSPQFRGYTRVGGELT-EGKTDWREQIDVGPERDVIDNAPGLADYWRLEGPNLWPDAV--PQLRGLVNEWNDKLSAVSLRLLRAWAHALGAPEDVFDNAFA-DKPFPQLKIVRYPGESNPEPKQGVGAHRDGGVLTL----------LMVEPGKGGLQVDYNGEWVDVPPKPGAFVVNIGEMLELATEGYLKATLHRVISPLIGDDRISIPFFFNPALDTVM +P ID+S L G + AAK R + A+ D GFFY V HG++ + L ++EF PE++ + + + R G L+ GK + P + + P + N+WPD P + ++ + +S LL+ +A ALG E+ F F D + ++RYP +P P+ + DG L+ ++ + LQV+ + D+ +++N G + T Y KA +HRV + +R S+PFF N D+V+ 3695564 1269795892 0 0 0.041 0.267 0.14 From biopython at maubp.freeserve.co.uk Thu Jun 15 15:01:42 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 15 Jun 2006 16:01:42 +0100 Subject: [BioPython] parsing the blastoutput and printing the alingment In-Reply-To: <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl> References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9E@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl> Message-ID: <44917656.6090602@maubp.freeserve.co.uk> Muthuraman, Manickam wrote: > Dear peter > > here is the code my_blast.out and the error. My need is to get all the > blast hit sequences in fasta format. By parsing and i can extract > accession number from it. 
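A note on the my_blast.out listing quoted in this thread: as archived, it begins with HTTP header lines (HTTP/1.1 200 OK, Date:, Server:, ...) ahead of the BLAST content, and the parser error is reported at line 1, column 4. That would be consistent with the XML parser choking on the header text rather than on the BLAST results, although nobody in the thread confirms this. Assuming that is what happened, one possible workaround is to drop everything before the first XML markup before parsing. The sketch below is Python 3 and the helper name `strip_leading_headers` is hypothetical, not a BioPython function.

```python
def strip_leading_headers(text):
    """Return text starting at the XML declaration (or the first tag).

    Anything before that point, such as stray HTTP header lines,
    is discarded. Raises ValueError if no XML markup is found.
    """
    start = text.find("<?xml")
    if start == -1:
        # No declaration: fall back to the first opening angle bracket.
        start = text.find("<")
    if start == -1:
        raise ValueError("no XML markup found in text")
    return text[start:]


# Sample data only: header lines as in the listing, followed by a tiny XML body.
raw = (
    "HTTP/1.1 200 OK\n"
    "Content-Type: application/xml\n"
    "Connection: close\n"
    "\n"
    '<?xml version="1.0"?>\n'
    "<BlastOutput></BlastOutput>\n"
)
cleaned = strip_leading_headers(raw)
print(cleaned.splitlines()[0])
```

The cleaned text could then be written back to disk and handed to the XML parser as before.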
I made an example fasta file containing just this one sequence twice: >example1 VPKIDVSPLFGDDQAAKMRVAQQIDAASRDTGFFYAVNHGINVQRLSQKTKEFHMSITP EEKWDLAIRAYNKEHQDQVRAGYYLSIPGKKAVESFCYLNPNFTPDHPRIQAKTPTHEV NVWPDETKHPGFQDFAEQYYWDVFGLSSALLKGYALALGKEENFFARHFKPDDTLASVV LIRYPYLDPYPEAAIKTAADGTKLSFEWHEDVSLITVLYQSNVQNLQVETAAGYQDIEA DDTGYLINCGSYMAHLTNNYYKAPIHRVKWVNAERQSLPFFVNLGYDSVI >example2 VPKIDVSPLFGDDQAAKMRVAQQIDAASRDTGFFYAVNHGINVQRLSQKTKEFHMSITP EEKWDLAIRAYNKEHQDQVRAGYYLSIPGKKAVESFCYLNPNFTPDHPRIQAKTPTHEV NVWPDETKHPGFQDFAEQYYWDVFGLSSALLKGYALALGKEENFFARHFKPDDTLASVV LIRYPYLDPYPEAAIKTAADGTKLSFEWHEDVSLITVLYQSNVQNLQVETAAGYQDIEA DDTGYLINCGSYMAHLTNNYYKAPIHRVKWVNAERQSLPFFVNLGYDSVI I then edited the filenames in your example, and ran the code. It worked for me using a fresh install of BioPython 1.41 on Linux with Python 2.4.2 So the good news is your code seems fine. Maybe there is something "funny" with your fasta file? Accented characters for example - which would then be in the output XML file? Could you send me the fasta file and the XML file (in full, as attachments), off the mailing list to avoid clogging up everyone's inboxes. Thanks Peter From biopython at maubp.freeserve.co.uk Thu Jun 15 15:08:32 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 15 Jun 2006 16:08:32 +0100 Subject: [BioPython] Abuse of the new Wiki Homepage Message-ID: <449177F0.1010209@maubp.freeserve.co.uk> I've noticed someone has created an account "Ceas" on the wiki and has been inserting junk/spam links. For example, look at the history of the main page: http://biopython.org/wiki/Biopython Who is in charge of the Wiki? Can we (a) block this account (short term action) (b) tighten up rules for creating new accounts? 
Peter From arareko at campus.iztacala.unam.mx Thu Jun 15 16:13:50 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Thu, 15 Jun 2006 11:13:50 -0500 Subject: [BioPython] Abuse of the new Wiki Homepage In-Reply-To: <449177F0.1010209@maubp.freeserve.co.uk> References: <449177F0.1010209@maubp.freeserve.co.uk> Message-ID: <4491873E.50509@campus.iztacala.unam.mx> Hi Peter, We started to have the same problem in the BioPerl wiki some months ago. The way we usually solve this is by blocking the user account and rolling back to the previous version of the affected document. We have a list of wiki administrators who are constantly (and independently) monitoring the recent changes in the site. This way we can keep track of the changes and revert damages to the content: http://bioperl.org/wiki/BioPerl:Administrators http://bioperl.org/wiki/Special:Recentchanges You can also keep track of the changes by using the RSS or Atom feeds provided by the Recentchanges page: http://bioperl.org/w/index.php?title=Special:Recentchanges&feed=rss http://bioperl.org/w/index.php?title=Special:Recentchanges&feed=atom The wiki system has memory of the blocked users and IP's, you can have a look here: http://bioperl.org/wiki/Special:Ipblocklist There also exists a Blacklist, which is a complement to the main Wikimedia's one and helps detect spam content before it goes into a document: http://bioperl.org/wiki/Help:Blacklist http://meta.wikimedia.org/wiki/Spam_blacklist I don't know who's in charge of BioPython's wiki but I hope this info can be helpful to you. Regards, Mauricio. Peter wrote: > I've noticed someone has created an account "Ceas" on the wiki and has > been inserting junk/spam links. For example, look at the history of the > main page: > > http://biopython.org/wiki/Biopython > > Who is in charge of the Wiki? Can we > (a) block this account (short term action) > (b) tighten up rules for creating new accounts? 
> > Peter > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Gen?tica Unidad de Morfofisiolog?a y Funci?n Facultad de Estudios Superiores Iztacala, UNAM From dag at sonsorol.org Thu Jun 15 16:31:01 2006 From: dag at sonsorol.org (Chris Dagdigian) Date: Thu, 15 Jun 2006 12:31:01 -0400 Subject: [BioPython] Abuse of the new Wiki Homepage In-Reply-To: <4491873E.50509@campus.iztacala.unam.mx> References: <449177F0.1010209@maubp.freeserve.co.uk> <4491873E.50509@campus.iztacala.unam.mx> Message-ID: I deal with a number of wiki sites, all of which are subjected to a constant stream of automated spam posters. The single best defense is volunteers who monitor the "Recent Changes" feed and take instant action to rollback the spam changes: http://biopython.org/wiki/Special:Recentchanges People can monitor that page (in web or RSS form) and rollback spam shortly after it happens. It really is the best way. Anyone can roll back changes. If you find yourself doing it often, ask to become a wiki administrator and then you'll be able to blocklist people and IP addresses as well. Behind the scenes we do other things to block spam, including regular expression tests on content, blacklists etc. but it is a constant arms race with the wiki spammers and we are always a bit behind. My $.02 -Chris On Jun 15, 2006, at 12:13 PM, Mauricio Herrera Cuadra wrote: > Hi Peter, > > We started to have the same problem in the BioPerl wiki some months > ago. The way we usually solve this is by blocking the user account > and rolling back to the previous version of the affected document. > > We have a list of wiki administrators who are constantly (and > independently) monitoring the recent changes in the site. 
This way > we can keep track of the changes and revert damages to the content: > > http://bioperl.org/wiki/BioPerl:Administrators > http://bioperl.org/wiki/Special:Recentchanges > > You can also keep track of the changes by using the RSS or Atom > feeds provided by the Recentchanges page: > > http://bioperl.org/w/index.php?title=Special:Recentchanges&feed=rss > http://bioperl.org/w/index.php?title=Special:Recentchanges&feed=atom > > The wiki system has memory of the blocked users and IP's, you can > have a look here: > > http://bioperl.org/wiki/Special:Ipblocklist > > There also exists a Blacklist, which is a complement to the main > Wikimedia's one and helps detect spam content before it goes into a > document: > > http://bioperl.org/wiki/Help:Blacklist > http://meta.wikimedia.org/wiki/Spam_blacklist > > I don't know who's in charge of BioPython's wiki but I hope this > info can be helpful to you. > > Regards, > Mauricio. > > Peter wrote: >> I've noticed someone has created an account "Ceas" on the wiki and >> has been inserting junk/spam links. For example, look at the >> history of the main page: >> http://biopython.org/wiki/Biopython >> Who is in charge of the Wiki? Can we >> (a) block this account (short term action) >> (b) tighten up rules for creating new accounts? 
>> Peter >> _______________________________________________ >> BioPython mailing list - BioPython at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython > > -- > MAURICIO HERRERA CUADRA > arareko at campus.iztacala.unam.mx > Laboratorio de Gen?tica > Unidad de Morfofisiolog?a y Funci?n > Facultad de Estudios Superiores Iztacala, UNAM From jason.stajich at duke.edu Thu Jun 15 16:45:50 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu, 15 Jun 2006 12:45:50 -0400 Subject: [BioPython] Fwd: Abuse of the new Wiki Homepage References: Message-ID: <29F97001-146E-414A-8E5D-330AEDAB3392@duke.edu> Begin forwarded message: > From: Jason Stajich > Date: June 15, 2006 12:40:13 PM EDT > To: Mauricio Herrera Cuadra > Cc: biopython at biopython.org, Chris Dagdigian , > Chris Fields > Subject: Re: [BioPython] Abuse of the new Wiki Homepage > > I'm not convinced the blacklist is working - but we need to make > sure it is enabled in the conf file on the server. I've locked the > blacklist page as well so that only sysops can edit it. Iddo and > Michiel are the main site admins right now, other people can be > promoted by them or one of the main site admins if we know who you > are. > > I've blocked the previous spammer's account. You can easily revert > changes by using the rollback button on the diff page. > > The biopython community will have to decide how it wants to handle > new accounts to the wiki site. Whether there is patrolling or if > you want to lock the site down. I would encourage all legitimate > users to add something to their User page so that we can have an > easier time distinguishing random account creation from real people. > > -jason > On Jun 15, 2006, at 12:13 PM, Mauricio Herrera Cuadra wrote: > >> Hi Peter, >> >> We started to have the same problem in the BioPerl wiki some >> months ago. The way we usually solve this is by blocking the user >> account and rolling back to the previous version of the affected >> document. 
>> >> We have a list of wiki administrators who are constantly (and >> independently) monitoring the recent changes in the site. This way >> we can keep track of the changes and revert damages to the content: >> >> http://bioperl.org/wiki/BioPerl:Administrators >> http://bioperl.org/wiki/Special:Recentchanges >> >> You can also keep track of the changes by using the RSS or Atom >> feeds provided by the Recentchanges page: >> >> http://bioperl.org/w/index.php?title=Special:Recentchanges&feed=rss >> http://bioperl.org/w/index.php?title=Special:Recentchanges&feed=atom >> >> The wiki system has memory of the blocked users and IP's, you can >> have a look here: >> >> http://bioperl.org/wiki/Special:Ipblocklist >> >> There also exists a Blacklist, which is a complement to the main >> Wikimedia's one and helps detect spam content before it goes into >> a document: >> >> http://bioperl.org/wiki/Help:Blacklist >> http://meta.wikimedia.org/wiki/Spam_blacklist >> >> I don't know who's in charge of BioPython's wiki but I hope this >> info can be helpful to you. >> >> Regards, >> Mauricio. >> >> Peter wrote: >>> I've noticed someone has created an account "Ceas" on the wiki >>> and has been inserting junk/spam links. For example, look at the >>> history of the main page: >>> http://biopython.org/wiki/Biopython >>> Who is in charge of the Wiki? Can we >>> (a) block this account (short term action) >>> (b) tighten up rules for creating new accounts? 
>>> Peter >>> _______________________________________________ >>> BioPython mailing list - BioPython at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biopython >> >> -- >> MAURICIO HERRERA CUADRA >> arareko at campus.iztacala.unam.mx >> Laboratorio de Gen?tica >> Unidad de Morfofisiolog?a y Funci?n >> Facultad de Estudios Superiores Iztacala, UNAM >> > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From rohini.damle at gmail.com Thu Jun 15 16:36:27 2006 From: rohini.damle at gmail.com (Rohini Damle) Date: Thu, 15 Jun 2006 09:36:27 -0700 Subject: [BioPython] plain txt blast output - xml instead In-Reply-To: <449085AD.7010801@maubp.freeserve.co.uk> References: <448FD25C.20101@maubp.freeserve.co.uk> <449085AD.7010801@maubp.freeserve.co.uk> Message-ID: Hi, I am using BioPython 1.41 on windows I have also updated NcbIstandalone.pyfor the link u gave. here is my code. from Bio.Blast import NCBIStandalone from Bio.Blast import NCBIXML blast_out = open("4proteinblast.xml","r") b_iterator = NCBIStandalone.Iterator(blast_out, NCBIXML.BlastParser()) for b_record in b_iterator : query_name = b_record.query print query_name for alignment in b_record.alignments: print '****Alignment****' print 'sequence:', alignment.title This code gives "sequences producing significant alignments for all the 4 proteins #but printing querry name as P1 I mean I am getting all the information I want but I have 4 protein querries and this code is giving only P1 as a query (not P2, P3, P4 but giving information about them) I ma attachin the xml file of 4 protein blast results. _thank you for your help. On 6/14/06, Peter wrote: > > Rohini Damle wrote: > > Thank you very much for your help. > > I have 55-56 proteins & I am using Blast to find out short, nearly exact > > matches. 
The xml parser works fine for first record but even if I used > the > > iterator, I CAN NOT ITERATE through the records, I have used the same > code > > as u have given, what might be wrong? > > Rohini. > > If you you send us a short be of example code, and the error message > that would help. Also, what version of BioPython are you using, and do > you have Windows or Linux or MacOS... > > One guess is that you will need to update the NCBIStandalone.py file to > include a recent fix for iterating XML files. > > Assuming you are using BioPython 1.41 on Windows, the click on this link > and pick "download" near the top of the page to get the latest verion: > > > http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIStandalone.py?cvsroot=biopython > > Save it here: > > c:\python24\lib\site-packages\Bio\Blast\NCBIStandalone.py > > (Make a copy of the old file first, just in case) > > Peter > > -------------- next part -------------- A non-text attachment was scrubbed... Name: 4proteinblast.xml Type: text/xml Size: 98271 bytes Desc: not available URL: From cjfields at uiuc.edu Thu Jun 15 16:41:05 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Jun 2006 11:41:05 -0500 Subject: [BioPython] Abuse of the new Wiki Homepage In-Reply-To: <4491873E.50509@campus.iztacala.unam.mx> Message-ID: <000601c6909a$7f4ec5b0$15327e82@pyrimidine> Looks like Jason's doing some work on the BioPython wiki to get it up to speed. I added Help:Blacklist as a start. Like Mauricio said, probably need to get a small group of sysadmins together to keep an eye on things and block potential spammers. 
Chris > -----Original Message----- > From: Mauricio Herrera Cuadra [mailto:arareko at campus.iztacala.unam.mx] > Sent: Thursday, June 15, 2006 11:14 AM > To: biopython at lists.open-bio.org > Cc: biopython at biopython.org; Jason Stajich; Chris Dagdigian; Chris Fields > Subject: Re: [BioPython] Abuse of the new Wiki Homepage > > Hi Peter, > > We started to have the same problem in the BioPerl wiki some months ago. > The way we usually solve this is by blocking the user account and > rolling back to the previous version of the affected document. > > We have a list of wiki administrators who are constantly (and > independently) monitoring the recent changes in the site. This way we > can keep track of the changes and revert damages to the content: > > http://bioperl.org/wiki/BioPerl:Administrators > http://bioperl.org/wiki/Special:Recentchanges > > You can also keep track of the changes by using the RSS or Atom feeds > provided by the Recentchanges page: > > http://bioperl.org/w/index.php?title=Special:Recentchanges&feed=rss > http://bioperl.org/w/index.php?title=Special:Recentchanges&feed=atom > > The wiki system has memory of the blocked users and IP's, you can have a > look here: > > http://bioperl.org/wiki/Special:Ipblocklist > > There also exists a Blacklist, which is a complement to the main > Wikimedia's one and helps detect spam content before it goes into a > document: > > http://bioperl.org/wiki/Help:Blacklist > http://meta.wikimedia.org/wiki/Spam_blacklist > > I don't know who's in charge of BioPython's wiki but I hope this info > can be helpful to you. > > Regards, > Mauricio. > > Peter wrote: > > I've noticed someone has created an account "Ceas" on the wiki and has > > been inserting junk/spam links. For example, look at the history of the > > main page: > > > > http://biopython.org/wiki/Biopython > > > > Who is in charge of the Wiki? Can we > > (a) block this account (short term action) > > (b) tighten up rules for creating new accounts? 
> >
> > Peter
> >
> > _______________________________________________
> > BioPython mailing list  -  BioPython at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biopython
> >
>
> --
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM

From biopython at maubp.freeserve.co.uk  Thu Jun 15 17:30:18 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 15 Jun 2006 18:30:18 +0100
Subject: [BioPython] plain txt blast output - xml instead
In-Reply-To:
References: <448FD25C.20101@maubp.freeserve.co.uk> <449085AD.7010801@maubp.freeserve.co.uk>
Message-ID: <4491992A.5040301@maubp.freeserve.co.uk>

Rohini Damle wrote:
> Hi,
> I am using BioPython 1.41 on windows I have also updated
> NcbIstandalone.pyfor the link u gave. here is my code.
>
> from Bio.Blast import NCBIStandalone
> from Bio.Blast import NCBIXML
> blast_out = open("4proteinblast.xml","r")
> b_iterator = NCBIStandalone.Iterator(blast_out, NCBIXML.BlastParser())
>
> for b_record in b_iterator :
>     query_name = b_record.query
>     print query_name
>     for alignment in b_record.alignments:
>         print '****Alignment****'
>         print 'sequence:', alignment.title
>
> This code gives "sequences producing significant alignments for all the 4
> proteins but printing querry name as P1

This code does the same thing, but prints less on screen so it's easier to read:

from Bio.Blast import NCBIStandalone
from Bio.Blast import NCBIXML
blast_out = open("4proteinblast.xml","r")
b_iterator = NCBIStandalone.Iterator(blast_out, NCBIXML.BlastParser())
for b_record in b_iterator :
    query_name = b_record.query
    print query_name
    for alignment in b_record.alignments:
        print query_name, alignment.title.split()[0]

> I mean I am getting all the information I want but I have 4 protein
> querries and this code is giving only P1 as a query (not P2, P3, P4
> but giving information about them) I ma attachin the xml file of
> 4 protein blast results. thank you for your help.

Looking at the raw XML file by hand, I could only see references to P1, the first protein.

If the file had results for all four proteins I would expect to see:

... results for P1 ...

... results for P2 ...

... results for P3 ...

... results for P4 ...

Are you sure you gave Blast all four input sequences, and not just the first sequence?

Peter

From mdehoon at c2b2.columbia.edu  Thu Jun 15 17:43:51 2006
From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon)
Date: Thu, 15 Jun 2006 13:43:51 -0400
Subject: [BioPython] plain txt blast output - xml instead
In-Reply-To: <4491992A.5040301@maubp.freeserve.co.uk>
References: <448FD25C.20101@maubp.freeserve.co.uk> <449085AD.7010801@maubp.freeserve.co.uk> <4491992A.5040301@maubp.freeserve.co.uk>
Message-ID: <44919C57.7030204@c2b2.columbia.edu>

Peter wrote:
>
> Looking at the raw XML file by hand, I could only see references to P1,
> the first protein.
>
> If the file had results for all four proteins I would expect to see:
>
> ... results for P1 ...
>
> ... results for P2 ...
>
> ... results for P3 ...
>
> ... results for P4 ...
>

There are results for all four proteins in the XML file, but they look like this:

2
2_20304
p2
...

and so on.

Could you let us know how this XML file was generated?
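To see which queries a BLAST XML file really contains, independently of the BioPython parser, the per-query blocks can be listed with the standard library. This is a sketch in Python 3, assuming the file follows the usual NCBI BLAST XML layout in which each query sits in its own Iteration element carrying Iteration_iter-num, Iteration_query-ID and Iteration_query-def children. The sample text is hand-written for illustration: only the values 2, 2_20304 and p2 come from the message above, and the function name `queries_and_hits` is made up.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the assumed layout, not real BLAST output.
SAMPLE = """<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_program>blastp</BlastOutput_program>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_query-ID>1_20304</Iteration_query-ID>
      <Iteration_query-def>p1</Iteration_query-def>
      <Iteration_hits>
        <Hit><Hit_def>example hit A</Hit_def></Hit>
      </Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>2</Iteration_iter-num>
      <Iteration_query-ID>2_20304</Iteration_query-ID>
      <Iteration_query-def>p2</Iteration_query-def>
      <Iteration_hits>
        <Hit><Hit_def>example hit B</Hit_def></Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def queries_and_hits(xml_text):
    """Return a list of (query name, [hit titles]) pairs, one per Iteration."""
    root = ET.fromstring(xml_text)
    results = []
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        hits = [hit.findtext("Hit_def") for hit in iteration.iter("Hit")]
        results.append((query, hits))
    return results


for query, hits in queries_and_hits(SAMPLE):
    print(query, hits)
```

If every query name shows up here but the parser reports only the first one, the problem is more likely in how the parser handles the Iteration blocks than in the BLAST run itself.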
--Michiel

--
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1130 St Nicholas Avenue
New York, NY 10032

From biopython at maubp.freeserve.co.uk  Thu Jun 15 17:53:53 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 15 Jun 2006 18:53:53 +0100
Subject: [BioPython] parsing the blastoutput and printing the alingment
In-Reply-To: <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl>
References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9E@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl>
Message-ID: <44919EB1.1080805@maubp.freeserve.co.uk>

I know you haven't got the XML parsing working yet, but I thought I should point something else out...

Muthuraman, Manickam wrote:
> from Bio import Fasta
> file_for_blast=open('/home/manickam/Documents/m_cold.fasta','r')
> f_iterator=Fasta.Iterator(file_for_blast)
> f_record=f_iterator.next()

f_record will contain a single FASTA record (only the first entry in the file m_cold.fasta).

> from Bio.Blast import NCBIWWW
> result_handle=NCBIWWW.qblast('blastp','nr',f_record, format_type="XML")

This will only run BLAST on that one record (i.e. the first FASTA entry in m_cold.fasta), so the resulting XML file will only have BLAST results for this protein.

I'm not sure if you can use the online NCBI BLAST (i.e. NCBIWWW.qblast) to submit multiple queries...

You might want to install standalone BLAST on your own machine, as this will accept multiple inputs. You just tell it to read m_cold.fasta as its input file, and the resulting XML file will contain the results for each sequence in the FASTA file.

Note that if you know in advance that the XML BLAST output is from a single input query, you don't need the NCBI iterator.
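The difference between taking one record with next() and looping over the whole file can be shown without BioPython. The sketch below is Python 3; `read_fasta` is a hypothetical stand-in for Fasta.Iterator, and the two short records are made-up sample data.

```python
def read_fasta(lines):
    """Yield (title, sequence) pairs from an iterable of FASTA lines."""
    title, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


# Made-up sample input with two records.
EXAMPLE = """>example1
VPKIDVSPLF
>example2
GDDQAAKMRV
""".splitlines()

# Calling next() once gives only the first record...
first_only = next(read_fasta(EXAMPLE))

# ...while a loop (or list) visits every record in the file.
all_records = list(read_fasta(EXAMPLE))

print(first_only)
print(len(all_records))
```

With the real iterator, the same pattern applies: submit one query per record inside the loop rather than once before it.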
Peter From rohini.damle at gmail.com Thu Jun 15 18:24:38 2006 From: rohini.damle at gmail.com (Rohini Damle) Date: Thu, 15 Jun 2006 11:24:38 -0700 Subject: [BioPython] (no subject) Message-ID: > I opened the 'search for short nearly exact match' blast tool then > enterd these prtein sequences > >p1 > FILGIIITV > >p2 > GLFDFVNFV > >p3 > FLIVSLCPT > >p4 > RVYEALYYV > > > Set parameters like evalue and organism and chose the putput format as XML > The output does not contain references for all the 4 proteins inthe > starting but in the block (one block for each protein) > is there any other way to generate the XML formatted output? > -Rohini. From biopython at maubp.freeserve.co.uk Thu Jun 15 18:38:54 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 15 Jun 2006 19:38:54 +0100 Subject: [BioPython] plain txt blast output - xml instead In-Reply-To: <44919C57.7030204@c2b2.columbia.edu> References: <448FD25C.20101@maubp.freeserve.co.uk> <449085AD.7010801@maubp.freeserve.co.uk> <4491992A.5040301@maubp.freeserve.co.uk> <44919C57.7030204@c2b2.columbia.edu> Message-ID: <4491A93E.2020306@maubp.freeserve.co.uk> Michiel Jan Laurens de Hoon wrote: > Peter wrote: > >>Looking at the raw XML file by hand, I could only see references to P1, >>the first protein. >> >>If the file had results for all four proteins I would expect to see: >> >> >>... results for P1 ... >> >>... results for P2 ... >> >>... results for P3 ... >> >>... results for P4 ... >> > > There are results for all four proteins in the XML file, but they look > like this: > > > 2 > 2_20304 > p2 > ... > > > and so on. Oh yeah. I should have seen that, sorry. According to the XML file, it is from BLASTP 2.2.14 [May-07-2006], maybe they changed the XML format without telling anyone? 
I couldn't see anything obvious on this page: http://www.ncbi.nlm.nih.gov/blast/blast_whatsnew.shtml This looks like the source code here: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/ncbi.tar.gz And you can view their CVS here: http://www.ncbi.nlm.nih.gov/cvsweb/index.cgi/ncbi/algo/blast/ There is nothing in the check-in comments that leaps out at me regarding XML iterations... > > Could you let us know how this XML file was generated? > e.g. Standalone or online? Peter From cariaso at yahoo.com Thu Jun 15 18:39:21 2006 From: cariaso at yahoo.com (Mike Cariaso) Date: Thu, 15 Jun 2006 11:39:21 -0700 (PDT) Subject: [BioPython] Fwd: Abuse of the new Wiki Homepage In-Reply-To: <29F97001-146E-414A-8E5D-330AEDAB3392@duke.edu> Message-ID: <20060615183921.27494.qmail@web52711.mail.yahoo.com> > The biopython community will have to decide how it wants to handle > new accounts to the wiki site. Whether there is patrolling or if > you want to lock the site down. I would encourage all legitimate > users to add something to their User page so that we can have an > easier time distinguishing random account creation from real people. Consider this my vote against any sort of lock down against new users. It can be a real deterent to new contributors, and that is something we sorely need. I'd be more willing to roll back the useless spam, than to risk detering valuable new contributions. Thank you to Maubp for already removing all of Ceas's garbage. 
_______________________________________________ BioPython mailing list - BioPython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From rohini.damle at gmail.com Thu Jun 15 18:44:38 2006 From: rohini.damle at gmail.com (Rohini Damle) Date: Thu, 15 Jun 2006 11:44:38 -0700 Subject: [BioPython] plain txt blast output - xml instead In-Reply-To: <4491A93E.2020306@maubp.freeserve.co.uk> References: <448FD25C.20101@maubp.freeserve.co.uk> <449085AD.7010801@maubp.freeserve.co.uk> <4491992A.5040301@maubp.freeserve.co.uk> <44919C57.7030204@c2b2.columbia.edu> <4491A93E.2020306@maubp.freeserve.co.uk> Message-ID: I used online ncbi blast to generate the xml output Rohini On 6/15/06, Peter wrote: > > Michiel Jan Laurens de Hoon wrote: > > Peter wrote: > > > >>Looking at the raw XML file by hand, I could only see references to P1, > >>the first protein. > >> > >>If the file had results for all four proteins I would expect to see: > >> > >> > >>... results for P1 ... > >> > >>... results for P2 ... > >> > >>... results for P3 ... > >> > >>... results for P4 ... > >> > > > > There are results for all four proteins in the XML file, but they look > > like this: > > > > > > 2 > > 2_20304 > > p2 > > ... > > > > > > and so on. > > Oh yeah. I should have seen that, sorry. > > According to the XML file, it is from BLASTP 2.2.14 [May-07-2006], maybe > they changed the XML format without telling anyone? > > I couldn't see anything obvious on this page: > > http://www.ncbi.nlm.nih.gov/blast/blast_whatsnew.shtml > > This looks like the source code here: > > ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/ncbi.tar.gz > > And you can view their CVS here: > > http://www.ncbi.nlm.nih.gov/cvsweb/index.cgi/ncbi/algo/blast/ > > There is nothing in the check-in comments that leaps out at me regarding > XML iterations... > > > > > Could you let us know how this XML file was generated? > > > > e.g. Standalone or online? 
> > Peter > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From cjfields at uiuc.edu Thu Jun 15 16:55:40 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Jun 2006 11:55:40 -0500 Subject: [BioPython] Abuse of the new Wiki Homepage In-Reply-To: Message-ID: <000701c6909c$88bca480$15327e82@pyrimidine> > I'm not convinced the blacklist is working - but we need to make sure > it is enabled in the conf file on the server. I've locked the > blacklist page as well so that only sysops can edit it. Iddo and > Michiel are the main site admins right now, other people can be > promoted by them or one of the main site admins if we know who you are. Agreed. I actually added the page as 'Help:BlackList' then redirected it to 'Help:Blacklist'; someone with admin privies can delete that redirect link if they want. My oops. Like Jason says, probably doesn't make much of a difference (the wiki version of the raindance, to ward off evil spammers). > I've blocked the previous spammer's account. You can easily revert > changes by using the rollback button on the diff page. > > The biopython community will have to decide how it wants to handle > new accounts to the wiki site. Whether there is patrolling or if you > want to lock the site down. I would encourage all legitimate users > to add something to their User page so that we can have an easier > time distinguishing random account creation from real people. > > -jason > On Jun 15, 2006, at 12:13 PM, Mauricio Herrera Cuadra wrote: > > > Hi Peter, > > > > We started to have the same problem in the BioPerl wiki some months > > ago. The way we usually solve this is by blocking the user account > > and rolling back to the previous version of the affected document. > > > > We have a list of wiki administrators who are constantly (and > > independently) monitoring the recent changes in the site. 
This way > > we can keep track of the changes and revert damages to the content: > > > > http://bioperl.org/wiki/BioPerl:Administrators > > http://bioperl.org/wiki/Special:Recentchanges > > > > You can also keep track of the changes by using the RSS or Atom > > feeds provided by the Recentchanges page: > > > > http://bioperl.org/w/index.php?title=Special:Recentchanges&feed=rss > > http://bioperl.org/w/index.php?title=Special:Recentchanges&feed=atom > > > > The wiki system has memory of the blocked users and IP's, you can > > have a look here: > > > > http://bioperl.org/wiki/Special:Ipblocklist > > > > There also exists a Blacklist, which is a complement to the main > > Wikimedia's one and helps detect spam content before it goes into a > > document: > > > > http://bioperl.org/wiki/Help:Blacklist > > http://meta.wikimedia.org/wiki/Spam_blacklist > > > > I don't know who's in charge of BioPython's wiki but I hope this > > info can be helpful to you. > > > > Regards, > > Mauricio. > > > > Peter wrote: > >> I've noticed someone has created an account "Ceas" on the wiki and > >> has been inserting junk/spam links. For example, look at the > >> history of the main page: > >> http://biopython.org/wiki/Biopython > >> Who is in charge of the Wiki? Can we > >> (a) block this account (short term action) > >> (b) tighten up rules for creating new accounts? 
> >> Peter > >> _______________________________________________ > >> BioPython mailing list - BioPython at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/biopython > > > > -- > > MAURICIO HERRERA CUADRA > > arareko at campus.iztacala.unam.mx > > Laboratorio de Gen?tica > > Unidad de Morfofisiolog?a y Funci?n > > Facultad de Estudios Superiores Iztacala, UNAM > > > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 From manickam.muthuraman at wur.nl Thu Jun 15 21:29:51 2006 From: manickam.muthuraman at wur.nl (Muthuraman, Manickam) Date: Thu, 15 Jun 2006 23:29:51 +0200 Subject: [BioPython] parsing the blastoutput and printing the alingment References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9E@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl> <44917656.6090602@maubp.freeserve.co.uk> Message-ID: <4CDD243B32D07748944828EA7A29E4A3E2AFA5@salte0008.wurnet.nl> Dear Peter In this mail i am attaching three files :seq file,python script file and the blast output. I am using python Python 2.4.1 (#2, Aug 25 2005, 18:20:57)and biopython 1.40 i spent almost the whole evening to upgarde the python and biopython in mandriva linux but i failed. let me know is the version of python and biopython matter here thanks for helping me out of this from manickam -----Original Message----- From: Peter [mailto:biopython at maubp.freeserve.co.uk] Sent: Thu 6/15/2006 5:01 PM To: Muthuraman, Manickam Cc: biopython at lists.open-bio.org Subject: Re: [BioPython] parsing the blastoutput and printing the alingment Muthuraman, Manickam wrote: > Dear peter > > here is the code my_blast.out and the error. My need is to get all the > blast hit sequences in fasta format. By parsing and i can extract > accession number from it. 
I made an example fasta file containing just this one sequence twice: >example1 VPKIDVSPLFGDDQAAKMRVAQQIDAASRDTGFFYAVNHGINVQRLSQKTKEFHMSITP EEKWDLAIRAYNKEHQDQVRAGYYLSIPGKKAVESFCYLNPNFTPDHPRIQAKTPTHEV NVWPDETKHPGFQDFAEQYYWDVFGLSSALLKGYALALGKEENFFARHFKPDDTLASVV LIRYPYLDPYPEAAIKTAADGTKLSFEWHEDVSLITVLYQSNVQNLQVETAAGYQDIEA DDTGYLINCGSYMAHLTNNYYKAPIHRVKWVNAERQSLPFFVNLGYDSVI >example2 VPKIDVSPLFGDDQAAKMRVAQQIDAASRDTGFFYAVNHGINVQRLSQKTKEFHMSITP EEKWDLAIRAYNKEHQDQVRAGYYLSIPGKKAVESFCYLNPNFTPDHPRIQAKTPTHEV NVWPDETKHPGFQDFAEQYYWDVFGLSSALLKGYALALGKEENFFARHFKPDDTLASVV LIRYPYLDPYPEAAIKTAADGTKLSFEWHEDVSLITVLYQSNVQNLQVETAAGYQDIEA DDTGYLINCGSYMAHLTNNYYKAPIHRVKWVNAERQSLPFFVNLGYDSVI I then edited the filenames in your example, and ran the code. It worked for me using a fresh install of BioPython 1.41 on Linux with Python 2.4.2 So the good news is your code seems fine. Maybe there is something "funny" with your fasta file? Accented characters for example - which would then be in the output XML file? Could you send me the fasta file and the XML file (in full, as attachments), off the mailing list to avoid clogging up everyone's inboxes. Thanks Peter -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 164714 bytes Desc: not available URL: From mdehoon at c2b2.columbia.edu Thu Jun 15 22:37:18 2006 From: mdehoon at c2b2.columbia.edu (Michiel Jan Laurens de Hoon) Date: Thu, 15 Jun 2006 18:37:18 -0400 Subject: [BioPython] plain txt blast output - xml instead In-Reply-To: <4491A93E.2020306@maubp.freeserve.co.uk> References: <448FD25C.20101@maubp.freeserve.co.uk> <449085AD.7010801@maubp.freeserve.co.uk> <4491992A.5040301@maubp.freeserve.co.uk> <44919C57.7030204@c2b2.columbia.edu> <4491A93E.2020306@maubp.freeserve.co.uk> Message-ID: <4491E11E.5020705@c2b2.columbia.edu> Peter wrote: > According to the XML file, it is from BLASTP 2.2.14 [May-07-2006], maybe > they changed the XML format without telling anyone? 
> It appears that the XML format did change. With Blastp 2.2.14, multiple searches generate multiple ... blocks, one for each search. With an older Blastp, multiple searches effectively generate multiple XML files (each with one ... block). These files are then concatenated into one output file. Biopython then parses this file by looking for the beginning of each XML file in this output file. The new output is in a sense better because the output file is a valid XML file. It may be that Biopython's XML parser ignores the tags, since in the old format there was only one block anyway, and therefore fails with the new format. --Michiel. -- Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1130 St Nicholas Avenue New York, NY 10032 From biopython at maubp.freeserve.co.uk Thu Jun 15 22:31:59 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 15 Jun 2006 23:31:59 +0100 Subject: [BioPython] parsing the blastoutput and printing the alingment In-Reply-To: <4CDD243B32D07748944828EA7A29E4A3E2AFA5@salte0008.wurnet.nl> References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9E@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl> <44917656.6090602@maubp.freeserve.co.uk> <4CDD243B32D07748944828EA7A29E4A3E2AFA5@salte0008.wurnet.nl> Message-ID: <4491DFDF.9070506@maubp.freeserve.co.uk> Muthuraman, Manickam wrote: > Dear Peter > > In this mail i am attaching three files :seq file,python script file > and the blast output. I am using python Python 2.4.1 (#2, Aug 25 > 2005, 18:20:57)and biopython 1.40 Your attachment came as a weird winmail.dat file - something Outlook and the Microsoft Exchange Client sometimes does. 
There is a Linux tool to "unzip" the file called tnef, which I installed on Ubuntu with a simple "apt-get install tnef" Anyway, the problem is simply that your XML file has this little HTTP header at the start: HTTP/1.1 200 OK Date: Thu, 15 Jun 2006 21:23:08 GMT Server: Nde Content-Type: application/xml Connection: close If you edit the file to remove this, the BioPython can read the file fine. Looking over my old email, Michiel de Hoon checked in a fix from Alexander Morgan for this in March. You need to update this file: /usr/lib/python2.4/site-packages/Bio/Blast/NCBIWWW.py Latest code is available here: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIWWW.py?cvsroot=biopython It also gets rid of this annoying message: UserWarning: qblast works only with blastn and blastp for now. Peter From manickam.muthuraman at wur.nl Fri Jun 16 14:27:00 2006 From: manickam.muthuraman at wur.nl (Muthuraman, Manickam) Date: Fri, 16 Jun 2006 16:27:00 +0200 Subject: [BioPython] Running Blast locally References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9E@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl> <44917656.6090602@maubp.freeserve.co.uk> <4CDD243B32D07748944828EA7A29E4A3E2AFA5@salte0008.wurnet.nl> <4491DFDF.9070506@maubp.freeserve.co.uk> <4CDD243B32D07748944828EA7A29E4A3E2AFA9@salte0008.wurnet.nl> Message-ID: <4CDD243B32D07748944828EA7A29E4A3E2AFAB@salte0008.wurnet.nl> Dear peter In the last mail i said that b_record is none , so i tried to run the blastall in my local computer and it works right now. here is the command : ./blastall -d db/swissprot -i /home/manickam/Documents/m_cold.fasta -p blastp and i am getting the result. so let me know if i need to put this command in string and pass this string (example:my_blast_exe). Still i want to know how to pass the input file(my_blast_file). 
i think i confuse myself let me know your view for this from manickam From winter at biotec.tu-dresden.de Fri Jun 16 14:35:56 2006 From: winter at biotec.tu-dresden.de (Christof Winter) Date: Fri, 16 Jun 2006 16:35:56 +0200 Subject: [BioPython] Running Blast locally In-Reply-To: <4CDD243B32D07748944828EA7A29E4A3E2AFAB@salte0008.wurnet.nl> References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9E@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl> <44917656.6090602@maubp.freeserve.co.uk> <4CDD243B32D07748944828EA7A29E4A3E2AFA5@salte0008.wurnet.nl> <4491DFDF.9070506@maubp.freeserve.co.uk> <4CDD243B32D07748944828EA7A29E4A3E2AFA9@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AFAB@salte0008.wurnet.nl> Message-ID: <4492C1CC.4020607@biotec.tu-dresden.de> Dear Manickam, Can you try blastall -V T -d db/swissprot -i /home/manickam/Documents/m_cold.fasta -p blastp instead? Christof Muthuraman, Manickam wrote: > Dear peter > > In the last mail i said that b_record is none , so i tried to run the blastall in my local computer and it works right now. > > here is the command : > ./blastall -d db/swissprot -i /home/manickam/Documents/m_cold.fasta -p blastp > and i am getting the result. so let me know if i need to put this command in string and pass this string (example:my_blast_exe). Still i want to know how to pass the input file(my_blast_file). 
> > i think i confuse myself > let me know your view for this > from > manickam > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython -- Christof Winter Bioinformatics Group TU Dresden Tatzberg 47-51 01307 Dresden, Germany From manickam.muthuraman at wur.nl Fri Jun 16 14:52:15 2006 From: manickam.muthuraman at wur.nl (Muthuraman, Manickam) Date: Fri, 16 Jun 2006 16:52:15 +0200 Subject: [BioPython] Running Blast locally References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9E@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl> <44917656.6090602@maubp.freeserve.co.uk> <4CDD243B32D07748944828EA7A29E4A3E2AFA5@salte0008.wurnet.nl> <4491DFDF.9070506@maubp.freeserve.co.uk> <4CDD243B32D07748944828EA7A29E4A3E2AFA9@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AFAB@salte0008.wurnet.nl> <4492C1CC.4020607@biotec.tu-dresden.de> Message-ID: <4CDD243B32D07748944828EA7A29E4A3E2AFAC@salte0008.wurnet.nl> Dear Christof Your command also works separately but my question was how to intergrate blast in biopython script. in biopython tutorial and cookbook they have the follwoing code where i need to provide the path to database ,file to blast and blast_exe. I am not clear how to set the path for seq_file,db and exe. 
import os my_blast_db=os.path.join(os.getcwd(),'at-est','a-cds-10-7.fasta') my_blast_file=os.path.join(os.getcwd(),'at-est','test_blast','sorghum_est-test.fasta') my_blast_exe=os.path.join(os.getcwd(),'blast','/home/manickam/blast/blastall') here is the whole script import os my_blast_db=os.path.join(os.getcwd(),'at-est','a-cds-10-7.fasta') my_blast_file=os.path.join(os.getcwd(),'at-est','test_blast','sorghum_est-test.fasta') my_blast_exe=os.path.join(os.getcwd(),'blast','/home/manickam/blast/blastall') from Bio.Blast import NCBIStandalone blast_out,error_info=NCBIStandalone.blastall(my_blast_exe,'blastp',my_blast_db,my_blast_file) b_parser=NCBIStandalone.BlastParser() b_iterator=NCBIStandalone.Iterator(blast_out,b_parser) b_record=b_iterator.next() while 1: b_record=b_iterator.next() if b_record is None: break for alignment in b_record.alignments: print "inside 2 loop" for hsp in alignment.hsps: print "inside 1 loop" print 'seq:',alignment.title it runs but b_record is None so it comes out of the while loop at first time itself. so it mean i am not getting out put of the blast. from manickam From manickam.muthuraman at wur.nl Fri Jun 16 08:42:08 2006 From: manickam.muthuraman at wur.nl (Muthuraman, Manickam) Date: Fri, 16 Jun 2006 10:42:08 +0200 Subject: [BioPython] parsing the blastoutput and printing the alingment References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9E@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl> <44917656.6090602@maubp.freeserve.co.uk> <4CDD243B32D07748944828EA7A29E4A3E2AFA5@salte0008.wurnet.nl> <4491DFDF.9070506@maubp.freeserve.co.uk> Message-ID: <4CDD243B32D07748944828EA7A29E4A3E2AFA7@salte0008.wurnet.nl> Thanks peter After overwriting the NCBIWWW.py header file my script works. 
once again i would like to thank from manickam -----Original Message----- From: Peter [mailto:biopython at maubp.freeserve.co.uk] Sent: Fri 6/16/2006 12:31 AM To: Muthuraman, Manickam Cc: biopython at lists.open-bio.org Subject: Re: [BioPython] parsing the blastoutput and printing the alingment Muthuraman, Manickam wrote: > Dear Peter > > In this mail i am attaching three files :seq file,python script file > and the blast output. I am using python Python 2.4.1 (#2, Aug 25 > 2005, 18:20:57)and biopython 1.40 Your attachment came as a weird winmail.dat file - something Outlook and the Microsoft Exchange Client sometimes does. There is a Linux tool to "unzip" the file called tnef, which I installed on Ubuntu with a simple "apt-get install tnef" Anyway, the problem is simply that your XML file has this little HTTP header at the start: HTTP/1.1 200 OK Date: Thu, 15 Jun 2006 21:23:08 GMT Server: Nde Content-Type: application/xml Connection: close If you edit the file to remove this, the BioPython can read the file fine. Looking over my old email, Michiel de Hoon checked in a fix from Alexander Morgan for this in March. You need to update this file: /usr/lib/python2.4/site-packages/Bio/Blast/NCBIWWW.py Latest code is available here: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIWWW.py?cvsroot=biopython It also gets rid of this annoying message: UserWarning: qblast works only with blastn and blastp for now. Peter -------------- next part -------------- A non-text attachment was scrubbed... 
Name: winmail.dat Type: application/ms-tnef Size: 3991 bytes Desc: not available URL: From manickam.muthuraman at wur.nl Fri Jun 16 13:12:08 2006 From: manickam.muthuraman at wur.nl (Muthuraman, Manickam) Date: Fri, 16 Jun 2006 15:12:08 +0200 Subject: [BioPython] Running Blast locally References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9E@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl> <44917656.6090602@maubp.freeserve.co.uk> <4CDD243B32D07748944828EA7A29E4A3E2AFA5@salte0008.wurnet.nl> <4491DFDF.9070506@maubp.freeserve.co.uk> Message-ID: <4CDD243B32D07748944828EA7A29E4A3E2AFA9@salte0008.wurnet.nl> Dear Peter i am not clear about the subtopic running blast locally let me explain in detail i have blast executable files in my home directory i.e /home/manickam/blast/blastall i have my database files of nr,swissprot,pdb in /usr/junk/ the files which i can see under /usr/junk/ folder are nr.00.phr nr.00.ppi nr.01.phr nr.01.ppi nr.pal pdbaa.00.msk lot in there and there extenstions are *.phr , ppi ,pal,msk,psq i am not clear from the manual where do i need to provide the input sequences and how to i store the out put after running the local blast. below is the following code which i tried and it works but b_record is none. 
mport os my_blast_db=os.path.join(os.getcwd(),'at-est','a-cds-10-7.fasta') my_blast_file=os.path.join(os.getcwd(),'at-est','test_blast','sorghum_est-test.fasta') my_blast_exe=os.path.join(os.getcwd(),'blast','/home/manickam/blast/blastall') from Bio.Blast import NCBIStandalone blast_out,error_info=NCBIStandalone.blastall(my_blast_exe,'blastp',my_blast_db,my_blast_file) b_parser=NCBIStandalone.BlastParser() b_iterator=NCBIStandalone.Iterator(blast_out,b_parser) b_record=b_iterator.next() while 1: b_record=b_iterator.next() if b_record is None: break for alignment in b_record.alignments: for hsp in alignment.hsps: print 'seq:',alignment.title from manickam -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3446 bytes Desc: not available URL: From biopython at maubp.freeserve.co.uk Fri Jun 16 15:53:31 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 16 Jun 2006 16:53:31 +0100 Subject: [BioPython] Running Blast locally In-Reply-To: <4CDD243B32D07748944828EA7A29E4A3E2AFAC@salte0008.wurnet.nl> References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9E@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl> <44917656.6090602@maubp.freeserve.co.uk> <4CDD243B32D07748944828EA7A29E4A3E2AFA5@salte0008.wurnet.nl> <4491DFDF.9070506@maubp.freeserve.co.uk> <4CDD243B32D07748944828EA7A29E4A3E2AFA9@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AFAB@salte0008.wurnet.nl> <4492C1CC.4020607@biotec.tu-dresden.de> <4CDD243B32D07748944828EA7A29E4A3E2AFAC@salte0008.wurnet.nl> Message-ID: <4492D3FB.1040706@maubp.freeserve.co.uk> Muthuraman, Manickam wrote: > Dear Christof > > Your command also works separately but my question was how to intergrate blast in biopython script. 
>
> in biopython tutorial and cookbook they have the follwoing code where i need to provide the path to database ,file to blast and blast_exe.
>
> I am not clear how to set the path for seq_file,db and exe.
>
> import os
> my_blast_db=os.path.join(os.getcwd(),'at-est','a-cds-10-7.fasta')
> my_blast_file=os.path.join(os.getcwd(),'at-est','test_blast','sorghum_est-test.fasta')
> my_blast_exe=os.path.join(os.getcwd(),'blast','/home/manickam/blast/blastall')

Try typing this at the python prompt:

import os
help(os.path.join)

Are you familiar with relative paths etc? You might find something like this easier to understand:

my_blast_db = '/home/manickam/db/at-est/a-cds-10-7.fasta'
my_blast_file = '/home/manickam/sorghum_est-test.fasta'
my_blast_exe = '/home/manickam/blast/blastall'

Or, based on your previous email you were using:

> here is the command :
> ./blastall -d db/swissprot -i /home/manickam/Documents/m_cold.fasta
> -p blastp

Maybe something like this:

my_blast_db = '/home/manickam/blast/db/swissprot'
my_blast_file = '/home/manickam/Documents/m_cold.fasta'
my_blast_exe = '/home/manickam/blast/blastall'

It all depends on where you installed the blast program, where you put the blast databases, and where you are going to have your input file.

> here is the whole script

01> import os
02> my_blast_db=os.path.join(os.getcwd(),'at-est','a-cds-10-7.fasta')
03> my_blast_file=os.path.join(os.getcwd(),'at-est','test_blast','sorghum_est-test.fasta')
04> my_blast_exe=os.path.join(os.getcwd(),'blast','/home/manickam/blast/blastall')
05> from Bio.Blast import NCBIStandalone
06> blast_out,error_info=NCBIStandalone.blastall(my_blast_exe,'blastp',my_blast_db,my_blast_file)

At this point, some example scripts will save the output to a file, and then reload it and carry on. This is very helpful if you have problems because you can open the file by hand and look at it.
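A minimal sketch of that save-then-reload step follows. The helper name save_and_reopen and the file name are made up for illustration, and a StringIO object stands in for the blast_out handle from line 06 so that the sketch runs without BLAST installed; none of this is Biopython API.

```python
import io
import os
import tempfile


def save_and_reopen(handle, filename):
    """Copy everything from an open handle to filename, then reopen it.

    Hypothetical helper: 'handle' is any file-like object with read(),
    for example the blast_out handle returned at line 06.
    """
    out = open(filename, "w")
    try:
        out.write(handle.read())
    finally:
        out.close()
    # The saved file can now be inspected by hand; the caller parses this
    # fresh handle instead of the original one.
    return open(filename)


# Invented stand-in for real BLAST output.
fake_blast_out = io.StringIO(u"BLASTP 2.2.14\nQuery= example1\n")
saved_path = os.path.join(tempfile.mkdtemp(), "my_blast.out")
reloaded = save_and_reopen(fake_blast_out, saved_path)
first_line = reloaded.readline()
reloaded.close()
```

In the script, the reopened handle would then be given to NCBIStandalone.Iterator in place of blast_out.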
07> b_parser=NCBIStandalone.BlastParser()
08> b_iterator=NCBIStandalone.Iterator(blast_out,b_parser)
09> b_record=b_iterator.next()
10> while 1:
11>     b_record=b_iterator.next()
12>     if b_record is None:
13>         break
14>     for alignment in b_record.alignments:
15>         print "inside 2 loop"
16>         for hsp in alignment.hsps:
17>             print "inside 1 loop"
18>             print 'seq:',alignment.title

>
> it runs but b_record is None so it comes out of the while loop at first time itself. so it mean i am not getting out put of the blast.

Notice that at line 9, you set b_record to the first set of results (i.e. from the first sequence in your FASTA file). Then, inside the loop, at line 11 you set b_record to the SECOND set of results and try to look at it. If your FASTA file only holds one sequence, there is no second set, which would explain why b_record is None on the first pass. I suggest you comment out line 9, and it should work better.

Finally, this code is using the "plain text" blast output, which can sometimes cause BioPython trouble. I would recommend the XML parser but as you might know from the mailing list, it looks like they have changed the file format for multiple results in XML output...

Peter

From biopython at maubp.freeserve.co.uk Fri Jun 16 16:06:14 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 16 Jun 2006 17:06:14 +0100
Subject: [BioPython] Running Blast locally
In-Reply-To: <4CDD243B32D07748944828EA7A29E4A3E2AFA9@salte0008.wurnet.nl>
References: <4CDD243B32D07748944828EA7A29E4A3E2AF9B@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9D@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AF9E@salte0008.wurnet.nl> <4CDD243B32D07748944828EA7A29E4A3E2AFA1@salte0008.wurnet.nl> <44917656.6090602@maubp.freeserve.co.uk> <4CDD243B32D07748944828EA7A29E4A3E2AFA5@salte0008.wurnet.nl> <4491DFDF.9070506@maubp.freeserve.co.uk> <4CDD243B32D07748944828EA7A29E4A3E2AFA9@salte0008.wurnet.nl>
Message-ID: <4492D6F6.5060100@maubp.freeserve.co.uk>

I didn't see this email - they arrived out of order at my computer. Please also read my longer reply...

Muthuraman, Manickam wrote:
> i have blast executable files in my home directory i.e
> /home/manickam/blast/blastall

Then use this:

my_blast_exe='/home/manickam/blast/blastall'

> i have my database files of nr,swissprot,pdb in /usr/junk/
>
> the files which i can see under /usr/junk/ folder are
> nr.00.phr
> nr.00.ppi
> nr.01.phr
> nr.01.ppi nr.pal
> pdbaa.00.msk
>
> lot in there and there extenstions are *.phr , ppi ,pal,msk,psq

I think you should use one of these, but I haven't checked this:

my_blast_db='/usr/junk/nr'
my_blast_db='/usr/junk/swissprot'
my_blast_db='/usr/junk/pdb'

> i am not clear from the manual where do i need to provide the input sequences

The input fasta file can be anywhere - you just have to tell Blast where it is. e.g.

my_blast_file='/home/manickam/Documents/m_cold.fasta'

> and how to i store the out put after running the local blast.

If you run blast "by hand" at the command prompt, use the option -o outputfilename (that is a lower case letter o, not zero, not uppercase). You can also use Python to write the results to a file.

> below is the following code which i tried and it works but b_record is none.

See my other email

Peter

From gvwilson at cs.utoronto.ca Sun Jun 18 18:15:04 2006
From: gvwilson at cs.utoronto.ca (Greg Wilson)
Date: Sun, 18 Jun 2006 14:15:04 -0400
Subject: [BioPython] ann: open source course on basic software development skills
Message-ID:

http://www.third-bit.com/swc is an open source course on basic software development skills, aimed primarily at people with backgrounds in science, engineering, and medicine who have little formal training in programming, but find themselves doing a lot of it. The course was developed in part through support from the Python Software Foundation; all of the material can be used and modified free of charge (but with attribution).
If you have questions, would like to contribute material, or have a success story you'd like to share, please contact Greg Wilson (gvwilson at cs.utoronto.ca). Thanks, Greg From rohini.damle at gmail.com Mon Jun 19 23:36:36 2006 From: rohini.damle at gmail.com (Rohini Damle) Date: Mon, 19 Jun 2006 16:36:36 -0700 Subject: [BioPython] plain txt blast output - xml instead In-Reply-To: <4491E11E.5020705@c2b2.columbia.edu> References: <448FD25C.20101@maubp.freeserve.co.uk> <449085AD.7010801@maubp.freeserve.co.uk> <4491992A.5040301@maubp.freeserve.co.uk> <44919C57.7030204@c2b2.columbia.edu> <4491A93E.2020306@maubp.freeserve.co.uk> <4491E11E.5020705@c2b2.columbia.edu> Message-ID: So what do one need to do to make biopython working? Make changes in the XML parser so that it will consider one iteration for one result out put? -Rohini On 6/15/06, Michiel Jan Laurens de Hoon wrote: > > Peter wrote: > > According to the XML file, it is from BLASTP 2.2.14 [May-07-2006], maybe > > they changed the XML format without telling anyone? > > > It appears that the XML format did change. > With Blastp 2.2.14, multiple searches generate multiple > ... blocks, one for each search. > With an older Blastp, multiple searches effectively generate multiple > XML files (each with one ... block). These files > are then concatenated into one output file. Biopython then parses this > file by looking for the beginning of each XML file in this output file. > > The new output is in a sense better because the output file is a valid > XML file. It may be that Biopython's XML parser ignores the > tags, since in the old format there was only one block > anyway, and therefore fails with the new format. > > --Michiel. 
> > -- > Michiel de Hoon > Center for Computational Biology and Bioinformatics > Columbia University > 1130 St Nicholas Avenue > New York, NY 10032 > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From biopython at maubp.freeserve.co.uk Tue Jun 20 13:52:48 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 20 Jun 2006 14:52:48 +0100 Subject: [BioPython] plain txt blast output - xml instead In-Reply-To: References: <448FD25C.20101@maubp.freeserve.co.uk> <449085AD.7010801@maubp.freeserve.co.uk> <4491992A.5040301@maubp.freeserve.co.uk> <44919C57.7030204@c2b2.columbia.edu> <4491A93E.2020306@maubp.freeserve.co.uk> <4491E11E.5020705@c2b2.columbia.edu> Message-ID: <4497FDB0.1000903@maubp.freeserve.co.uk> Peter wrote: >>> According to the XML file, it is from BLASTP 2.2.14 [May-07-2006], >>> maybe they changed the XML format without telling anyone? Michiel wrote: >>It appears that the XML format did change. >>With Blastp 2.2.14, multiple searches generate multiple >>... blocks, one for each search. >>With an older Blastp, multiple searches effectively generate multiple >>XML files (each with one ... block). These files >>are then concatenated into one output file. Biopython then parses this >>file by looking for the beginning of each XML file in this output file. >> >>The new output is in a sense better because the output file is a valid >>XML file. It may be that Biopython's XML parser ignores the >>tags, since in the old format there was only one block >>anyway, and therefore fails with the new format. Rohini Damle wrote: > So what do one need to do to make biopython working? Make changes in > the XML parser so that it will consider one iteration for one result > output? Basically, yes, we need to change the BioPython NCBI Blast XML code somehow - this might be best moved to the development mailing list. 
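To make the new layout concrete, here is a rough sketch of visiting each per-query block in a single BLAST XML file, one block at a time. It uses the standard library's ElementTree, not Biopython's parser. The tag names (Iteration, Iteration_iter-num, Iteration_query-def, Hit_accession) are assumed from NCBI's BLAST XML output and should be checked against a real file; the sample document, the accessions in it, and the function name iter_queries are invented for the example.

```python
import io
import xml.etree.ElementTree as ET

# Invented sample in the "one file, several Iteration blocks" layout.
SAMPLE = b"""<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_program>blastp</BlastOutput_program>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_query-def>p1</Iteration_query-def>
      <Iteration_hits>
        <Hit><Hit_accession>ACC0001</Hit_accession></Hit>
      </Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>2</Iteration_iter-num>
      <Iteration_query-def>p2</Iteration_query-def>
      <Iteration_hits>
        <Hit><Hit_accession>ACC0002</Hit_accession></Hit>
        <Hit><Hit_accession>ACC0003</Hit_accession></Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def iter_queries(handle):
    """Yield (iteration number, query name, accessions) per Iteration block."""
    # iterparse reports each element once its closing tag has been read.
    for event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag != "Iteration":
            continue
        number = int(elem.findtext("Iteration_iter-num"))
        query = elem.findtext("Iteration_query-def")
        accessions = [hit.findtext("Hit_accession") for hit in elem.iter("Hit")]
        yield number, query, accessions
        # Drop this block's children before moving on to the next one.
        elem.clear()


results = list(iter_queries(io.BytesIO(SAMPLE)))
```

Each tuple corresponds to one query sequence, which is the unit a fixed parser would presumably need to return as a separate record.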
Some relevant but probably slightly out of date documentation:

ftp://ftp.ncbi.nlm.nih.gov/blast/documents/xml/README.blxml

Notice this appears to describe the ... block as follows:

BlastOutput_iter-num: the psi-blast iteration number (optional)

So whatever we do, we should have a look at the psi-blast output as well...

One idea I was thinking about is to modify the existing Blast XML parser to specify WHICH iteration number it should parse (ignoring the rest). An invalid iteration number would raise a new exception. Then, a new Blast XML iterator would call the parser repeatedly, incrementing the iteration number until the "invalid iteration number" error was raised, which would signal the end.

Note that with the "old style concatenated XML entries" we could parse each entry one by one, without having to load the entire XML file into memory at once. I don't think that will be possible with the new style XML files.

Peter

From biopython at maubp.freeserve.co.uk Wed Jun 21 14:27:06 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 21 Jun 2006 15:27:06 +0100
Subject: [BioPython] docs have moved on the website
Message-ID: <4499573A.5060409@maubp.freeserve.co.uk>

I don't know if anyone has noticed this, but for example this:

http://www.biopython.org/docs/cookbook/genbank_to_fasta.html

Has moved to here:

http://www.biopython.org/DIST/docs/cookbook/genbank_to_fasta.html

Is it too late to revert to the old position?

If it is, to preserve any old links from external sites (and also to save google and other search engines having to update their indexes) maybe the website could automatically forward queries for:

http://www.biopython.org/docs/*

to:

http://www.biopython.org/DIST/docs/*

Good idea? Bad idea?

Peter From rohini.damle at gmail.com Wed Jun 21 19:06:29 2006 From: rohini.damle at gmail.com (Rohini Damle) Date: Wed, 21 Jun 2006 12:06:29 -0700 Subject: [BioPython] Biopython's XMl parser fails with NCBI blast changed XML output format Message-ID: Hi, I am trying to parse the blast output (XML formatted, using online NCBI's blast) I got as a result for 'short nearly exact matches' for my 50-55 short protein sequences. It looks like the XML format has changed and biopython's XML parser fails to parse the blast records. can somebody show a way to fix this thing? Thank you Rohini Damle From biopython at maubp.freeserve.co.uk Sun Jun 25 21:37:53 2006 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 25 Jun 2006 22:37:53 +0100 Subject: [BioPython] Distance Matrix Parsers In-Reply-To: References: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com> <448A9A7A.6050501@maubp.freeserve.co.uk> Message-ID: <449F0231.2050308@maubp.freeserve.co.uk> [Off topic, but recently has anyone else get valid messages bounced due to a "suspicious header"?] Hello List, I recently wanted to load a "PHYLIP distance matrix file" created by clustalw for my own research... As discussed earlier, clustalw bends the official PHYLIP specification by not truncating long names to 10 characters. For my dataset I need the long names to avoid ambiguity. The attached code implements a fairly simple distance matrix class and associated code to read (parse) and write PHYLIP style distance matrices. There are options to control strict 10 character name truncation, and the separator character(s) when writing files. Internally, I store the distances as a list of lists (of different lengths) to mimic a lower triangular matrix. For example, this matrix: [[0.0, 0.1, 0.2], [0.1, 0.0, 0.5], [0.2, 0.5, 0.0]] Is stored as this: [[], [0.1], [0.2, 0.5]] This may not be the best way to do this in terms of speed and memory usage. 
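To illustrate the storage scheme, here is a small sketch (not the attached module) of converting a full symmetric matrix into the list-of-lists form and looking a distance back up from it. The function names are made up for the example.

```python
def to_lower_triangle(full):
    """Keep only the entries below the diagonal, row by row."""
    return [list(row[:i]) for i, row in enumerate(full)]


def distance(lower, i, j):
    """Look up the distance between items i and j."""
    if i == j:
        return 0.0  # the diagonal is not stored
    if i < j:
        i, j = j, i  # symmetric, so swap into the lower triangle
    return lower[i][j]


full = [[0.0, 0.1, 0.2],
        [0.1, 0.0, 0.5],
        [0.2, 0.5, 0.0]]
lower = to_lower_triangle(full)
```

Row i holds i values, so n items need n*(n-1)/2 stored distances instead of n*n.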
There are some simple test cases included, but I have not pushed the code very far and there may be problems. Anyway - in case anyone is interested either in the short term, or for ideas for how BioPython could support these files - here it is.

I'm sure someone more familiar with arrays (Numeric and NumPy) would be able to make the class act more like an array - but the basics are there.

As far as I could see, neither Numeric nor NumPy has a specific symmetric matrix / symmetric array class, which would be ideal.

Members of the list are welcome to use the code, but please contact me before re-distributing it to anyone else.

Peter

-------------- next part --------------
A non-text attachment was scrubbed...
Name: phylip_dst.py
Type: text/x-python
Size: 16528 bytes
Desc: not available
URL:

From chris.lasher at gmail.com Tue Jun 27 21:34:37 2006
From: chris.lasher at gmail.com (Chris Lasher)
Date: Tue, 27 Jun 2006 17:34:37 -0400
Subject: [BioPython] Distance Matrix Parsers
In-Reply-To: <449F0231.2050308@maubp.freeserve.co.uk>
References: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com> <448A9A7A.6050501@maubp.freeserve.co.uk> <449F0231.2050308@maubp.freeserve.co.uk>
Message-ID: <128a885f0606271434v4d5a40e9x1ceb0037d750f6a1@mail.gmail.com>

Hi Peter,

Would you be up for licensing your code under the BioPython license? If not, I shouldn't look at it, as I've started coding my own module for the project. From your description, your module sounds very good. =-)

Chris

On 6/25/06, Peter wrote:
> [Off topic, but recently has anyone else get valid messages bounced due
> to a "suspicious header"?]
>
> Hello List,
>
> I recently wanted to load a "PHYLIP distance matrix file" created by
> clustalw for my own research...
>
> As discussed earlier, clustalw bends the official PHYLIP specification
> by not truncating long names to 10 characters. For my dataset I need
> the long names to avoid ambiguity.
>
> The attached code implements a fairly simple distance matrix class and
> associated code to read (parse) and write PHYLIP style distance matrices.
>
> There are options to control strict 10 character name truncation, and
> the separator character(s) when writing files.
>
> Internally, I store the distances as a list of lists (of different
> lengths) to mimic a lower triangular matrix.
>
> For example, this matrix:
>
> [[0.0, 0.1, 0.2],
>  [0.1, 0.0, 0.5],
>  [0.2, 0.5, 0.0]]
>
> is stored as this:
>
> [[], [0.1], [0.2, 0.5]]
>
> This may not be the best way to do this in terms of speed and memory usage.
>
> There are some simple test cases included, but I have not pushed the code
> very far and there may be problems. Anyway - in case anyone is
> interested either in the short term, or for ideas for how BioPython
> could support these files - here it is.
>
> I'm sure someone more familiar with arrays (Numeric and NumPy) would be
> able to make the class act more like an array - but the basics are there.
>
> As far as I could see, neither Numeric nor NumPy has a specific
> symmetric matrix / symmetric array class, which would be ideal.
>
> Members of the list are welcome to use the code, but please contact me
> before re-distributing it to anyone else.
>
> Peter
>
>
>
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>
>
>

From biopython at maubp.freeserve.co.uk Tue Jun 27 22:33:34 2006
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 27 Jun 2006 23:33:34 +0100
Subject: [BioPython] Distance Matrix Parsers
In-Reply-To: <128a885f0606271434v4d5a40e9x1ceb0037d750f6a1@mail.gmail.com>
References: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com> <448A9A7A.6050501@maubp.freeserve.co.uk> <449F0231.2050308@maubp.freeserve.co.uk> <128a885f0606271434v4d5a40e9x1ceb0037d750f6a1@mail.gmail.com>
Message-ID: <44A1B23E.5080007@maubp.freeserve.co.uk>

Chris Lasher wrote:
> Hi Peter,
>
> Would you be up for licensing your code under the BioPython license?
> If not, I shouldn't look at it, as I've started coding my own module
> for the project. From your description, your module sounds very good.
> =-)
>
> Chris

I am quite happy to contribute the code to BioPython under the appropriate license, so please go ahead.

I've filed a bug on adding PHYLIP distance parsers to BioPython and attached a slightly revised version of the code (added "fuzzy" equality testing of matrices - mainly for testing):

http://bugzilla.open-bio.org/show_bug.cgi?id=2034

If anyone else really wants the code under some other license (GPL maybe) I could probably be persuaded.
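For readers wondering what "fuzzy" equality testing of matrices might look like, here is a minimal sketch. It is not taken from the code attached to the bug report; the function name fuzzy_equal and the default tolerance are assumptions made for illustration, and the matrices are assumed to use the lower triangular list-of-lists layout described earlier in the thread.

```python
import math


def fuzzy_equal(a, b, abs_tol=1e-6):
    """Return True if two lower triangular matrices match within abs_tol.

    Both arguments are lists of rows, where row i holds i distances.
    The tolerance value is an illustrative assumption, not a project default.
    """
    if len(a) != len(b):
        return False
    for row_a, row_b in zip(a, b):
        if len(row_a) != len(row_b):
            return False  # shapes differ, so the matrices cannot be equal
        for x, y in zip(row_a, row_b):
            # Compare with an absolute tolerance only; distances near zero
            # would make a purely relative tolerance too strict.
            if not math.isclose(x, y, rel_tol=0.0, abs_tol=abs_tol):
                return False
    return True


original = [[], [0.1], [0.2, 0.5]]
reloaded = [[], [0.1000001], [0.2, 0.5]]
print(fuzzy_equal(original, reloaded))
```

A tolerant comparison of this kind is handy in round-trip tests, where a matrix that has been written to a text file and read back may differ from the original by small rounding errors.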
Peter

From chris.lasher at gmail.com Tue Jun 27 23:32:12 2006
From: chris.lasher at gmail.com (Chris Lasher)
Date: Tue, 27 Jun 2006 19:32:12 -0400
Subject: [BioPython] Distance Matrix Parsers
In-Reply-To: <44A1B23E.5080007@maubp.freeserve.co.uk>
References: <128a885f0606081432k7dc9b988rdccbc3be03ca62b6@mail.gmail.com> <448A9A7A.6050501@maubp.freeserve.co.uk> <449F0231.2050308@maubp.freeserve.co.uk> <128a885f0606271434v4d5a40e9x1ceb0037d750f6a1@mail.gmail.com> <44A1B23E.5080007@maubp.freeserve.co.uk>
Message-ID: <128a885f0606271632q2988f2d7y543dd441535f9808@mail.gmail.com>

[Oops! I didn't realize I was posting to the user list! Reverting it back to BP-Dev]

This code looks very good, Peter! As far as licensing goes, I'm new to the game, but my guess is that the BioPython license (http://www.biopython.org/DIST/LICENSE) is highly preferred for BioPython. You still retain copyright with the license, but the code is more "free" than under any version of the GPL.

Chris

On 6/27/06, Peter wrote:
> Chris Lasher wrote:
> > Hi Peter,
> >
> > Would you be up for licensing your code under the BioPython license?
> > If not, I shouldn't look at it, as I've started coding my own module
> > for the project. From your description, your module sounds very good.
> > =-)
> >
> > Chris
>
> I am quite happy to contribute the code to BioPython under the
> appropriate license, so please go ahead.
>
> I've filed a bug on adding PHYLIP distance parsers to BioPython and
> attached a slightly revised version of the code (added "fuzzy" equality
> testing of matrices - mainly for testing):
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2034
>
> If anyone else really wants the code under some other license (GPL
> maybe) I could probably be persuaded.
>
> Peter
>
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

From cjfields at uiuc.edu Wed Jun 28 18:30:44 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 13:30:44 -0500
Subject: [BioPython] Wiki spammed
Message-ID: <005201c69ae0$f78c59c0$15327e82@pyrimidine>

Guys,

Just wanted to let whoever's in charge know that you need to roll back changes to this page:

http://biopython.org/wiki/Biopython

The spammers have struck again!

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign