From jdiezperezj at gmail.com Wed Aug 1 04:13:24 2007 From: jdiezperezj at gmail.com (Javier Díez) Date: Wed, 1 Aug 2007 10:13:24 +0200 Subject: [BioPython] blast output xml In-Reply-To: <46AF51CA.2030005@maubp.freeserve.co.uk> References: <46AF51CA.2030005@maubp.freeserve.co.uk> Message-ID: Hi Peter, I followed the tutorial cookbook steps but I got the same exception (see below). The output I got running BLAST locally is not XML but text. I guess it might be related to the version I'm using (Python 2.5, Biopython 1.42.2 on Ubuntu 7.4). Any idea? On the other hand, I think there is a mistake in the cookbook. Instead of:
>>> from Bio.Blast import NCBIXML
>>> blast_records = NCBIXML.parse(result_handle)
I think it should be:
>>> blast_parser = NCBIXML.BlastParser()
>>> blast_result = blast_parser.parse(result_handle)
Thanks Javi
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/var/lib/python-support/python2.5/Bio/Blast/NCBIXML.py", line 112, in parse
    self._parser.parse(handler)
  File "/usr/lib/python2.5/site-packages/_xmlplus/sax/expatreader.py", line 109, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.5/site-packages/_xmlplus/sax/xmlreader.py", line 125, in parse
    self.close()
  File "/usr/lib/python2.5/site-packages/_xmlplus/sax/expatreader.py", line 226, in close
    self.feed("", isFinal = 1)
  File "/usr/lib/python2.5/site-packages/_xmlplus/sax/expatreader.py", line 220, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib/python2.5/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: <unknown>:1:0: no element found
On 7/31/07, Peter wrote: > > Javier Díez wrote: > > Hi, > > Does anyone know if it is possible to get blast-xml output running > blast > > from biopython scripts? > > How can I do that? > > Thanks > > Javi > > Yes, you can run standalone blast from Biopython, and parse its XML > output.
See "Chapter 3 BLAST" of the tutorial: > > http://biopython.org/DIST/docs/tutorial/Tutorial.html > > Note that while parsing the plain text worked well with older versions > of BLAST, we don't recommend using this anymore - use the XML output. > > Peter > From mdehoon at c2b2.columbia.edu Wed Aug 1 04:55:05 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Wed, 1 Aug 2007 04:55:05 -0400 Subject: [BioPython] blast output xml References: <46AF51CA.2030005@maubp.freeserve.co.uk> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B5FE@mail2.exch.c2b2.columbia.edu> Hi Javier, > I followed the tutorial cookbook steps but I got the same exception (see > below). The output I got running blast locally is not XML but text. I guess > it might be related to the version I'm using (python 2.5, biopython > 1.42.2 on ubuntu 7.4). Any idea? Use Biopython 1.43. > On the other hand I think there is a mistake in the cookbook, instead of: > >>> from Bio.Blast import NCBIXML > >>> blast_records = NCBIXML.parse(result_handle) > I think it should be: > >>> blast_parser= NCBIXML.BlastParser() > >>> blast_result=blast_parser.parse(result_handle) The description in the cookbook is correct for Biopython 1.43. As a note to the developers: We should ask ourselves if distributing packages for various linuxes is really a good idea. Installing Biopython isn't all that hard, and the packages for linux tend to get behind as new Biopython versions come out, causing confusion for our users. --Michiel.
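The last line of Javier's traceback can be reproduced with the Python standard library alone, which helps narrow down the cause. The sketch below is a minimal diagnostic, not Biopython code, and the helper names looks_like_xml and sax_error_for are hypothetical. Expat reports "no element found" when it reaches the end of the input without seeing any element, as happens with an empty handle, whereas plain-text BLAST output fails with a different message. So looking at the first few characters of the handle is a cheap first check.

```python
import xml.sax


def looks_like_xml(text):
    """Cheap sniff test: BLAST XML output starts with '<' (e.g. '<?xml ...')."""
    return text.lstrip().startswith("<")


def sax_error_for(text):
    """Parse text with a do-nothing SAX handler.

    Returns expat's error message, or None if the text parsed cleanly.
    """
    try:
        xml.sax.parseString(text.encode("utf-8"), xml.sax.ContentHandler())
    except xml.sax.SAXParseException as err:
        return err.getMessage()
    return None


if __name__ == "__main__":
    # Empty input gives the same message as the traceback in this thread.
    print(sax_error_for(""))
    # Plain-text BLAST output also fails, but with a different message.
    print(sax_error_for("BLASTN 2.2.16 [Mar-25-2007]"))
    print(looks_like_xml('<?xml version="1.0"?>\n<BlastOutput/>'))
```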
Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From biopython at maubp.freeserve.co.uk Wed Aug 1 05:38:16 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 01 Aug 2007 10:38:16 +0100 Subject: [BioPython] blast output xml In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B5FE@mail2.exch.c2b2.columbia.edu> References: <46AF51CA.2030005@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B5FE@mail2.exch.c2b2.columbia.edu> Message-ID: <46B05488.20609@maubp.freeserve.co.uk> In Biopython 1.42, when calling standalone blast we defaulted to requesting plain text output. In Biopython 1.43, when calling standalone blast we default to requesting XML output. Around this time, the NCBI made a big change to how they produced XML output for multiple queries, and we had to change the way the NCBIXML library was used in Biopython 1.43 - and updated the documentation to match. Michiel De Hoon wrote: > > The description in the cookbook is correct for Biopython 1.43. > Maybe we should have added an explicit "you need Biopython 1.43 or later" rider on the Blast chapter? > As a note to the developers: > We should ask ourselves if distributing packages for various linuxes is > really a good idea. Installing Biopython isn't all that hard, and the > packages for linux tend to get behind as new Biopython versions come out, > causing confusion for our users. I think having the distributions provide (current) Biopython is great, as it is much simpler for new users to install it and its dependencies this way. However, because of the problem of out-of-date packages, we should try to get in touch with the main distribution maintainers and try to bring them up to date more quickly.
Peter From luca.beltrame at unimi.it Wed Aug 1 05:34:36 2007 From: luca.beltrame at unimi.it (Luca Beltrame) Date: Wed, 01 Aug 2007 11:34:36 +0200 Subject: [BioPython] blast output xml In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B5FE@mail2.exch.c2b2.columbia.edu> References: <6243BAA9F5E0D24DA41B27997D1FD14402B5FE@mail2.exch.c2b2.columbia.edu> Message-ID: <200708011134.36479.luca.beltrame@unimi.it> On Wednesday 01 August 2007 10:55:05 Michiel De Hoon wrote: > We should ask ourselves if distributing packages for various linuxes is > really a good idea. Installing Biopython isn't all that hard, and the Some (like myself) prefer to use the distribution's package management system, which at least prevents conflicts between multiple versions and provides a better way to track what's installed and what's not. So, personally I think it's a good idea to distribute packages. -- Luca Beltrame, MSc. - Molecular Medicine PhD Student Dipartimento di Scienze e Tecnologie Biomediche - UniMI E-mail: luca dot beltrame at unimi dot it From sbassi at gmail.com Wed Aug 1 13:48:33 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Wed, 1 Aug 2007 14:48:33 -0300 Subject: [BioPython] blast output xml In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B5FE@mail2.exch.c2b2.columbia.edu> References: <46AF51CA.2030005@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B5FE@mail2.exch.c2b2.columbia.edu> Message-ID: On 8/1/07, Michiel De Hoon wrote: > As a note to the developers: > We should ask ourselves if distributing packages for various linuxes is > really a good idea. Installing Biopython isn't all that hard, and the > packages for linux tend to get behind as new Biopython versions come out, > causing confusion for our users. I think that packages should be distributed, but with a clear warning about the benefits of installing the latest version.
On some web hosts, webmasters won't let you install a tar.gz package, but they would allow you to install a package from the Debian/Ubuntu repositories (for security reasons). Best, SB. -- Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From jodyhey at yahoo.com Wed Aug 1 15:56:54 2007 From: jodyhey at yahoo.com (Emanuel Hey) Date: Wed, 1 Aug 2007 12:56:54 -0700 (PDT) Subject: [BioPython] accessing "data quality" Phrap records in Genbank Message-ID: <14649.20165.qm@web53907.mail.re2.yahoo.com> For some sequence records, NCBI has a record of the Phrap scores corresponding to the sequence (i.e. one score for each base). These are typically records containing draft sequences from genome projects. To see an example, try this link: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&qty=1&c_start=1&list_uids=153792835&uids=&dopt=qual&dispmax=5&sendto=&fmt_mask=0&from=begin&to=end&extrafeatpresent=1&ef_CDD=8&ef_MGC=16&ef_HPRD=32&ef_STS=64&ef_tRNA=128&ef_microRNA=256&ef_Exon=512 How could I go about downloading these sequence quality scores? I need to filter the data by a certain score. Thanks, jhey From biopython at maubp.freeserve.co.uk Wed Aug 1 17:05:33 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 01 Aug 2007 22:05:33 +0100 Subject: [BioPython] accessing "data quality" Phrap records in Genbank In-Reply-To: <14649.20165.qm@web53907.mail.re2.yahoo.com> References: <14649.20165.qm@web53907.mail.re2.yahoo.com> Message-ID: <46B0F59D.7080804@maubp.freeserve.co.uk> Emanuel Hey wrote: > for some sequence records, NCBI has a record of the > Phrap scores corresponding to the sequence (i.e. one > score for each base).
> > These are typically records containing draft sequences > from genome projects > > to see an example, try this link > > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&qty=1&c_start=1&list_uids=153792835&uids=&dopt=qual&dispmax=5&sendto=&fmt_mask=0&from=begin&to=end&extrafeatpresent=1&ef_CDD=8&ef_MGC=16&ef_HPRD=32&ef_STS=64&ef_tRNA=128&ef_microRNA=256&ef_Exon=512 > > How could I go about downloading these sequence > quality scores? One option for getting the data would be to construct the URL, then download it using standard Python tools, e.g. the urllib.urlretrieve function. Alternatively, Biopython has some NCBI/Entrez code you might be able to use... The second step is actually parsing the data file into a usable form. The "Base Quality" format looks very easy to parse, with a FASTA-like header followed by space-separated decimal scores. Their XML format also looks fairly simple - the core data looks like it's held as a string where each pair of characters represents one score in hex. As far as I could see based on the URL you gave, none of the other format options actually contain the "data quality" information. I'm not aware of any code in Biopython to cope with either of these file formats. > I need to filter the data by a certain score Are you trying to select parts of the associated sequence? Peter From fkauff at biologie.uni-kl.de Thu Aug 2 03:54:24 2007 From: fkauff at biologie.uni-kl.de (Frank Kauff) Date: Thu, 02 Aug 2007 09:54:24 +0200 Subject: [BioPython] accessing "data quality" Phrap records in Genbank In-Reply-To: <46B0F59D.7080804@maubp.freeserve.co.uk> References: <14649.20165.qm@web53907.mail.re2.yahoo.com> <46B0F59D.7080804@maubp.freeserve.co.uk> Message-ID: On Wed, 01 Aug 2007 22:05:33 +0100 Peter wrote: > Emanuel Hey wrote: >> for some sequence records, NCBI has a record of the >> Phrap scores corresponding to the sequence (i.e. one >> score for each base).
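Peter's description of the two layouts is enough for a small parser. "Base Quality" is a FASTA-like header followed by whitespace-separated decimal scores, and the XML variant stores two hex characters per score. The sketch below assumes exactly that layout. It has not been checked against a real NCBI download, and the sample record in the usage lines is made up. Because Peter asks whether the goal is to select parts of the sequence, the filter returns runs of positions whose scores all meet a threshold; high_quality_runs is one hypothetical reading of "filter by a certain score".

```python
def parse_base_quality(text):
    """Parse FASTA-like quality text into {header: [int scores]}.

    Assumes each record is a '>' header line followed by lines of
    whitespace-separated decimal scores.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].extend(int(tok) for tok in line.split())
    return records


def decode_hex_scores(hex_string):
    """Decode a string in which each pair of hex characters is one score."""
    if len(hex_string) % 2:
        raise ValueError("expected an even number of hex characters")
    return [int(hex_string[i:i + 2], 16) for i in range(0, len(hex_string), 2)]


def high_quality_runs(scores, min_score):
    """Return 0-based half-open (start, end) runs where every score >= min_score."""
    runs = []
    start = None
    for i, q in enumerate(scores):
        if q >= min_score and start is None:
            start = i
        elif q < min_score and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(scores)))
    return runs


if __name__ == "__main__":
    sample = ">hypothetical draft contig\n10 25 40 40\n38 12 30\n"
    scores = parse_base_quality(sample)["hypothetical draft contig"]
    print(scores)                          # [10, 25, 40, 40, 38, 12, 30]
    print(high_quality_runs(scores, 30))   # [(2, 5), (6, 7)]
    print(decode_hex_scores("0A1E28"))     # [10, 30, 40]
```

Half-open (start, end) pairs slice directly into the matching sequence string, e.g. seq[start:end].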
>> ...> > I'm not aware of any code in Biopython to cope with > either of these file > formats. > Ace.py and Phd.py in Bio/Sequencing are parsers for the ace and phd files generated by phrap and phred, respectively. Frank >> I need to filter the data by a certain score > > Are you trying to select parts of the associated > sequence? > > Peter > > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython -- J-Prof. Dr. Frank Kauff Molecular Phylogenetics FB Biologie, 13/276 TU Kaiserslautern Postfach 3049 67653 Kaiserslautern Tel. +49 (0)631 205-2562 Fax. +49 (0)631 205-2998 email: fkauff at biologie.uni-kl.de skype: frank.kauff From jodyhey at yahoo.com Thu Aug 2 21:20:38 2007 From: jodyhey at yahoo.com (Emanuel Hey) Date: Thu, 2 Aug 2007 18:20:38 -0700 (PDT) Subject: [BioPython] accessing "data quality" Phrap records in Genbank In-Reply-To: <46B0F59D.7080804@maubp.freeserve.co.uk> Message-ID: <81290.42390.qm@web53907.mail.re2.yahoo.com> Thanks. urllib does the job quite well. --- Peter wrote: > Emanuel Hey wrote: > > for some sequence records, NCBI has a record of > the > > Phrap scores corresponding to the sequence (i.e. > one > > score for each base). > > > > These are typically records containing draft > sequences > > from genome projects > > > > to see an example, try this link > > > > > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&qty=1&c_start=1&list_uids=153792835&uids=&dopt=qual&dispmax=5&sendto=&fmt_mask=0&from=begin&to=end&extrafeatpresent=1&ef_CDD=8&ef_MGC=16&ef_HPRD=32&ef_STS=64&ef_tRNA=128&ef_microRNA=256&ef_Exon=512 > > > > How could I go about downloading these sequence > > quality scores? > > One option for getting the data would be to > construct the URL then > download it using standard python tools, e.g. the > urllib.urlretrieve > function. Alternatively Biopython has some > NCBI/Entrez code you might be > able to use...
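Peter's first suggestion, building the URL and fetching it with urllib, can be sketched as follows. It is written for Python 3, where urllib.urlretrieve lives at urllib.request.urlretrieve. Only the db, list_uids and dopt=qual parameters are taken from Emanuel's example link. Whether the 2007-era viewer.fcgi endpoint still responds, or needs the remaining parameters, is an assumption not verified here. Building the query with urlencode avoids hand-escaping the parameters.

```python
from urllib.parse import urlencode

# Base URL taken from the example link in this thread (2007-era viewer).
VIEWER_URL = "http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi"


def quality_url(uid, db="nuccore"):
    """Return a viewer URL requesting the quality-score display (dopt=qual)."""
    params = {"db": db, "list_uids": str(uid), "dopt": "qual"}
    return VIEWER_URL + "?" + urlencode(params)


if __name__ == "__main__":
    url = quality_url(153792835)
    print(url)
    # Fetching is left commented out; it needs network access and a live endpoint:
    # from urllib.request import urlretrieve
    # urlretrieve(url, "153792835_qual.txt")
```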
> > The second step is actually parsing the data file > into a usable form. > The "Base Quality" format looks very easy to parse, > with a FASTA-like > header followed by space-separated decimal scores. > Their XML format > also looks fairly simple - the core data looks like > it's held as a string > where each pair of characters represents one score in > hex. As far as I > could see based on the URL you gave, none of the > other format options > actually contain the "data quality" information. > > I'm not aware of any code in Biopython to cope with > either of these file > formats. > > > I need to filter the data by a certain score > > Are you trying to select parts of the associated > sequence? > > Peter > > From letondal at pasteur.Fr Fri Aug 3 04:55:12 2007 From: letondal at pasteur.Fr (Catherine Letondal) Date: Fri, 3 Aug 2007 10:55:12 +0200 Subject: [BioPython] Course in informatics for biology 2008 at Institut Pasteur Message-ID: <575EDD34-F5C4-4DF4-97FF-CCC1E4219A01@pasteur.Fr> Hi, ************************************************************************ * Course in informatics for biology 2008 at Institut Pasteur http://www.pasteur.fr/formation/infobio-en.html ************************************************************************ * In the series of courses offered at the Pasteur Institute, a course will be offered in informatics in biology. The next session will take place from January to the end of April 2008. The main goal of this course is to provide researchers in biology with an initial exposure to informatics. Admission to the course is reserved for those with a degree in biology or a related discipline.
With more and more bioinformatics tools available, it becomes increasingly important for researchers in biology to be able to manage their data, implement their ideas, and judge for themselves the usefulness of new algorithms and software. This course will emphasize fundamental aspects of computer science and apply them to biological examples. Theoretical aspects (algorithm development, logic, problem modeling and design methods) and technical applications (databases and web technologies) that are relevant for biologists will be thoroughly discussed. Programming is presented through the object-oriented paradigm, using a modern high-level language, Python, which comes with tools for biology and is suited both to prototyping and scripting and to building substantial software systems. Interested students will also be able to learn an additional language (C). Learning during the course will be reinforced with computing exercises, and effective training will be provided by a 2-month research project. The working language of the course is French. For further information, please consult: http://www.pasteur.fr/formation/infobio-en.html *** Registration will close on October 1, 2007. *** Sincerely, -- Benno Schwikowski & Catherine Letondal Institut Pasteur -- Course in Informatics for Biology www.pasteur.fr/formation/infobio From sbassi at gmail.com Tue Aug 14 12:45:26 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Tue, 14 Aug 2007 13:45:26 -0300 Subject: [BioPython] Inverse codon table? Message-ID: I am looking for a table in Biopython of amino acids and their codons. I found a table in Data.CodonTable, but it works the opposite way from what I need:
>>> from Bio.Data import CodonTable
>>> t = CodonTable.standard_dna_table
>>> t.forward_table["ATG"]
'M'
I need a table where I could enter "M" and get all the triplets that translate into "M". Is this available in Biopython?
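The inverse table Sebastian asks for only needs the forward table turned inside out. The sketch below is standalone rather than built on Bio.Data.CodonTable. It hard-codes NCBI translation table 1 (the standard code) as a 64-letter string in TCAG codon order, so another genetic code would need its own string. Since the thread's eventual goal is every possible back-translation of a peptide, it also enumerates those with itertools.product. The number of sequences is the product of the codon counts per residue, not their sum. Stepping through each residue's codons in lockstep, as the loop later in this thread does, cycles through a few combinations rather than covering them all.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 1 (standard code); codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def forward_table(code=STANDARD_CODE):
    """Map each DNA codon to its one-letter amino acid ('*' marks stop codons)."""
    if len(code) != 64:
        raise ValueError("a genetic code string needs exactly 64 letters")
    codons = ("".join(bases) for bases in product(BASES, repeat=3))
    return dict(zip(codons, code))


def back_table(forward):
    """Invert codon -> amino acid into amino acid -> sorted list of codons."""
    back = {}
    for codon, amino in forward.items():
        back.setdefault(amino, []).append(codon)
    return {amino: sorted(codons) for amino, codons in back.items()}


def all_back_translations(peptide, back):
    """Yield every DNA sequence encoding peptide (Cartesian product of codon choices)."""
    choices = [back[aa] for aa in peptide]
    for combo in product(*choices):
        yield "".join(combo)


if __name__ == "__main__":
    bt = back_table(forward_table())
    print(bt["M"], bt["W"])     # ['ATG'] ['TGG']
    print(bt["L"])              # ['CTA', 'CTC', 'CTG', 'CTT', 'TTA', 'TTG']
    print(sum(1 for _ in all_back_translations("MANCNGASK", bt)))  # 6144
```

back_table accepts any codon-to-amino-acid dict, so Biopython's forward_table (which omits stop codons) could be passed in instead of the hard-coded string. Using a generator keeps memory flat, which matters because the count grows multiplicatively with peptide length.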
From sbassi at gmail.com Tue Aug 14 13:38:01 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Tue, 14 Aug 2007 14:38:01 -0300 Subject: [BioPython] Inverse codon table? In-Reply-To: <46C1E3AB.1000100@maubp.freeserve.co.uk> References: <46C1E3AB.1000100@maubp.freeserve.co.uk> Message-ID: On 8/14/07, Peter wrote: > I'm not sure without a thorough check, but it's fairly trivial to > build such a dictionary from the CodonTable.standard_dna_table like this: ... Thanks. That is nicer than my hand-made back-translation dictionary:
tdict = {"A": ("GCT", "GCC", "GCA", "GCG"), "R": ("CGT", "CGC", "CGA", "CGG", "AGA", "AGG"),
         "N": ("AAT", "AAC"), "D": ("GAT", "GAC"), "C": ("TGT", "TGC"), "Q": ("CAA", "CAG"),
         "E": ("GAA", "GAG"), "G": ("GGT", "GGC", "GGA", "GGG"), "H": ("CAT", "CAC"),
         "I": ("ATT", "ATC", "ATA"), "L": ("TTA", "TTG", "CTT", "CTC", "CTA", "CTG"),
         "K": ("AAA", "AAG"), "M": ("ATG",), "F": ("TTT", "TTC"), "P": ("CCT", "CCC", "CCA", "CCG"),
         "S": ("TCT", "TCC", "TCA", "TCG", "AGT", "AGC"), "T": ("ACT", "ACC", "ACA", "ACG"),
         "W": ("TGG",), "Y": ("TAT", "TAC"), "V": ("GTT", "GTC", "GTA", "GTG")}
And yours is better since it allows building a back dict from another translation table. Best, SB. From biopython at maubp.freeserve.co.uk Tue Aug 14 13:17:31 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 14 Aug 2007 18:17:31 +0100 Subject: [BioPython] Inverse codon table? In-Reply-To: References: Message-ID: <46C1E3AB.1000100@maubp.freeserve.co.uk> Sebastian Bassi wrote: > I am looking for a table in Biopython of amino-acids and its codons.
I > found in Data.CodonTable a table but it should be used in the inverse > way I need: > >>>> from Bio.Data import CodonTable >>>> t=CodonTable.standard_dna_table >>>> t.forward_table["ATG"] > 'M' > > I need a table where I could enter "M" and get all the triplets that > translate into "M". Is this available in Biopython? > I'm not sure without a thorough check, but it's fairly trivial to build such a dictionary from the CodonTable.standard_dna_table like this:
from Bio.Data import CodonTable
t = CodonTable.standard_dna_table
# Build a back table dict where the values are lists of codons...
bt = dict()
for a1 in "ATCG":
    for a2 in "ATCG":
        for a3 in "ATCG":
            codon = a1 + a2 + a3
            try:
                amino = t.forward_table[codon]
            except KeyError:
                # Codons missing from the forward table should be stop codons
                assert codon in t.stop_codons
                continue
            try:
                bt[amino].append(codon)
            except KeyError:
                bt[amino] = [codon]
for amino in bt:
    print amino, bt[amino]
Peter From p.pagel at gsf.de Tue Aug 14 14:08:18 2007 From: p.pagel at gsf.de (Philipp Pagel) Date: Tue, 14 Aug 2007 20:08:18 +0200 Subject: [BioPython] Inverse codon table? In-Reply-To: References: Message-ID: <20070814180817.GA3154@gsf.de> Hi! > I need a table where I could enter "M" and get all the triplets that > translate into "M". Is this available in Biopython? Hm - I'm not sure but it's easy enough to make one yourself:
>>> foo = {}
>>> for codon, AA in t.forward_table.items():
...     if AA not in foo:
...         foo[AA] = []
...     foo[AA].append(codon)
...
>>> foo['L']
['CTT', 'CTG', 'CTA', 'CTC', 'TTA', 'TTG']
>>> foo['V']
['GTA', 'GTC', 'GTG', 'GTT']
cu Philipp -- Dr. Philipp Pagel Tel. +49-8161-71 2131 Lehrstuhl für Genomorientierte Bioinformatik Fax.
+49-8161-71 2186 Technische Universität München Wissenschaftszentrum Weihenstephan 85350 Freising, Germany and Institut für Bioinformatik / MIPS GSF - Forschungszentrum für Umwelt und Gesundheit Ingolstädter Landstrasse 1 85764 Neuherberg, Germany http://mips.gsf.de/staff/pagel From sbassi at gmail.com Tue Aug 14 14:44:15 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Tue, 14 Aug 2007 15:44:15 -0300 Subject: [BioPython] Inverse codon table? In-Reply-To: <20070814180817.GA3154@gsf.de> References: <20070814180817.GA3154@gsf.de> Message-ID: On 8/14/07, Philipp Pagel wrote:
> >>> foo = {}
> >>> for codon, AA in t.forward_table.items():
> ...     if AA not in foo:
> ...         foo[AA] = []
> ...     foo[AA].append(codon)
That also works! My goal is to make a function that will return ALL possible back-translated sequences from a given peptide. The back_translate function from Biopython just returns one possible sequence. I want them all :) From sbassi at gmail.com Tue Aug 14 14:55:24 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Tue, 14 Aug 2007 15:55:24 -0300 Subject: [BioPython] Inverse codon table? In-Reply-To: References: <20070814180817.GA3154@gsf.de> Message-ID: On 8/14/07, Sebastian Bassi wrote: > function from Biopython just returns one possible sequence. I want them
I want them > all :) This returns all possible combinations: peptidoORI="MANCNGASK" tot=0 for x in peptidoORI: tot=tot+len(bt[x]) allpepts=[""]*tot for aa in peptidoORI: i=0 for x in range(tot): if i==len(bt[aa]): i=0 else: pass allpepts[x]=allpepts[x]+bt[aa][i] i=i+1 >>> allpepts ['ATGGCAAATTGTAATGGAGCAAGTAAA', 'ATGGCTAACTGCAACGGTGCTAGCAAG', 'ATGGCCAATTGTAATGGCGCCTCAAAA', 'ATGGCGAACTGCAACGGGGCGTCTAAG', 'ATGGCAAATTGTAATGGAGCATCCAAA', 'ATGGCTAACTGCAACGGTGCTTCGAAG', 'ATGGCCAATTGTAATGGCGCCAGTAAA', 'ATGGCGAACTGCAACGGGGCGAGCAAG', 'ATGGCAAATTGTAATGGAGCATCAAAA', 'ATGGCTAACTGCAACGGTGCTTCTAAG', 'ATGGCCAATTGTAATGGCGCCTCCAAA', 'ATGGCGAACTGCAACGGGGCGTCGAAG', 'ATGGCAAATTGTAATGGAGCAAGTAAA', 'ATGGCTAACTGCAACGGTGCTAGCAAG', 'ATGGCCAATTGTAATGGCGCCTCAAAA', 'ATGGCGAACTGCAACGGGGCGTCTAAG', 'ATGGCAAATTGTAATGGAGCATCCAAA', 'ATGGCTAACTGCAACGGTGCTTCGAAG', 'ATGGCCAATTGTAATGGCGCCAGTAAA', 'ATGGCGAACTGCAACGGGGCGAGCAAG', 'ATGGCAAATTGTAATGGAGCATCAAAA', 'ATGGCTAACTGCAACGGTGCTTCTAAG', 'ATGGCCAATTGTAATGGCGCCTCCAAA', 'ATGGCGAACTGCAACGGGGCGTCGAAG', 'ATGGCAAATTGTAATGGAGCAAGTAAA', 'ATGGCTAACTGCAACGGTGCTAGCAAG', 'ATGGCCAATTGTAATGGCGCCTCAAAA'] -- Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From Gabino.Sanchez-Perez at Dal.Ca Tue Aug 14 19:57:40 2007 From: Gabino.Sanchez-Perez at Dal.Ca (Gabino Sanchez-Perez) Date: Tue, 14 Aug 2007 20:57:40 -0300 Subject: [BioPython] Problem to obtain Hit length parsing with NCBIXML Message-ID: <20070814205740.rd456mf5ow844o8k@my3.dal.ca> Hi I'm parsing a blastp output in XML format using NCBIXML, but my problem is that I don't know how to get the Hit length (not alignment length), even though is defined in the parser: def _end_Hit_len(self): self._hit.length = int(self._value) So in my script when I iterate for alignment in brecord.alignments: for hsp in alignment.hits: it seems that is not defined and the error message is that hit.length does not exist. 
Thanks in advance Gabino From mdehoon at c2b2.columbia.edu Tue Aug 14 20:45:26 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Tue, 14 Aug 2007 20:45:26 -0400 Subject: [BioPython] Problem to obtain Hit length parsing with NCBIXML References: <20070814205740.rd456mf5ow844o8k@my3.dal.ca> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B604@mail2.exch.c2b2.columbia.edu> This should work:
>>> for alignment in brecord.alignments:
...     print alignment.length
If it doesn't, please send us the exact code you are using, and the exact error message that you got. --Michiel. -----Original Message----- From: biopython-bounces at lists.open-bio.org on behalf of Gabino Sanchez-Perez Sent: Tue 8/14/2007 7:57 PM To: biopython at lists.open-bio.org Subject: [BioPython] Problem to obtain Hit length parsing with NCBIXML Hi I'm parsing a blastp output in XML format using NCBIXML, but my problem is that I don't know how to get the Hit length (not alignment length), even though it is defined in the parser: def _end_Hit_len(self): self._hit.length = int(self._value) So in my script when I iterate for alignment in brecord.alignments: for hsp in alignment.hits: it seems that it is not defined and the error message is that hit.length does not exist.
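The distinction Gabino ran into is visible directly in NCBI's BLAST XML. The full length of the database sequence is stored in <Hit_len>, which, per Michiel's and Christof's replies, NCBIXML exposes as alignment.length. Each HSP carries its own aligned length in <Hsp_align-len>, which matches the length of the gapped aligned subject string. The standard-library sketch below reads both from a hand-made <Hit> fragment. The tag names follow NCBI's XML output, but the values are invented, and real files nest <Hit> several levels inside <BlastOutput>.

```python
import xml.etree.ElementTree as ET

# Hand-made <Hit> fragment: tag names follow NCBI BLAST XML, values are invented.
SAMPLE = """<Hit>
  <Hit_def>hypothetical subject protein</Hit_def>
  <Hit_len>350</Hit_len>
  <Hit_hsps>
    <Hsp>
      <Hsp_align-len>12</Hsp_align-len>
      <Hsp_hseq>MKT-LLVAGSAA</Hsp_hseq>
    </Hsp>
  </Hit_hsps>
</Hit>"""


def hit_and_hsp_lengths(hit_xml):
    """Return (full subject length, [aligned length of each HSP]) for one <Hit>."""
    hit = ET.fromstring(hit_xml)
    hit_len = int(hit.findtext("Hit_len"))
    hsp_lens = [int(hsp.findtext("Hsp_align-len")) for hsp in hit.iter("Hsp")]
    return hit_len, hsp_lens


if __name__ == "__main__":
    print(hit_and_hsp_lengths(SAMPLE))   # (350, [12])
```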
Thanks in advance Gabino From winter at biotec.tu-dresden.de Wed Aug 15 03:36:47 2007 From: winter at biotec.tu-dresden.de (Christof Winter) Date: Wed, 15 Aug 2007 09:36:47 +0200 Subject: [BioPython] Problem to obtain Hit length parsing with NCBIXML In-Reply-To: <20070814205740.rd456mf5ow844o8k@my3.dal.ca> References: <20070814205740.rd456mf5ow844o8k@my3.dal.ca> Message-ID: <46C2AD0F.3050207@biotec.tu-dresden.de> Gabino Sanchez-Perez wrote: > Hi > > I'm parsing a blastp output in XML format using NCBIXML, but my problem is that > I don't know how to get the Hit length (not alignment length), even though it is > defined in the parser: > > def _end_Hit_len(self): > self._hit.length = int(self._value) > > So in my script when I iterate > > for alignment in brecord.alignments: > for hsp in alignment.hits: > > it seems that it is not defined and the error message is that hit.length does not > exist. I can confirm what Michiel wrote: alignment.length = total length of the unaligned hit sequence. HTH, Christof From Gabino.Sanchez-Perez at Dal.Ca Wed Aug 15 08:23:26 2007 From: Gabino.Sanchez-Perez at Dal.Ca (Gabino Sanchez-Perez) Date: Wed, 15 Aug 2007 09:23:26 -0300 Subject: [BioPython] Problem to obtain Hit length parsing with NCBIXML In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B604@mail2.exch.c2b2.columbia.edu> References: <20070814205740.rd456mf5ow844o8k@my3.dal.ca> <6243BAA9F5E0D24DA41B27997D1FD14402B604@mail2.exch.c2b2.columbia.edu> Message-ID: <20070815092326.ah56e2ud8w00k4wk@my5.dal.ca> It works! I was confused because I was looking for the length of the alignment (which I now see is in hsp.sbjct) and I thought it was in alignment.length (Hit length). Thank you very much for the help. Gabino > This should work: > >>>> for alignment in brecord.alignments: > ...
print alignment.length > > If it doesn't, please send us the exact code you are using, and the exact > error message that you got. > > --Michiel. > > > > -----Original Message----- > From: biopython-bounces at lists.open-bio.org on behalf of Gabino Sanchez-Perez > Sent: Tue 8/14/2007 7:57 PM > To: biopython at lists.open-bio.org > Subject: [BioPython] Problem to obtain Hit length parsing with NCBIXML > > Hi > > I'm parsing a blastp output in XML format using NCBIXML, but my problem is > that > I don't know how to get the Hit length (not alignment length), even though it is > defined in the parser: > > def _end_Hit_len(self): > self._hit.length = int(self._value) > > So in my script when I iterate > > for alignment in brecord.alignments: > for hsp in alignment.hits: > > it seems that it is not defined and the error message is that hit.length does > not > exist. > > Thanks in advance > > Gabino > > From douglas.kojetin at gmail.com Wed Aug 15 14:30:41 2007 From: douglas.kojetin at gmail.com (Douglas Kojetin) Date: Wed, 15 Aug 2007 14:30:41 -0400 Subject: [BioPython] Bio.PDB rotate/translate Message-ID: <152CE16F-7F6D-46F3-B285-59FBB3671EED@gmail.com> Hi All, I would like to rotate a specific H and N vector in my structure file so that it points in a different direction, but keep its location fixed in 3D space. Here is what I've been working with, but it seems to move the atoms away from their original location. I grabbed most of this from the Bio.PDB FAQ (the PDF file) and from the Bio.PDB documentation online. I'm not sure if I should be using different values for the Vector/rotation line, or in the translation array, etc.
###
atom1 = structure[model.id][chain.id][residue.id[1]]['H']
atom2 = structure[model.id][chain.id][residue.id[1]]['N']
rotation = Bio.PDB.rotaxis( math.pi/2.0, Bio.PDB.Vector(1,0,0) )
translation = Numeric.array( (1,0,0), 'f' )
atom1.transform(rotation, translation)
atom2.transform(rotation, translation)
###
Thanks in advance for the help, Doug From jdiezperezj at gmail.com Thu Aug 16 13:27:45 2007 From: jdiezperezj at gmail.com (Javier Díez) Date: Thu, 16 Aug 2007 19:27:45 +0200 Subject: [BioPython] psi mi Message-ID: Hi, Does anyone know of a Python parser for the PSI-MI format? Thanks Javi From winter at biotec.tu-dresden.de Thu Aug 16 13:53:01 2007 From: winter at biotec.tu-dresden.de (Christof Winter) Date: Thu, 16 Aug 2007 19:53:01 +0200 Subject: [BioPython] psi mi In-Reply-To: References: Message-ID: <46C48EFD.6080701@biotec.tu-dresden.de> Javier Díez wrote: > Hi, > Does anyone know of a Python parser for the PSI-MI format? > Thanks > Javi Hi Javier: I searched for one some time ago, and since I didn't find one, I wrote my own. It's based on ElementTree: http://effbot.org/zone/element-index.htm http://docs.python.org/lib/module-xml.etree.ElementTree.html It only parses some basic data from the PSI-MI file; if you want to have a look at it, just drop me a line. Cheers, Christof From meesters at uni-mainz.de Thu Aug 16 14:33:56 2007 From: meesters at uni-mainz.de (Christian Meesters) Date: Thu, 16 Aug 2007 20:33:56 +0200 Subject: [BioPython] Bio.PDB: possible to obtain chain identifier for atoms? Message-ID: <1187289236.19033.17.camel@cmeesters> Hi All, I'm trying to build a somewhat bigger app which needs to calculate the distance distribution function of a protein several times. For this purpose I re-discovered BioPython today. Speed is an issue, but since the protein in question isn't too big, the overhead of Python and BioPython might be tolerable. Still, in order to speed up my code I would need to access the chain identifier of atoms within a loop.
To illustrate my question, some pseudo-code:
atoms = Selection.unfold_entities(structure, 'A')
for i, a in enumerate(atoms):
    for b in atoms[i+1:]:
        # do some calculations here ...
Now, I cannot avoid the double loop, but if I could somehow check whether atoms a and b belong to the same chain (and which one), I would only have to do this awkward looping once. Does anybody know an approach? As far as I understand the code of the Atom class, there is no direct way, right? TIA Christian From meesters at uni-mainz.de Thu Aug 16 14:50:15 2007 From: meesters at uni-mainz.de (Christian Meesters) Date: Thu, 16 Aug 2007 20:50:15 +0200 Subject: [BioPython] Bio.PDB rotate/translate In-Reply-To: <152CE16F-7F6D-46F3-B285-59FBB3671EED@gmail.com> References: <152CE16F-7F6D-46F3-B285-59FBB3671EED@gmail.com> Message-ID: <1187290215.19033.34.camel@cmeesters> Hi, As you can see from my previous mail, I only just started working with BioPython again today. Still, I dare to ask a few questions back. Actually, I don't understand your question: > I would like to rotate a specific H and N vector in my structure > file I guess this is a mistake and you don't want to rotate the vector itself. I presume your structure looks like this:
H  N
 \/
 c1
 |
 c2
right? And you want to rotate around the axis defined by c1-c2? Then, if I understand the 'PDB FAQ' correctly, the code before the rotaxis line should read:
c1 = residue[c1-identifier].get_vector()
c2 = residue[c2-identifier].get_vector()
rotation = rotaxis(angle, c1-c2)
Of course, your residue won't look like my sketch ... > > rotation = Bio.PDB.rotaxis( math.pi/2.0, Bio.PDB.Vector(1,0,0) ) > translation = Numeric.array( (1,0,0), 'f' ) Here, you would shift the atoms by 1 Å along 'x'. Applied to only two atoms, this would stretch the bonds. BTW, did you save the altered structure after applying the rotation/translation? If this was all wrong, please excuse a newbie answering this question.
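The idea behind Christian's answer generalizes. To rotate atoms while keeping a point fixed, rotate about an axis that passes through that point: subtract the pivot, rotate, add the pivot back, and apply no extra translation. The pure-Python sketch below applies Rodrigues' rotation formula and does not use Bio.PDB. Choosing the N atom as the pivot (so only the H moves), the axis direction and the coordinates are illustrative assumptions. With a real structure, the new coordinates would still have to be written back to the atom.

```python
import math


def rotate_about_axis(point, pivot, axis, angle):
    """Rotate point by angle (radians) about the line through pivot along axis.

    Rodrigues' formula is applied to the vector pivot -> point and the pivot is
    added back afterwards, so the pivot itself never moves and distances to it
    are preserved.
    """
    norm = math.sqrt(sum(c * c for c in axis))
    if norm == 0:
        raise ValueError("axis must be a non-zero vector")
    k = [c / norm for c in axis]                   # unit axis direction
    v = [p - q for p, q in zip(point, pivot)]      # vector from pivot to point
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    k_dot_v = sum(a * b for a, b in zip(k, v))
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    rotated = [v[i] * cos_a + k_cross_v[i] * sin_a + k[i] * k_dot_v * (1 - cos_a)
               for i in range(3)]
    return [r + q for r, q in zip(rotated, pivot)]


if __name__ == "__main__":
    n_xyz = [1.0, 2.0, 3.0]   # pivot, e.g. the N atom (made-up coordinates)
    h_xyz = [2.0, 2.0, 3.0]   # H atom 1 unit away along x
    new_h = rotate_about_axis(h_xyz, n_xyz, (0.0, 0.0, 1.0), math.pi / 2)
    print([round(c, 6) for c in new_h])   # [1.0, 3.0, 3.0]
```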
But the mail was left unanswered for a while and it is still possible to correct me ;-). HTH Christian From meesters at uni-mainz.de Fri Aug 17 03:15:39 2007 From: meesters at uni-mainz.de (Christian Meesters) Date: Fri, 17 Aug 2007 09:15:39 +0200 Subject: [BioPython] Bio.PDB: possible to obtain chain identifier for atoms? In-Reply-To: <8e76d5310708161217s7e59e2ado6b4b514f6192c38e@mail.gmail.com> References: <1187289236.19033.17.camel@cmeesters> <8e76d5310708161217s7e59e2ado6b4b514f6192c38e@mail.gmail.com> Message-ID: <1187334939.26144.1.camel@cmeesters> Hoi Katie, apparently, yes. Life can be so easy ... Thanks, Christian On Thu, 2007-08-16 at 15:17 -0400, Katie Edmonds wrote: > Hi Christian, > > are you looking for: > chain1 = a.get_parent().get_parent() > ? > > Katie > > On 8/16/07, Christian Meesters wrote: > Hi All, > > I'm trying to build a somewhat bigger app which needs to > calculate the > distance distribution function of a protein several times. For > this > purpose I re-discovered BioPython today. > Speed is an issue, but since the protein in question isn't too > big the > overhead of Python and BioPython might be tolerable. > > Still, in order to speed up my code I would need to access the > chain > identifier of atoms within a loop. To illustrate my question > some > pseudo-code: > > atoms = Selection.unfold_entities(structure, 'A') > for i, a in enumerate(atoms): > for b in atoms[i+1:]: > # do some calculations here ... > > Now, I cannot avoid the double loop, but if I could check > somehow > whether atoms a and b belong to the same chain (and which) I > would have > to do this awkward looping only once this way. > > Does anybody know an approach? As far as I understand the code > of the > Atom class, there is no direct way, right? 
> > TIA > Christian > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From Michael.Robeson at Colorado.EDU Sat Aug 18 12:42:22 2007 From: Michael.Robeson at Colorado.EDU (Michael S. Robeson) Date: Sat, 18 Aug 2007 10:42:22 -0600 Subject: [BioPython] qblast Message-ID: <6FD85196-34A9-4641-B454-DBA9A589679F@colorado.edu> I have been using qblast (blastn) for a set of sequences. However, it gives me _completely_ different results than if I was to use the web interface at NCBI (also using blastn, nr). Is anyone else having this problem? Am I missing something? Can anyone help with making sure that I am using the same settings in qblast as with the NCBI web interface? Also, are there any plans for implementing megablast? -Thanks! -Mike From biopython at maubp.freeserve.co.uk Sat Aug 18 15:25:48 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 18 Aug 2007 20:25:48 +0100 Subject: [BioPython] qblast In-Reply-To: <6FD85196-34A9-4641-B454-DBA9A589679F@colorado.edu> References: <6FD85196-34A9-4641-B454-DBA9A589679F@colorado.edu> Message-ID: <46C747BC.6010002@maubp.freeserve.co.uk> Michael S. Robeson wrote: > I have been using qblast (blastn) for a set of sequences. However, it > gives me _completely_ different results than if I was to use the web > interface at NCBI (also using blastn, nr). Is anyone else having this > problem? Am I missing something? Could you give us an example? Say, three sequences with details of how you use the web interface, and the code you are using in Biopython. Peter From Michael.Robeson at Colorado.EDU Sat Aug 18 16:51:24 2007 From: Michael.Robeson at Colorado.EDU (Michael S.
Robeson) Date: Sat, 18 Aug 2007 14:51:24 -0600 Subject: [BioPython] qblast In-Reply-To: <46C747BC.6010002@maubp.freeserve.co.uk> References: <6FD85196-34A9-4641-B454-DBA9A589679F@colorado.edu> <46C747BC.6010002@maubp.freeserve.co.uk> Message-ID: Sure, I am using a really simple script that I wrote (link below) to start getting familiar w/ biopython. So, any other comments would be greatly appreciated. http://ucsu.colorado.edu/~robeson/documents/blast.txt Using this sequence:

TGTGATGGATATCTGCAGAATTCGCCCTTTAAACTTCAGGGTGACCAAAA
AATCAAAATAAATGTTGAAATAATACTGGATCTCCACCACCACTAACTTC
AAAAAATGTTGTATTAAAATTTCTATCAGTTAATAACATTGTTATAGCAC
CCCCTAATACTGGTAATGATAATAATAATAATCATGCTGTTATAAATACA
GCTCAAACAAATAAAGGTAACTTAAACATACTCATACCAGGTGTTCGCAT
ATTAATAACAGTAACAATAAAATTTATTGAACCTAATATTGATGATATAC
CAGCTAAATGTAAACTAAATATTGCACATTCTATTGAACCTCCTGAATGT
GAAAATATACCAGATAATGGTGGATAAACAGTTCAACCTGTACCTGCCCC
CATCTCGACTACAGATGATCAAATTAATAAAAAAAATGATGGTACTAATA
ATCAAAAACTTATATTATTTAATCTTGGGAATGCCATATCAGGAGCTCCT
ATCATTAAAGGTAAAAATCAATTACCAAAACCACCCATTAATGCAGGCAT
AACCATAAAAAATATCATTATTAAAGCATGTGCTGTTATTAACACATTAT
ATGCTTGATGATTGTAATTTAATATTACTGCACCAGCATCTGATAATTCT
ATACGTATTAATATAGATCAAAATGTTCCTATTAAACCTGCTAAAAATGC
AAATATTAAATATAATGTTCCAATATCTTTATGATTTGTTGACCAAGGGC
GAATTCCAGCACACTGGCGGCCGTTACTAG

Using blastn. When I use the web site I set the following:

Database = Others (nr etc..) w/ drop menu set to 'Nucleotide collection (nr/nt)'
Optimize for = 'Somewhat similar sequences (blastn)'

I assumed the above was the same as biopython's default qblast when using blastn. Basically, my script seemed to work fine until NCBI changed the layout of their site (I'd check every so often with a set of samples). I also tried to update my code with some of the new biopython code. But it is very hard to find good up-to-date tutorials. Last time I looked none of them worked. Anyway, no matter what settings I use via the NCBI web interface I cannot get the same results as I do with qblast.
So, unless I am missing something, I wonder if there is something wrong with the back-end? -Mike On Aug 18, 2007, at 1:25 PM, Peter wrote: > Michael S. Robeson wrote: >> I have been using qblast (blastn) for a set of sequences. However, >> it gives me _completely_ different results than if I was to use >> the web interface at NCBI (also using blastn, nr). Is anyone else >> having this problem? Am I missing something? > > Could you give us an example? Say three of sequences with details > of how you use the web interface, and the code you are using in > Biopython. > > Peter > ---------------------------------------------- Michael S. Robeson II Ecology and Evolutionary Biology N122 Ramaley Hall Campus Box 334 University of Colorado Boulder, CO 80309 robeson at colorado.edu http://ucsu.colorado.edu/~robeson/ From biopython at maubp.freeserve.co.uk Sat Aug 18 17:55:08 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 18 Aug 2007 22:55:08 +0100 Subject: [BioPython] qblast In-Reply-To: References: <6FD85196-34A9-4641-B454-DBA9A589679F@colorado.edu> <46C747BC.6010002@maubp.freeserve.co.uk> Message-ID: <46C76ABC.9040608@maubp.freeserve.co.uk> Michael S. Robeson wrote: > Sure, I am using a really simple script that I wrote (link below) to > start getting familiar w/ biopython. So, any other comments would be > greatly appreciated. > > http://ucsu.colorado.edu/~robeson/documents/blast.txt Assuming you are using Biopython 1.43, then I would have suggested using Bio.SeqIO rather than Bio.Fasta for reading/writing FASTA files. See the tutorial or http://www.biopython.org/wiki/SeqIO I'm also unclear why you want to save individual output blast HTML pages (one file per sequence). Trying to parse these HTML files later will be a pain - go from the XML output if you want to read the blast output with Biopython. > I assumed the above was the same as biopython's default qblast when > using blastn.
Basically, my script seemed worked fine until NCBI > changed the layout of their site (I'd check every so often with a set > of samples). See my other email - they should be the same if all the parameters are the same (specifically it looks like the NCBI is using different defaults for the gap penalties, which was very confusing). > I also tried to change my code with some of the new > biopython code. But it is very hard to find good up-to-date > tutorials. Last time I looked none of them worked. Can you point out any out of date documentation please? As far as I know, the BLAST examples in our tutorial work fine: http://biopython.org/DIST/docs/tutorial/Tutorial.html Peter From biopython at maubp.freeserve.co.uk Sat Aug 18 17:38:52 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 18 Aug 2007 22:38:52 +0100 Subject: [BioPython] qblast In-Reply-To: References: <6FD85196-34A9-4641-B454-DBA9A589679F@colorado.edu> <46C747BC.6010002@maubp.freeserve.co.uk> Message-ID: <46C766EC.6030601@maubp.freeserve.co.uk> Hi Michael, Here is a short script based on your example, which I have tested for calling qblast:

from Bio.Blast.NCBIWWW import qblast

seq_string = "TGTGATGGATATCTGCAGAATTCGCCCTTTAAACTTCAGGGTGACCAAAA" \
           + "AATCAAAATAAATGTTGAAATAATACTGGATCTCCACCACCACTAACTTC" \
           + "AAAAAATGTTGTATTAAAATTTCTATCAGTTAATAACATTGTTATAGCAC" \
           + "CCCCTAATACTGGTAATGATAATAATAATAATCATGCTGTTATAAATACA" \
           + "GCTCAAACAAATAAAGGTAACTTAAACATACTCATACCAGGTGTTCGCAT" \
           + "ATTAATAACAGTAACAATAAAATTTATTGAACCTAATATTGATGATATAC" \
           + "CAGCTAAATGTAAACTAAATATTGCACATTCTATTGAACCTCCTGAATGT" \
           + "GAAAATATACCAGATAATGGTGGATAAACAGTTCAACCTGTACCTGCCCC" \
           + "CATCTCGACTACAGATGATCAAATTAATAAAAAAAATGATGGTACTAATA" \
           + "ATCAAAAACTTATATTATTTAATCTTGGGAATGCCATATCAGGAGCTCCT" \
           + "ATCATTAAAGGTAAAAATCAATTACCAAAACCACCCATTAATGCAGGCAT" \
           + "AACCATAAAAAATATCATTATTAAAGCATGTGCTGTTATTAACACATTAT" \
           + "ATGCTTGATGATTGTAATTTAATATTACTGCACCAGCATCTGATAATTCT" \
           + "ATACGTATTAATATAGATCAAAATGTTCCTATTAAACCTGCTAAAAATGC" \
           + "AAATATTAAATATAATGTTCCAATATCTTTATGATTTGTTGACCAAGGGC" \
           + "GAATTCCAGCACACTGGCGGCCGTTACTAG"

#result_handle = qblast('blastn', 'nr', seq_string, format_type='HTML')
#output_handle = open("test.html", "w")
#output_handle.write(result_handle.read())
#output_handle.close()

result_handle = qblast('blastn', 'nr', seq_string, format_type='Text')
output_handle = open("test.txt", "w")
output_handle.write(result_handle.read())
output_handle.close()

#result_handle = qblast('blastn', 'nr', seq_string, format_type='XML')
#output_handle = open("test.xml", "w")
#output_handle.write(result_handle.read())
#output_handle.close()

print "Done"

The top hits from the script were:

gb|AY916130.1| Epidermophyton floccosum mitochondrion, complete
gb|EF180206.1| Penicillium confertum voucher 171.87 cytochrom...
gb|EF180399.1| Penicillium soppii voucher IBT 14908 cytochrom...
gb|EF180398.1| Penicillium soppii voucher IBT 3331 cytochrome...
gb|EF180397.1| Penicillium soppii voucher IBT 18220 cytochrom...
gb|EF180396.1| Penicillium soppii voucher 226.28 cytochrome o...
gb|EF180395.1| Penicillium soppii voucher 144.83 cytochrome o...

The top hits for me using online blastn for the same sequence, also on the nr database:

gb|AY129164.1| Pythium aphanidermatum cytochrome oxidase I ge...
gb|AY561976.1| Scopalina ruetzleri cytochrome oxidase subunit...
gb|EF468468.1| Phytophthora sp. H-6/02 cytochrome oxidase sub...
gb|DQ832717.1| Phytophthora sojae mitochondrion, complete genome
gb|EF468470.1| Phytophthora sp. H-8/02 cytochrome oxidase sub...
gb|EF468469.1| Phytophthora sp. H-7/02 cytochrome oxidase sub...
gb|AY129166.1| Phytophthora capsici cytochrome oxidase I gene...

i.e. very different! I switched to using plain text output as it's easier to read by hand. Both correctly understood the input query was 780 letters long. Both claimed to be output from BLASTN 2.2.17. Both claimed to be output from the same database. There were some differences in the parameters footer - but I'm not sure why.
Using the script:

Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
  Posted date:  Aug 16, 2007  6:06 PM
Number of letters in database: -51,729,944
Number of sequences in database:  5,751,035
Lambda     K      H
   1.37    0.711     1.31
Gapped
Lambda     K      H
   1.37    0.711     1.31
Matrix: blastn matrix:1 -3
Gap Penalties: Existence: 5, Extension: 2
...

While using the web browser:

Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
  Posted date:  Aug 16, 2007  6:06 PM
Number of letters in database: -51,729,944
Number of sequences in database:  5,751,035
Lambda     K      H
   0.634    0.408     0.912
Gapped
Lambda     K      H
   0.634    0.408     0.912
Matrix: blastn matrix:2 -3
Gap Penalties: Existence: 5, Extension: 2
...

There is something funny here... does this throw any light on things? Peter From biopython at maubp.freeserve.co.uk Sat Aug 18 17:47:31 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 18 Aug 2007 22:47:31 +0100 Subject: [BioPython] qblast In-Reply-To: <46C766EC.6030601@maubp.freeserve.co.uk> References: <6FD85196-34A9-4641-B454-DBA9A589679F@colorado.edu> <46C747BC.6010002@maubp.freeserve.co.uk> <46C766EC.6030601@maubp.freeserve.co.uk> Message-ID: <46C768F3.7020608@maubp.freeserve.co.uk> Peter wrote: > I switched to using plain text output as its easier to read by hand. > > Both correctly understood the input query was 780 letters long. > Both claimed to be output from BLASTN 2.2.17 > Both claimed to be output from the same database > > There where some differences in the parameters footer - but I'm not sure > why.
Using the script: > > Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, > GSS,environmental > samples or phase 0, 1 or 2 HTGS sequences) > Posted date: Aug 16, 2007 6:06 PM > Number of letters in database: -51,729,944 > Number of sequences in database: 5,751,035 > Lambda K H > 1.37 0.711 1.31 > Gapped > Lambda K H > 1.37 0.711 1.31 > Matrix: blastn matrix:1 -3 > Gap Penalties: Existence: 5, Extension: 2 > ... > > While using the web browser: > > Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, > GSS,environmental > samples or phase 0, 1 or 2 HTGS sequences) > Posted date: Aug 16, 2007 6:06 PM > Number of letters in database: -51,729,944 > Number of sequences in database: 5,751,035 > Lambda K H > 0.634 0.408 0.912 > Gapped > Lambda K H > 0.634 0.408 0.912 > Matrix: blastn matrix:2 -3 > Gap Penalties: Existence: 5, Extension: 2 > ... Solved: Using qblast in a script defaulted to blastn matrix:1 -3, while the new webpage instead defaults to blastn matrix:2 -3. You can check this (and change it) on the webpage by clicking on "Algorithm parameters" and under "Scoring Parameters" changing "Match/Mismatch Scores" from "2, -3" to "1, -3". Then the results seem to agree. Does that work for you? Peter From Michael.Robeson at Colorado.EDU Sat Aug 18 20:13:10 2007 From: Michael.Robeson at Colorado.EDU (Michael S. Robeson) Date: Sat, 18 Aug 2007 18:13:10 -0600 Subject: [BioPython] qblast In-Reply-To: <46C76ABC.9040608@maubp.freeserve.co.uk> References: <6FD85196-34A9-4641-B454-DBA9A589679F@colorado.edu> <46C747BC.6010002@maubp.freeserve.co.uk> <46C76ABC.9040608@maubp.freeserve.co.uk> Message-ID: > > Assuming you are using Biopython 1.43, then I would have suggested > using Bio.SeqIO rather than Bio.Fasta for reading/writing FASTA > files. See the tutorial or http://www.biopython.org/wiki/SeqIO Okay, will do! > I'm also unclear why you want to save individual output blast HTML > pages (one file per sequence). Trying to parse these HTML files
Trying to parse these HTML files > later will be a pain - go from the XML output if you want to read > the blast output with Biopython. Actually I have another script which actually parses the blast and genbank data but this was one of the first scripts I wrote to make sure I can to the 'simple' things first. But I essectially use the same code to run blast. > > See my other email - they should be the same if all the parameters > are the same (specifically it looks like the NCBI seems to be using > different defaults for the gap penalties, which was very confusing). Yes it is. I must have thought I changed those parameters when in fact I didn't. Thanks for checking on this! > > I also tried to change my code with some of the new >> biopython code. But it is very hard to find good up-to-date >> tutorials. Last time I looked none of them worked. > > Can you point out any out of date documentation please? > As far as I know, the BLAST examples in our tutorial work fine: > http://biopython.org/DIST/docs/tutorial/Tutorial.html Okay, I must have been looking in the wrong places then. I'll go through the listed tutorials again. Is there a particular one you'd suggest? I have only looked at a few of the 'Online course notes' which is probably why. I just noticed the tutorial cookbook links. I must be blind! And I am using the latest biopython (1.43) package as installed via fink for OS X. Again, thanks! I'll play around with this more over the next day or so. I'll see if I can get my code to switch settings and see how the results are affected. -Mike From biopython at maubp.freeserve.co.uk Sun Aug 19 08:36:42 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 19 Aug 2007 13:36:42 +0100 Subject: [BioPython] qblast + documentation In-Reply-To: References: <6FD85196-34A9-4641-B454-DBA9A589679F@colorado.edu> <46C747BC.6010002@maubp.freeserve.co.uk> <46C76ABC.9040608@maubp.freeserve.co.uk> Message-ID: <46C8395A.3040301@maubp.freeserve.co.uk> Michael S. 
Robeson wrote: >>> I also tried to change my code with some of the new >>> biopython code. But it is very hard to find good up-to-date >>> tutorials. Last time I looked none of them worked. >> >> Can you point out any out of date documentation please? >> As far as I know, the BLAST examples in our tutorial work fine: >> http://biopython.org/DIST/docs/tutorial/Tutorial.html > > Okay, I must have been looking in the wrong places then. I'll go > through the listed tutorials again. Is there a particular one you'd > suggest? The main tutorial is probably best, and most up to date. http://biopython.org/DIST/docs/tutorial/Tutorial.html If you are interested in RPS-BLAST, then have a look here too: http://www.warwick.ac.uk/go/peter_cock/python/rpsblast/ > I have only looked at a few of the 'Online course notes' > which is probably why. I just noticed the tutorial cookbook links. I > must be blind! There have been changes in how we handle parsing Blast output in recent releases of Biopython (driven by changes the NCBI made). We are now recommending the use of XML output, but the Biopython code used to do this had to change a little compared to older releases. This may mean that any "third party" tutorials are a little out of date now. Peter From meesters at uni-mainz.de Mon Aug 20 11:15:39 2007 From: meesters at uni-mainz.de (Christian Meesters) Date: Mon, 20 Aug 2007 17:15:39 +0200 Subject: [BioPython] deleting and inserting 'chains' in Bio.PDB Message-ID: <1187622939.19858.18.camel@cmeesters> Hi, I'd like to delete and insert subunits in my structure object.
But the following code does not work:

>>> parser = PDBParser()
>>> s = parser.get_structure('name','name.pdb')
>>> cs = [chain for chain in s.get_chains()]
>>> cs
[, , , , , ]
>>> c = cs[0]
>>> c
>>> c.id
'A'
>>> c.get_id()
'A'
>>> s.detach_child('A')
Traceback (most recent call last):
  File "", line 1, in
  File "/var/lib/python-support/python2.5/Bio/PDB/Entity.py", line 70, in detach_child
    child=self.child_dict[id]
KeyError: 'A'
>>> s.detach_child("")
Traceback (most recent call last):
  File "", line 1, in
  File "/var/lib/python-support/python2.5/Bio/PDB/Entity.py", line 70, in detach_child
    child=self.child_dict[id]
KeyError: ''

Can somebody give me a hint of what I'm missing? Thanks, Christian From thamelry at binf.ku.dk Mon Aug 20 11:51:51 2007 From: thamelry at binf.ku.dk (Thomas Hamelryck) Date: Mon, 20 Aug 2007 17:51:51 +0200 Subject: [BioPython] deleting and inserting 'chains' in Bio.PDB In-Reply-To: <1187622939.19858.18.camel@cmeesters> References: <1187622939.19858.18.camel@cmeesters> Message-ID: <2d7c25310708200851v5732e2adpf1fdc49baed0cc09@mail.gmail.com> On 8/20/07, Christian Meesters wrote: > Hi, > > I'd like delete and insert subunits in my structure object. But the > following code does not work: A chain object should be removed at the level of the model object, not at the level of the structure object. Remember:

Structure<-Model<-Chain<-Residue<-Atom

Cheers, -Thomas From meesters at uni-mainz.de Mon Aug 20 12:01:08 2007 From: meesters at uni-mainz.de (Christian Meesters) Date: Mon, 20 Aug 2007 18:01:08 +0200 Subject: [BioPython] deleting and inserting 'chains' in Bio.PDB In-Reply-To: <2d7c25310708200851v5732e2adpf1fdc49baed0cc09@mail.gmail.com> References: <1187622939.19858.18.camel@cmeesters> <2d7c25310708200851v5732e2adpf1fdc49baed0cc09@mail.gmail.com> Message-ID: <1187625668.19858.21.camel@cmeesters> Thanks! And thank you, Thomas, for your wonderful module! The module is actually easy to use, despite my questions ... ;-)
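Thomas's answer and Katie's get_parent() tip both follow from the parent/child hierarchy. The toy classes below are a hypothetical re-implementation, not Bio.PDB's code; only the method names detach_child, get_parent and get_id come from this thread. They show why asking the structure to detach chain 'A' raises KeyError while the model-level call works, and how looking up each atom's chain lets Christian's double loop keep only same-chain pairs.

```python
from itertools import combinations


class Entity:
    """Toy stand-in for a Bio.PDB-style node (Structure, Model, Chain,
    Residue or Atom). Not Bio.PDB's own implementation."""

    def __init__(self, id):
        self.id = id
        self.parent = None
        self.child_dict = {}

    def add(self, child):
        child.parent = self
        self.child_dict[child.id] = child
        return child

    def detach_child(self, id):
        # Only *direct* children are looked up, hence the KeyError
        # when a structure is asked for a chain id.
        child = self.child_dict.pop(id)
        child.parent = None
        return child

    def get_parent(self):
        return self.parent

    def get_id(self):
        return self.id

    def __getitem__(self, id):
        return self.child_dict[id]

    def __iter__(self):
        return iter(list(self.child_dict.values()))


# Build Structure <- Model <- Chain <- Residue <- Atom with two chains
structure = Entity("name")
model = structure.add(Entity(0))
for chain_id in ("A", "B"):
    residue = model.add(Entity(chain_id)).add(Entity(1))
    residue.add(Entity("N"))
    residue.add(Entity("CA"))

atoms = [atom for chain in model for residue in chain for atom in residue]


def chain_of(atom):
    # atom -> residue -> chain, as in Katie's get_parent().get_parent()
    return atom.get_parent().get_parent()


# Keep only pairs of atoms that belong to the same chain
same_chain_pairs = [(a, b) for a, b in combinations(atoms, 2)
                    if chain_of(a) is chain_of(b)]

try:
    structure.detach_child("A")  # wrong level: the structure's only child is model 0
    wrong_level_failed = False
except KeyError:
    wrong_level_failed = True

removed = structure[0].detach_child("A")  # right level: the model
```

Grouping atoms by chain_of() once, before the loop, would avoid even the per-pair comparison; the list comprehension just mirrors Christian's original loop shape.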
Cheers, Christian On Mon, 2007-08-20 at 17:51 +0200, Thomas Hamelryck wrote: > On 8/20/07, Christian Meesters wrote: > > Hi, > > > > I'd like delete and insert subunits in my structure object. But the > > following code does not work: > > A chain object should be removed at the level of the model object, not > at the level of the structure object. > > Remember: > > Structure<-Model<-Chain<-Residue<-Atom > > Cheers, > > -Thomas From jodyhey at yahoo.com Mon Aug 20 13:48:36 2007 From: jodyhey at yahoo.com (Emanuel Hey) Date: Mon, 20 Aug 2007 10:48:36 -0700 (PDT) Subject: [BioPython] Blast to genome specific databases? In-Reply-To: <46C768F3.7020608@maubp.freeserve.co.uk> Message-ID: <897102.52773.qm@web53903.mail.re2.yahoo.com> NCBI has a blast form for genome-specific databases that returns more information than does a blast to the generic nucleotide database: http://www.ncbi.nlm.nih.gov/BLAST/ See e.g. for the human genome http://www.ncbi.nlm.nih.gov/genome/seq/BlastGen/BlastGen.cgi?taxid=9606 I've used NCBIWWW.qblast and used biopython for stand-alone blasting, but otherwise am not that experienced. Is there a way to script a search that generates the same results as provided by these newer genome-specific interfaces? Thanks Jhey From cjfields at uiuc.edu Mon Aug 20 15:17:48 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 20 Aug 2007 14:17:48 -0500 Subject: [BioPython] Blast to genome specific databases?
In-Reply-To: <897102.52773.qm@web53903.mail.re2.yahoo.com> References: <897102.52773.qm@web53903.mail.re2.yahoo.com> Message-ID: If you follow the directions on this page: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html you should be able to modify the URLAPI DATABASE parameter to that in the list; another option would be to limit the results to a specific taxid (like 9606) using the ENTREZ_QUERY. Not sure how that would be done for biopython (being a bioperler myself) but I'm sure someone could chip in? chris On Aug 20, 2007, at 12:48 PM, Emanuel Hey wrote: > NCBI has a blast form for genome specific datbases > that returns more information than does a blast to the > generic nucleotide database > > http://www.ncbi.nlm.nih.gov/BLAST/ > > See e.g. for the human genome > > http://www.ncbi.nlm.nih.gov/genome/seq/BlastGen/BlastGen.cgi? > taxid=9606 > > > I've used NCBIWWW.qblast and used biopython for stand > alone blasting, but otherwise am not that experience. > > > Is there a way to script a search that generates the > same results as provided by these newere genome > specific interfaces? > > Thanks > > Jhey Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From biopython at maubp.freeserve.co.uk Wed Aug 22 11:05:31 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 22 Aug 2007 16:05:31 +0100 Subject: [BioPython] Making the Seq object act more like a string Message-ID: <46CC50BB.1090902@maubp.freeserve.co.uk> Dear Biopython users, A couple of times (on bugs or the developers mailing list), Michiel de Hoon has suggested we could make the Seq class (Bio.Seq.Seq) a subclass of the Python string. I agree with him - the Seq object should act more like a string. I'm posting on the main discussion list to get some feedback on this, as any sudden changes could affect lots of people. As a simple example, although there are functions in Biopython to calculate GC percentages, for a beginner playing with Seq objects, it would be nice to be able to do things like this (where s is a Seq object):

print float(s.count("G") + s.count("C")) / len(s)

rather than, for example, this:

print float(s.tostring().count("G") + s.tostring().count("C")) / len(s)

I also don't think the current behaviour of str(seq) is helpful. To recap, here is a simple example using a plain string:

>>> ss = 'ACAGGTACGATCGATCCTGTTGTACGTGCTCGTCGACTGCTAGCTCGTCGTGGTCCGATGA'
>>> print repr(ss)
'ACAGGTACGATCGATCCTGTTGTACGTGCTCGTCGACTGCTAGCTCGTCGTGGTCCGATGA'
>>> print str(ss)
ACAGGTACGATCGATCCTGTTGTACGTGCTCGTCGACTGCTAGCTCGTCGTGGTCCGATGA
>>> print ss
ACAGGTACGATCGATCCTGTTGTACGTGCTCGTCGACTGCTAGCTCGTCGTGGTCCGATGA

And the equivalent using a Seq object as of Biopython 1.43:

>>> s = Seq("ACAGGTACGATCGATCCTGTTGTACGTGCTC"\
...         +"GTCGACTGCTAGCTCGTCGTGGTCCGATGA")
>>> print repr(s)
Seq('ACAGGTACGATCGATCCTGTTGTACGTGCTCGTCGACTGCTAGCTCGTCGTGGTCCGATGA', Alphabet())
>>> print s.tostring()
ACAGGTACGATCGATCCTGTTGTACGTGCTCGTCGACTGCTAGCTCGTCGTGGTCCGATGA
>>> print str(s)
Seq('ACAGGTACGATCGATCCTGTTGTACGTGCTCGTCGACTGCTAGCTCGTCGTGGTCCGATG ...', Alphabet())
>>> print s
Seq('ACAGGTACGATCGATCCTGTTGTACGTGCTCGTCGACTGCTAGCTCGTCGTGGTCCGATG ...', Alphabet())

Note that currently doing str() on a Seq object gives a truncated version of what repr() gives. The only nice thing about this is that when working at the Python command line, doing "print s" will only take one line even when working with a genome. And it makes it really clear you don't just have a string object. That was the motivation/background part. Feel free to chime in :)

-----------------------------------------------------------------------

This next bit of the email gets a bit more technical... and might be better off on the developers mailing list. We'll see where any discussion goes. If we can agree that making Seq inherit from the basic string is a good idea, then I would advocate a gradual transition... my thoughts are that for the next release of Biopython we:

(1) Modify the Seq .__str__() method to act like the existing .tostring(), i.e. return self.data. I don't think changing the __str__ method will break any serious code, because as shown above, currently it's like a truncated version of __repr__, so all it's useful for at the moment is getting a truncated sequence for display.

(2) Consider adding alphabet-aware versions of selected string methods to the Seq object (e.g. count, find). Adding new methods to the Seq class should have no effect on existing usage.

Then, for the release afterwards:

(3) Actually do the class inheritance with all the horrors entailed. As part of this, we'll need to address how the __eq__ method of the Seq object should act: Looking at the sequence only, or considering the alphabet too?
Currently this method is not implemented at all. This is part of a larger question - how to cope with multiple Seq/string operations where there is more than one alphabet, e.g. comparing/adding/joining a nucleotide Seq to a protein Seq object. I would opt for the simple solution that the alphabets must match or some sort of ValueError is raised. Alternatively, as the alphabets have a class hierarchy, we could choose the parent alphabet (e.g. the generic single letter alphabet when dealing with DNA and Protein; or the generic single nucleotide alphabet when dealing with RNA and DNA). Any thoughts? Peter [In fact, Michiel has also suggested making the SeqRecord class a subclass of the Seq class, which raises even more questions] From sbassi at gmail.com Wed Aug 22 11:41:53 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Wed, 22 Aug 2007 12:41:53 -0300 Subject: [BioPython] Making the Seq object act more like a string In-Reply-To: <46CC50BB.1090902@maubp.freeserve.co.uk> References: <46CC50BB.1090902@maubp.freeserve.co.uk> Message-ID: On 8/22/07, Peter wrote: > A couple of times (on bugs or the developers mailing list), Michiel de > Hoon has previously suggested we could make the Seq class (Bio.Seq.Seq) > a subclass of python string. I agree with him - the Seq object should > act more like a string. I agree. Seq acting more like a str would also lower the barrier to entry for using biopython for programmers without much OOP experience. > As a simple example, although there are functions in Biopython to ... Here is another example (from http://www.biopython.org/wiki/SeqIO#Using_the_SEGUID_checksum):

Current situation:

from Bio import SeqIO
from Bio.SeqUtils.CheckSum import seguid
seguid_dict = SeqIO.to_dict(SeqIO.parse(open("ls_orchid.gbk"), "genbank"),
                            lambda rec : seguid(rec.seq))
record = seguid_dict["MN/s0q9zDoCVEEc+k/IFwCNF2pY"]
print record.id
print record.description

If seq were more like a string:

...
seguid_dict = SeqIO.to_dict(SeqIO.parse(open("ls_orchid.gbk"), "genbank"),
                            seguid(rec.seq))
....

This way you avoid using a lambda. -- Bioinformatics news: http://www.bioinformatica.info Lriser: http://www.linspire.com/lraiser_success.php?serial=318 From biopython at maubp.freeserve.co.uk Wed Aug 22 11:53:59 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 22 Aug 2007 16:53:59 +0100 Subject: [BioPython] Making the Seq object act more like a string In-Reply-To: References: <46CC50BB.1090902@maubp.freeserve.co.uk> Message-ID: <46CC5C17.4000709@maubp.freeserve.co.uk> Sebastian Bassi wrote: > On 8/22/07, Peter wrote: >> A couple of times (on bugs or the developers mailing list), Michiel de >> Hoon has previously suggested we could make the Seq class (Bio.Seq.Seq) >> a subclass of python string. I agree with him - the Seq object should >> act more like a string. > > I agree. Seq acting more like a str would also lower the entry level > to use biopython for non OOP seasoned programmers. Good :) >> As a simple example, although there are functions in Biopython to > ... > > Here is another example (from > http://www.biopython.org/wiki/SeqIO#Using_the_SEGUID_checksum ): I added that to the wiki recently - although it is perhaps premature given your CheckSum code hasn't been officially released yet. This was my draft; moving/adding it to the tutorial is on my to-do list. > Current situation: > > from Bio import SeqIO > from Bio.SeqUtils.CheckSum import seguid > seguid_dict = SeqIO.to_dict(SeqIO.parse(open("ls_orchid.gbk"), "genbank"), > lambda rec : seguid(rec.seq)) > record = seguid_dict["MN/s0q9zDoCVEEc+k/IFwCNF2pY"] > print record.id > print record.description > > If seq were more like a string: > > ...
> seguid_dict = SeqIO.to_dict(SeqIO.parse(open("ls_orchid.gbk"), "genbank"), > seguid(rec.seq)) Nope ;) You have to give a function to the key_function argument in SeqIO.to_dict(), and in your example seguid(rec.seq) would be a string (the result of the seguid function acting on a seq object). Or at least, it would if you had a rec variable in scope. However, if SeqRecord acted more like a Seq (and therefore more like a string) then you could do this, which avoids the lambda:

seguid_dict = SeqIO.to_dict(SeqIO.parse(open("ls_orchid.gbk"), \
                            "genbank"), seguid)

Or, we could enhance your CheckSum functions to cope with a SeqRecord, a Seq or a string - right now they cope with a Seq or a string. Peter From mdehoon at c2b2.columbia.edu Thu Aug 23 02:06:27 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Thu, 23 Aug 2007 02:06:27 -0400 Subject: [BioPython] Making the Seq object act more like a string References: <46CC50BB.1090902@maubp.freeserve.co.uk> <46CC5C17.4000709@maubp.freeserve.co.uk> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B609@mail2.exch.c2b2.columbia.edu> Peter wrote: > However, if SeqRecord acted more like a Seq (and therefore more like a > string) then you could do this which does avoid the lambda: > > seguid_dict = SeqIO.to_dict(SeqIO.parse(open("ls_orchid.gbk"), \ > "genbank"), seguid) > > Or, we could enhance your the CheckSum functions to cope with a > SeqRecord, a Seq or a string - right now they cope with a Seq or a string. > Nice example. The SeqRecord class is one of those classes in Biopython whose reason for existing I never understood. A SeqRecord is nothing more than a Seq with some attributes attached. If we add those attributes to the Seq class directly, then we can get rid of the SeqRecord class. Then, functions such as those in CheckSum only need to cope with strings. --Michiel.
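Michiel's proposal, a Seq that is itself a str and carries the record attributes, can be sketched in a few lines of modern Python. This is a hypothetical illustration, not Biopython's Seq or SeqRecord. The class name, the string-valued alphabet, the match-or-ValueError rule for addition (the first option Peter floats) and the seguid helper are all assumptions; seguid is written here from the usual SEGUID definition (base64 of the SHA-1 digest of the upper-cased sequence, trailing '=' removed), not imported from Bio.SeqUtils.CheckSum.

```python
import base64
import hashlib


class Seq(str):
    """Hypothetical sketch: a sequence that *is* a str, carrying an
    alphabet label plus record-style attributes (id, description)."""

    def __new__(cls, data, alphabet="generic", id=None, description=""):
        obj = super().__new__(cls, data)
        obj.alphabet = alphabet
        obj.id = id
        obj.description = description
        return obj

    def __repr__(self):
        return "Seq(%s, %r)" % (str.__repr__(self), self.alphabet)

    def __add__(self, other):
        # One possible policy: the alphabets must match (plain str is allowed)
        other_alphabet = getattr(other, "alphabet", self.alphabet)
        if other_alphabet != self.alphabet:
            raise ValueError("incompatible alphabets: %r and %r"
                             % (self.alphabet, other_alphabet))
        return Seq(str(self) + str(other), self.alphabet)


def seguid(seq):
    """SEGUID-style checksum: base64 of the SHA-1 digest of the
    upper-cased sequence, with the trailing '=' padding stripped."""
    digest = hashlib.sha1(str(seq).upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


s = Seq("ACAGGTACGA", "DNA", id="demo")
gc = (s.count("G") + s.count("C")) / len(s)  # plain str methods just work

records = [s, Seq("MKV", "protein", id="prot")]
by_checksum = {seguid(rec): rec for rec in records}  # no lambda needed
```

Note the trade-offs this sketch leaves open: __eq__ and __hash__ are inherited from str, so comparisons look at the letters only and ignore the alphabet, and str methods such as upper() return a plain str, dropping the attributes.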
Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From Michael.Robeson at Colorado.EDU Thu Aug 23 16:13:17 2007 From: Michael.Robeson at Colorado.EDU (Michael S.
Robeson) Date: Thu, 23 Aug 2007 14:13:17 -0600 Subject: [BioPython] qblast + documentation In-Reply-To: <46C8395A.3040301@maubp.freeserve.co.uk> References: <6FD85196-34A9-4641-B454-DBA9A589679F@colorado.edu> <46C747BC.6010002@maubp.freeserve.co.uk> <46C76ABC.9040608@maubp.freeserve.co.uk> <46C8395A.3040301@maubp.freeserve.co.uk> Message-ID: <2EEE2FCD-6A73-408D-9984-D3806D717BD1@colorado.edu> Why is there no way to pass other parameters, such as gap penalties, to qblast, as can be done on the website? Am I missing something? Is there code I can add? -Thanks -Mike From Michael.Robeson at Colorado.EDU Thu Aug 23 16:20:26 2007 From: Michael.Robeson at Colorado.EDU (Michael S. Robeson) Date: Thu, 23 Aug 2007 14:20:26 -0600 Subject: [BioPython] qblast + documentation Message-ID: I meant to add to the other e-mail: we know the discrepancy comes from differences between the BLAST settings on the NCBI website and the Biopython back-end. How can I get around that? I've tried a few things, but have not been successful. -Mike From mdehoon at c2b2.columbia.edu Thu Aug 23 19:01:17 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Fri, 24 Aug 2007 08:01:17 +0900 Subject: [BioPython] qblast + documentation In-Reply-To: References: Message-ID: <46CE11BD.2020307@c2b2.columbia.edu> Have you looked at the qblast function definition in http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIWWW.py?rev=1.46&cvsroot=biopython&content-type=text/vnd.viewcvs-markup I believe all the Blast parameters are there. --Michiel. Michael S. Robeson wrote: > I meant to add to the other e-mail: Since we know it has to do with > the blast setting differences on the NCBI web site versus the > biopython back-end. How can I get around that? I've tried a few > things, but have not been successful.
> > -Mike > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From Michael.Robeson at Colorado.EDU Thu Aug 23 21:12:30 2007 From: Michael.Robeson at Colorado.EDU (Michael S. Robeson) Date: Thu, 23 Aug 2007 19:12:30 -0600 Subject: [BioPython] qblast + documentation In-Reply-To: <46CDFFE8.2040900@maubp.freeserve.co.uk> References: <46CDFFE8.2040900@maubp.freeserve.co.uk> Message-ID: <8963513B-811D-458B-BAB8-1BE64D3CA76C@colorado.edu> > http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/ > Blast/NCBIWWW.py?rev=1.46&cvsroot=biopython&content-type=text/ > vnd.viewcvs-markup Are those new parameters? They don't seem to exist in the code I downloaded. I guess I will download it again from fink. > If you are familiar with the BLAST terminology this is probably > very obvious, but you need to set the NUCL_REWARD and NUCL_PENALTY > options in the URL. Yeah, I did not mean to say it was Biopython's fault, but the version of the code I have had no qblast parameters for me to alter once we knew there was a difference with the web interface at NCBI.
The version I have (latest via fink, biopython version: 1.43-1001) showed this:
def qblast(program, database, sequence,
           ncbi_gi=None, descriptions=None, alignments=None,
           expect=None, matrix=None, filter=None, format_type="XML",
           hitlist_size=None, entrez_query='(none)',
           ):
and not:
def qblast(program, database, sequence,
           auto_format=None,composition_based_statistics=None,
           db_genetic_code=None,endpoints=None,entrez_query='(none)',
           expect=10.0,filter=None,gapcosts=None,genetic_code=None,
           hitlist_size=50,i_thresh=None,layout=None,lcase_mask=None,
           matrix_name=None,nucl_penalty=None,nucl_reward=None,
           other_advanced=None,perc_ident=None,phi_pattern=None,
           query_file=None,query_believe_defline=None,query_from=None,
           query_to=None,searchsp_eff=None,service=None,threshold=None,
           ungapped_alignment=None,word_size=None,
           alignments=500,alignment_view=None,descriptions=500,
           entrez_links_new_window=None,expect_low=None,expect_high=None,
           format_entrez_query=None,format_object=None,format_type='XML',
           ncbi_gi=None,results_file=None,show_overview=None
           ):
So, there must be a problem with my fink installation or something. I will try and uninstall and re-install biopython via fink. -Mike From mdehoon at c2b2.columbia.edu Thu Aug 23 22:08:29 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Thu, 23 Aug 2007 22:08:29 -0400 Subject: [BioPython] qblast + documentation References: <46CDFFE8.2040900@maubp.freeserve.co.uk> <8963513B-811D-458B-BAB8-1BE64D3CA76C@colorado.edu> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B60B@mail2.exch.c2b2.columbia.edu> > > http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/ > > Blast/NCBIWWW.py?rev=1.46&cvsroot=biopython&content-type=text/ > > vnd.viewcvs-markup > Are those new parameters? They do exist in the code I downloaded? > Guess I will download again from fink. It looks like these come from a recent improvement to Biopython, so you won't find them in Biopython release 1.43.
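The confusion above came from not knowing which qblast signature was installed. A quick check, sketched here with Python 3's standard inspect module, is to ask whether a function declares a given keyword. The two functions below are hypothetical stand-ins abbreviated from the two signatures quoted above, not the real Biopython code; accepts_parameter is likewise an illustrative helper.

```python
import inspect


def accepts_parameter(func, name):
    """Return True if func's signature declares a parameter called name."""
    return name in inspect.signature(func).parameters


# Hypothetical stand-ins, abbreviated from the two qblast signatures quoted above.
def qblast_release_143(program, database, sequence, ncbi_gi=None, descriptions=None,
                       alignments=None, expect=None, matrix=None, filter=None,
                       format_type="XML", hitlist_size=None, entrez_query='(none)'):
    pass


def qblast_cvs(program, database, sequence, entrez_query='(none)', expect=10.0,
               filter=None, gapcosts=None, nucl_penalty=None, nucl_reward=None,
               format_type='XML'):
    pass


for func in (qblast_release_143, qblast_cvs):
    print(func.__name__, accepts_parameter(func, "nucl_reward"))
```

With Biopython installed you would pass Bio.Blast.NCBIWWW.qblast itself instead of a stand-in; on the Python 2 of that era, inspect.getargspec(qblast)[0] lists the argument names in the same spirit.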
You can just take the new NCBIWWW.py from CVS and copy it over the NCBIWWW.py in Biopython 1.43. Maybe this is a good time for a new Biopython release? --Michiel. From biopython at maubp.freeserve.co.uk Fri Aug 24 05:35:16 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 24 Aug 2007 10:35:16 +0100 Subject: [BioPython] [Fwd: Re: qblast + documentation] Message-ID: <46CEA654.1070706@maubp.freeserve.co.uk> I managed to send this to Michael Robeson only, and not the list. In the meantime Michiel de Hoon pointed out that support for lots of parameters (including NUCL_REWARD and NUCL_PENALTY) was added after the release of Biopython 1.43, so you would have to update the file Bio/Blast/NCBIWWW.py. If you don't want to bother with CVS, the easy way to do this is to back up the original and then replace it with the download from here: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/*checkout*/biopython/Bio/Blast/NCBIWWW.py?rev=HEAD&cvsroot=biopython&content-type=text/x-python Looking at the history, that shouldn't impact anything else. Peter -------- Original Message -------- Subject: Re: [BioPython] qblast + documentation Date: Thu, 23 Aug 2007 22:45:12 +0100 From: Peter Reply-To: biopython at lists.open-bio.org To: Michael S. Robeson References: Michael S. Robeson wrote: > I meant to add to the other e-mail: Since we know it has to do with > the blast setting differences on the NCBI web site versus the > biopython back-end. How can I get around that? I've tried a few > things, but have not been successful. I'd just like to point out this isn't Biopython's fault - it's the NCBI who seem to be using different match/mismatch penalties on their GUI webserver and their QBLAST web API.
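Under the hood, qblast submits an HTTP request to NCBI's QBLAST URL API, so the scoring options discussed in this thread map onto URL parameters. The sketch below only builds a job-submission URL with Python 3's urllib.parse.urlencode and never contacts NCBI. The endpoint and the parameter names CMD, PROGRAM, DATABASE, QUERY, NUCL_REWARD, NUCL_PENALTY and GAPCOSTS reflect my understanding of NCBI's URL API documentation and are assumptions to verify against the current docs; build_put_url is a hypothetical helper, not part of Biopython.

```python
from urllib.parse import urlencode

# Assumed endpoint of NCBI's QBLAST URL API (the 2007-era URL was different).
QBLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_put_url(program, database, query, nucl_reward=None, nucl_penalty=None,
                  gapcosts=None):
    """Hypothetical helper: build a CMD=Put (job submission) URL for QBLAST.

    gapcosts is an (existence, extension) pair, sent as "existence extension".
    Options left as None are omitted, so NCBI's server-side defaults apply.
    """
    params = [("CMD", "Put"), ("PROGRAM", program),
              ("DATABASE", database), ("QUERY", query)]
    if nucl_reward is not None:
        params.append(("NUCL_REWARD", str(nucl_reward)))
    if nucl_penalty is not None:
        params.append(("NUCL_PENALTY", str(nucl_penalty)))
    if gapcosts is not None:
        params.append(("GAPCOSTS", "%d %d" % gapcosts))
    return QBLAST_URL + "?" + urlencode(params)


# Match/mismatch scores of 2, -3 as in Peter's qblast call; the gap costs are only illustrative.
print(build_put_url("blastn", "nr", "TGTGATGGATATCTGCAGAATTCG",
                    nucl_reward=2, nucl_penalty=-3, gapcosts=(5, 2)))
```

Omitting an option rather than sending a default is deliberate: it is exactly the "server-side default" behaviour that caused the website/QBLAST mismatch in this thread, so the sketch makes any deviation from the defaults explicit. A real client would then poll with CMD=Get using the request ID returned by the Put step.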
The simple answer for how to get the two queries to agree is: whenever you do any manual queries on the website, click on "Algorithm parameters" and under "Scoring Parameters" change "Match/Mismatch Scores" from "2, -3" to "1, -3" (grin). Or, if you want to change the scoring parameters in the qblast call: what Biopython is doing is providing a Python interface to this URL scheme: http://www.ncbi.nlm.nih.gov/BLAST/Doc/node5.html If you are familiar with the BLAST terminology this is probably very obvious, but you need to set the NUCL_REWARD and NUCL_PENALTY options in the URL, i.e. in Biopython use the optional nucl_reward and nucl_penalty arguments to the Bio.Blast.NCBIWWW.qblast function.
from Bio.Blast.NCBIWWW import qblast

seq_string = "TGTGATGGATATCTGCAGAATTCGCCCTTTAAACTTCAGGGTGACCAAAA" \
           + "AATCAAAATAAATGTTGAAATAATACTGGATCTCCACCACCACTAACTTC" \
           + "AAAAAATGTTGTATTAAAATTTCTATCAGTTAATAACATTGTTATAGCAC" \
           + "CCCCTAATACTGGTAATGATAATAATAATAATCATGCTGTTATAAATACA" \
           + "GCTCAAACAAATAAAGGTAACTTAAACATACTCATACCAGGTGTTCGCAT" \
           + "ATTAATAACAGTAACAATAAAATTTATTGAACCTAATATTGATGATATAC" \
           + "CAGCTAAATGTAAACTAAATATTGCACATTCTATTGAACCTCCTGAATGT" \
           + "GAAAATATACCAGATAATGGTGGATAAACAGTTCAACCTGTACCTGCCCC" \
           + "CATCTCGACTACAGATGATCAAATTAATAAAAAAAATGATGGTACTAATA" \
           + "ATCAAAAACTTATATTATTTAATCTTGGGAATGCCATATCAGGAGCTCCT" \
           + "ATCATTAAAGGTAAAAATCAATTACCAAAACCACCCATTAATGCAGGCAT" \
           + "AACCATAAAAAATATCATTATTAAAGCATGTGCTGTTATTAACACATTAT" \
           + "ATGCTTGATGATTGTAATTTAATATTACTGCACCAGCATCTGATAATTCT" \
           + "ATACGTATTAATATAGATCAAAATGTTCCTATTAAACCTGCTAAAAATGC" \
           + "AAATATTAAATATAATGTTCCAATATCTTTATGATTTGTTGACCAAGGGC" \
           + "GAATTCCAGCACACTGGCGGCCGTTACTAG"

#filename, format = "test.html", "HTML"
#filename, format = "test.txt", "Text"
filename, format = "test.xml", "XML"

result_handle = qblast('blastn', 'nr', seq_string, format_type=format,
                       nucl_reward=2, nucl_penalty=-3)
output_handle = open(filename, "w")
output_handle.write(result_handle.read())
output_handle.close()
print "Done"
And this does seem to agree with
the results from doing the query by hand on their website with the default settings. Peter From biopython at maubp.freeserve.co.uk Mon Aug 27 14:48:55 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 27 Aug 2007 19:48:55 +0100 Subject: [BioPython] Making the Seq object act more like a string In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B609@mail2.exch.c2b2.columbia.edu> References: <46CC50BB.1090902@maubp.freeserve.co.uk> <46CC5C17.4000709@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B609@mail2.exch.c2b2.columbia.edu> Message-ID: <46D31C97.1070200@maubp.freeserve.co.uk> Michiel De Hoon wrote: > Peter wrote: >> However, if SeqRecord acted more like a Seq (and therefore more like a >> string) then you could do this which does avoid the lambda: >> >> seguid_dict = SeqIO.to_dict(SeqIO.parse(open("ls_orchid.gbk"), \ >> "genbank"), seguid) >> >> Or, we could enhance your the CheckSum functions to cope with a >> SeqRecord, a Seq or a string - right now they cope with a Seq or a string. >> > Nice example. > > The SeqRecord class is one of those classes in Biopython for which I never > understood why they exist. A SeqRecord is nothing more than a Seq with some > attributes attached. If we add those attributes to the Seq class directly, > then we can get rid of the SeqRecord class. Then, functions such as those in > CheckSum only need to cope with strings. > > --Micheil. I think having SeqRecord subclass Seq is nicer than simply adding annotation to the Seq class. Seq objects would (still) just have a sequence and alphabet; the SeqRecord becomes a rich/annotated Seq object. I think this would be close to BioPerl's Seq and RichSeq objects. I have filed an enhancement on Bugzilla to hold any suggested patches etc. (I hope to upload something later tonight): Bug 2351 - Make SeqRecord subclass Seq subclass string?
http://bugzilla.open-bio.org/show_bug.cgi?id=2351 Peter From mdehoon at c2b2.columbia.edu Wed Aug 29 22:20:01 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Wed, 29 Aug 2007 22:20:01 -0400 Subject: [BioPython] Bio.MarkupEditor Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B60D@mail2.exch.c2b2.columbia.edu> Does anybody use this module? It is not used anywhere in Biopython. If it is not being used, I suggest deprecating it. --Michiel. From jimmy.musselwhite at gmail.com Wed Aug 29 23:46:51 2007 From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite) Date: Wed, 29 Aug 2007 23:46:51 -0400 Subject: [BioPython] Cluster problem keeping me from getting started Message-ID: <86e5e8970708292046m55ffa549p4544d2a1ffd2c294@mail.gmail.com> Hi everyone, I'm trying to start an assignment using BioPython, but a critical piece of functionality is not working for me. When I run python run_tests.py, it "hangs" at test_Cluster ... and if I run test_Cluster.py by itself it does not complete properly. I installed Cluster 3.0 from the bonsai website and then I even re-installed PyCluster from the source there. I don't know what else could be wrong here! I really need Cluster to work. It's pretty much the point of my assignment. Thanks! From mdehoon at c2b2.columbia.edu Wed Aug 29 23:59:51 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Wed, 29 Aug 2007 23:59:51 -0400 Subject: [BioPython] Cluster problem keeping me from getting started References: <86e5e8970708292046m55ffa549p4544d2a1ffd2c294@mail.gmail.com> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B60F@mail2.exch.c2b2.columbia.edu> This was fixed in CVS.
If you pick up the new Bio.Cluster source files from here: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Cluster/?cvsroot=biopython, copy them over the corresponding files in the biopython-1.43 distribution, and reinstall, the test should run to completion. If it doesn't, please let me know. --Michiel.
From biopython at maubp.freeserve.co.uk Wed Aug 1 09:38:16 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 01 Aug 2007 10:38:16 +0100 Subject: [BioPython] blast output xml In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B5FE@mail2.exch.c2b2.columbia.edu> References: <46AF51CA.2030005@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B5FE@mail2.exch.c2b2.columbia.edu> Message-ID: <46B05488.20609@maubp.freeserve.co.uk> In Biopython 1.42, when calling standalone blast we defaulted to requesting plain text output.
In Biopython 1.43, when calling standalone blast we default to requesting XML output. Around this time, the NCBI made a big change to how they produced XML output for multiple queries, and we had to change the way the NCBIXML library was used in Biopython 1.43 - and updated the documentation to match. Michiel De Hoon wrote: > > The description in the cookbook is correct for Biopython 1.43. > Maybe we should have added an explicit "you need Biopython 1.43 or later" rider on the Blast chapter? > As a note to the developers: > We should ask ourselves if distributing packages for various linuxes is > really a good idea. Installing Biopython isn't all that hard, and the > packages for linux tend to get behind as new Biopython versions come out, > causing confusion for our users. I think having the distributions provide (current) Biopython is great, as it is much simpler for new users to install it and its dependencies this way. However, because of the problem of out-of-date packages, we should try to get in touch with the main distribution maintainers and try to bring them up to date more quickly. Peter From luca.beltrame at unimi.it Wed Aug 1 09:34:36 2007 From: luca.beltrame at unimi.it (Luca Beltrame) Date: Wed, 01 Aug 2007 11:34:36 +0200 Subject: [BioPython] blast output xml In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B5FE@mail2.exch.c2b2.columbia.edu> References: <6243BAA9F5E0D24DA41B27997D1FD14402B5FE@mail2.exch.c2b2.columbia.edu> Message-ID: <200708011134.36479.luca.beltrame@unimi.it> On Wednesday 01 August 2007 10:55:05 Michiel De Hoon wrote: > We should ask ourselves if distributing packages for various linuxes is > really a good idea. Installing Biopython isn't all that hard, and the Some people (like myself) prefer to use the distribution's package management system, which at least prevents conflicts between multiple versions and provides a better way to track what's installed and what's not. So, personally I think it's a good idea to distribute packages.
-- Luca Beltrame, MSc. - Molecular Medicine PhD Student Dipartimento di Scienze e Tecnologie Biomediche - UniMI E-mail: luca dot beltrame at unimi dot it From sbassi at gmail.com Wed Aug 1 17:48:33 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Wed, 1 Aug 2007 14:48:33 -0300 Subject: [BioPython] blast output xml In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B5FE@mail2.exch.c2b2.columbia.edu> References: <46AF51CA.2030005@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B5FE@mail2.exch.c2b2.columbia.edu> Message-ID: On 8/1/07, Michiel De Hoon wrote: > As a note to the developers: > We should ask ourselves if distributing packages for various linuxes is > really a good idea. Installing Biopython isn't all that hard, and the > packages for linux tend to get behind as new Biopython versions come out, > causing confusion for our users. I think that packages should be distributed, but with a clear warning about the benefits of installing the latest version. On some web hosting services, the administrators won't let you install a tar.gz package, but they will allow you to install a package from the Debian/Ubuntu repositories (for security reasons). Best, SB. From jodyhey at yahoo.com Wed Aug 1 19:56:54 2007 From: jodyhey at yahoo.com (Emanuel Hey) Date: Wed, 1 Aug 2007 12:56:54 -0700 (PDT) Subject: [BioPython] accessing "data quality" Phrap records in Genbank Message-ID: <14649.20165.qm@web53907.mail.re2.yahoo.com> For some sequence records, NCBI has a record of the Phrap scores corresponding to the sequence (i.e. one score for each base).
These are typically records containing draft sequences from genome projects. To see an example, try this link: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&qty=1&c_start=1&list_uids=153792835&uids=&dopt=qual&dispmax=5&sendto=&fmt_mask=0&from=begin&to=end&extrafeatpresent=1&ef_CDD=8&ef_MGC=16&ef_HPRD=32&ef_STS=64&ef_tRNA=128&ef_microRNA=256&ef_Exon=512 How could I go about downloading these sequence quality scores? I need to filter the data by a certain score. Thanks, jhey From biopython at maubp.freeserve.co.uk Wed Aug 1 21:05:33 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 01 Aug 2007 22:05:33 +0100 Subject: [BioPython] accessing "data quality" Phrap records in Genbank In-Reply-To: <14649.20165.qm@web53907.mail.re2.yahoo.com> References: <14649.20165.qm@web53907.mail.re2.yahoo.com> Message-ID: <46B0F59D.7080804@maubp.freeserve.co.uk> Emanuel Hey wrote: > for some sequence records, NCBI has a a record of the > Phrap scores corresponding to the sequence (i.e. one > score for each base). > > These are typically records containing draft sequences > from genome projects > > to see an example, try this link > > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&qty=1&c_start=1&list_uids=153792835&uids=&dopt=qual&dispmax=5&sendto=&fmt_mask=0&from=begin&to=end&extrafeatpresent=1&ef_CDD=8&ef_MGC=16&ef_HPRD=32&ef_STS=64&ef_tRNA=128&ef_microRNA=256&ef_Exon=512 > > How could I go about downloading these sequence > quality scores? One option for getting the data would be to construct the URL and then download it using standard Python tools, e.g. the urllib.urlretrieve function. Alternatively, Biopython has some NCBI/Entrez code you might be able to use... The second step is actually parsing the data file into a usable form.
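A minimal sketch of both steps (download, then parse and filter), in Python 3 with the standard library only. It assumes the "Base Quality" download is a FASTA-like header line starting with ">" followed by whitespace-separated decimal scores, and that the XML variant packs each score as two hex characters; both assumptions come from the descriptions in this thread, not from an NCBI specification. All function names are hypothetical, and mask_low_quality() is just one possible reading of "filter by a certain score": it replaces every base scoring below a threshold with N.

```python
from urllib.request import urlopen


def fetch_text(url):
    """Download a page as text (Python 3 counterpart of the urllib calls mentioned above)."""
    with urlopen(url) as handle:
        return handle.read().decode("ascii", errors="replace")


def parse_base_quality(text):
    """Parse FASTA-like quality text into {header: [int scores]}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            records[header].extend(int(token) for token in line.split())
    return records


def decode_hex_scores(hex_string):
    """Decode a string in which each pair of hex characters is one quality score."""
    return [int(hex_string[i:i + 2], 16) for i in range(0, len(hex_string), 2)]


def mask_low_quality(sequence, scores, threshold):
    """Replace each base whose score is below threshold with 'N'."""
    if len(sequence) != len(scores):
        raise ValueError("sequence and scores differ in length")
    return "".join(base if q >= threshold else "N" for base, q in zip(sequence, scores))


# Made-up example data, not a real NCBI record.
example = """>example_record made-up scores
10 20 30
40 5
"""
scores = parse_base_quality(example)["example_record made-up scores"]
print(scores)                                 # [10, 20, 30, 40, 5]
print(decode_hex_scores("0a141e"))            # [10, 20, 30]
print(mask_low_quality("ACGTA", scores, 20))  # NCGTN
```

fetch_text() is not called here, to keep the example offline; in the Python 2 of the time, urllib.urlretrieve or urllib.urlopen would play the same role.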
The "Base Quality" format looks very easy to parse, with a FASTA-like header followed by space-separated decimal scores. Their XML format also looks fairly simple - the core data looks like it's held as a string where each pair of characters represents one score in hex. As far as I could see based on the URL you gave, none of the other format options actually contain the "data quality" information. I'm not aware of any code in Biopython to cope with either of these file formats. > I need to filter the data by a certain score Are you trying to select parts of the associated sequence? Peter From fkauff at biologie.uni-kl.de Thu Aug 2 07:54:24 2007 From: fkauff at biologie.uni-kl.de (Frank Kauff) Date: Thu, 02 Aug 2007 09:54:24 +0200 Subject: [BioPython] accessing "data quality" Phrap records in Genbank In-Reply-To: <46B0F59D.7080804@maubp.freeserve.co.uk> References: <14649.20165.qm@web53907.mail.re2.yahoo.com> <46B0F59D.7080804@maubp.freeserve.co.uk> Message-ID: On Wed, 01 Aug 2007 22:05:33 +0100 Peter wrote: > Emanuel Hey wrote: >> for some sequence records, NCBI has a a record of the >> Phrap scores corresponding to the sequence (i.e. one >> score for each base). >> ...> > I'm not aware of any code in Biopython to cope with >either of these file > formats. > Ace.py and Phd.py in Bio/Sequencing are two parsers that parse the phd and ace files generated by phred and phrap. Frank >> I need to filter the data by a certain score > > Are you trying to select parts of the associated >sequence? > > Peter -- J-Prof. Dr. Frank Kauff Molecular Phylogenetics FB Biologie, 13/276 TU Kaiserslautern Postfach 3049 67653 Kaiserslautern Tel. +49 (0)631 205-2562 Fax.
+49 (0)631 205-2998 email: fkauff at biologie.uni-kl.de skype: frank.kauff From jodyhey at yahoo.com Fri Aug 3 01:20:38 2007 From: jodyhey at yahoo.com (Emanuel Hey) Date: Thu, 2 Aug 2007 18:20:38 -0700 (PDT) Subject: [BioPython] accessing "data quality" Phrap records in Genbank In-Reply-To: <46B0F59D.7080804@maubp.freeserve.co.uk> Message-ID: <81290.42390.qm@web53907.mail.re2.yahoo.com> Thanks. urllib does the job quite well. --- Peter wrote: > Emanuel Hey wrote: > > for some sequence records, NCBI has a a record of > the > > Phrap scores corresponding to the sequence (i.e. > one > > score for each base). > > > > These are typically records containing draft > sequences > > from genome projects > > > > to see an example, try this link > > > > > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&qty=1&c_start=1&list_uids=153792835&uids=&dopt=qual&dispmax=5&sendto=&fmt_mask=0&from=begin&to=end&extrafeatpresent=1&ef_CDD=8&ef_MGC=16&ef_HPRD=32&ef_STS=64&ef_tRNA=128&ef_microRNA=256&ef_Exon=512 > > > > How could I go about downloading these sequence > > quality scores? > > One option for getting the data would be to > construct the URL then > download it using standard python tools, e.g. the > urllib.urlretrieve > function. Alternatively Biopython has some > NCBI/Entrez code you might be > able to use... > > The second step is actually parsing the data file > into a usable form. > The "Base Quality" format looks very easy to parse, > with a FASTA like > header followed by space separated decimal scores. > Their XML format > also looks fairly simple - the core data looks like > its held as a string > where each two characters represents one score in > hex. As far as I > could see based on the URL you gave, none of the > other format options > actually contain the "data quality" information. > > I'm not aware of any code in Biopython to cope with > either of these file > formats. 
> > > I need to filter the data by a certain score > > Are you trying to select parts of the associated > sequence? > > Peter > > From letondal at pasteur.Fr Fri Aug 3 08:55:12 2007 From: letondal at pasteur.Fr (Catherine Letondal) Date: Fri, 3 Aug 2007 10:55:12 +0200 Subject: [BioPython] Course in informatics for biology 2008 at Institut Pasteur Message-ID: <575EDD34-F5C4-4DF4-97FF-CCC1E4219A01@pasteur.Fr> Hi, ************************************************************************ * Course in informatics for biology 2008 at Institut Pasteur http://www.pasteur.fr/formation/infobio-en.html ************************************************************************ * In the series of courses offered at the Pasteur Institute, a course will be offered in informatics in biology. The next session will take place from January to the end of April 2008. The main goal of this course is to provide researchers in biology with an initial exposure to informatics. Admittance to the course is reserved for those with a degree in biology or a related discipline. With more and more bioinformatics tools available, it becomes increasingly important for researchers in biology to be able to manage their data, implement their ideas, and judge for themselves the usefulness of new algorithms and software. This course will emphasize fundamental aspects of computer science and apply them to biological examples. Theoretical aspects (algorithm development, logic, problem modeling and design methods) and technical applications (databases and web technologies) that are relevant for biologists will be thoroughly discussed.
Programming is presented through the object-oriented paradigm, using a modern high-level language, Python, provided with tools for biology and enabling both prototyping/scripting and the building of large software systems. Instruction in an additional language (C) will be available for interested students. Learning during the course will be reinforced with computing exercises, and effective training will be provided by a two-month research project. The working language of the course is French. For further information, please consult: http://www.pasteur.fr/formation/infobio-en.html *** Registration will be closed on October 1, 2007. *** Sincerely, -- Benno Schwikowski & Catherine Letondal Institut Pasteur -- Course in Informatics for Biology www.pasteur.fr/formation/infobio From sbassi at gmail.com Tue Aug 14 16:45:26 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Tue, 14 Aug 2007 13:45:26 -0300 Subject: [BioPython] Inverse codon table? Message-ID: I am looking for a table in Biopython of amino acids and their codons. I found a table in Data.CodonTable, but it works in the opposite direction from what I need:
>>> from Bio.Data import CodonTable
>>> t=CodonTable.standard_dna_table
>>> t.forward_table["ATG"]
'M'
I need a table into which I could enter "M" and get all the triples that translate into "M". Is this available in Biopython? From sbassi at gmail.com Tue Aug 14 17:38:01 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Tue, 14 Aug 2007 14:38:01 -0300 Subject: [BioPython] Inverse codon table? In-Reply-To: <46C1E3AB.1000100@maubp.freeserve.co.uk> References: <46C1E3AB.1000100@maubp.freeserve.co.uk> Message-ID: On 8/14/07, Peter wrote: > I'm not sure without a thorough check, but its fairly be trivial to > build such a dictionary from the CodonTable.standard_dna_table like this: ... Thanks.
That is nicer than my hand-made back-translation dictionary:
tdict={"A":("GCT","GCC","GCA","GCG"),"R":("CGT","CGC","CGA","CGG","AGA","AGG"),
"N":("AAT","AAC"),"D":("GAT","GAC"),"C":("TGT","TGC"),"Q":("CAA","CAG"),
"E":("GAA","GAG"),"G":("GGT","GGC","GGA","GGG"),"H":("CAT","CAC"),
"I":("ATT","ATC","ATA"),"L":("TTA","TTG","CTT","CTC","CTA","CTG"),
"K":("AAA","AAG"),"M":("ATG",),"F":("TTT","TTC"),
"P":("CCT","CCC","CCA","CCG"),"S":("TCT","TCC","TCA","TCG","AGT","AGC"),
"T":("ACT","ACC","ACA","ACG"),"W":("TGG",),"Y":("TAT","TAC"),
"V":("GTT","GTC","GTA","GTG")}
And yours is better, since it allows building a back dict from another translation table. Best, SB. From biopython at maubp.freeserve.co.uk Tue Aug 14 17:17:31 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 14 Aug 2007 18:17:31 +0100 Subject: [BioPython] Inverse codon table? In-Reply-To: References: Message-ID: <46C1E3AB.1000100@maubp.freeserve.co.uk> Sebastian Bassi wrote: > I am looking for a table in Biopython of amino-acids and its codons. I > found in Data.CodonTable a table but it should be used in the inverse > way I need: >
>>>> from Bio.Data import CodonTable
>>>> t=CodonTable.standard_dna_table
>>>> t.forward_table["ATG"]
> 'M'
> I need a table that I coulld enter "M" and get the all triples that > translate into "M". Is this available in Biopyhon? > I'm not sure without a thorough check, but it's fairly trivial to build such a dictionary from the CodonTable.standard_dna_table like this:
from Bio.Data import CodonTable
t=CodonTable.standard_dna_table
t.forward_table["ATG"]
#Build a back table dict where the values are lists of codons...
bt = dict()
for a1 in "ATCG" :
    for a2 in "ATCG" :
        for a3 in "ATCG" :
            codon = a1+a2+a3
            try :
                amino = t.forward_table[codon]
            except KeyError :
                assert codon in t.stop_codons
                continue
            try :
                bt[amino].append(codon)
            except KeyError :
                bt[amino] = [codon]

for amino in bt :
    print amino, bt[amino]

Peter From p.pagel at gsf.de Tue Aug 14 18:08:18 2007 From: p.pagel at gsf.de (Philipp Pagel) Date: Tue, 14 Aug 2007 20:08:18 +0200 Subject: [BioPython] Inverse codon table? In-Reply-To: References: Message-ID: <20070814180817.GA3154@gsf.de> Hi! > I need a table that I coulld enter "M" and get the all triples that > translate into "M". Is this available in Biopyhon? Hm - I'm not sure, but it's easy enough to make one yourself:
>>> foo = {}
>>> for codon, AA in t.forward_table.items():
...     if AA not in foo:
...         foo[AA] = []
...     foo[AA].append(codon)
...
>>> foo['L']
['CTT', 'CTG', 'CTA', 'CTC', 'TTA', 'TTG']
>>> foo['V']
['GTA', 'GTC', 'GTG', 'GTT']
cu Philipp -- Dr. Philipp Pagel Tel. +49-8161-71 2131 Lehrstuhl für Genomorientierte Bioinformatik Fax. +49-8161-71 2186 Technische Universität München Wissenschaftszentrum Weihenstephan 85350 Freising, Germany and Institut für Bioinformatik / MIPS GSF - Forschungszentrum für Umwelt und Gesundheit Ingolstädter Landstrasse 1 85764 Neuherberg, Germany http://mips.gsf.de/staff/pagel From sbassi at gmail.com Tue Aug 14 18:44:15 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Tue, 14 Aug 2007 15:44:15 -0300 Subject: [BioPython] Inverse codon table? In-Reply-To: <20070814180817.GA3154@gsf.de> References: <20070814180817.GA3154@gsf.de> Message-ID: On 8/14/07, Philipp Pagel wrote:
> >>> foo = {}
> >>> for codon, AA in t.forward_table.items():
> ...     if AA not in foo:
> ...         foo[AA] = []
> ...     foo[AA].append(codon)
That also works! My goal is to write a function that will return ALL possible back-translated sequences for a given peptide. The back_translate function in Biopython returns just one possible sequence.
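The inverse table and the "all back-translations" goal can be sketched together in a self-contained way. The hard-coded string below is the NCBI standard genetic code (translation table 1) with codons enumerated in TCAG order. It is used only so the example runs without Biopython; with Biopython, CodonTable.standard_dna_table.forward_table from this thread would play the same role. The function names are illustrative, not part of Biopython. Note that the number of back-translations is the product of the codon counts per residue, not their sum. For "MANCNGASK" that is 1×4×2×2×2×4×4×6×2 = 6144 sequences, so the total grows exponentially with peptide length.

```python
import itertools

BASES = "TCAG"
# NCBI standard genetic code (table 1); codons enumerated in TCAG order, '*' = stop
STANDARD_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def forward_table():
    """Map each of the 64 DNA codons to its one-letter amino acid (or '*')."""
    codons = (a + b + c for a in BASES for b in BASES for c in BASES)
    return dict(zip(codons, STANDARD_CODE))


def back_table(forward):
    """Invert codon -> amino acid into amino acid -> sorted codon list, skipping stops."""
    table = {}
    for codon, amino in forward.items():
        if amino != "*":
            table.setdefault(amino, []).append(codon)
    for codons in table.values():
        codons.sort()
    return table


def all_back_translations(peptide, table):
    """Yield every DNA sequence encoding `peptide`: one codon choice per residue."""
    choices = [table[amino] for amino in peptide]
    for combo in itertools.product(*choices):
        yield "".join(combo)


bt = back_table(forward_table())
print(bt["M"])  # ['ATG']
print(bt["L"])  # ['CTA', 'CTC', 'CTG', 'CTT', 'TTA', 'TTG']
print(sum(1 for _ in all_back_translations("MANCNGASK", bt)))  # 6144
```

Because itertools.product is lazy, iterating over the generator rather than building a list keeps memory use flat even when the count is large.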
I want them all :) From sbassi at gmail.com Tue Aug 14 18:55:24 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Tue, 14 Aug 2007 15:55:24 -0300 Subject: [BioPython] Inverse codon table? In-Reply-To: References: <20070814180817.GA3154@gsf.de> Message-ID: On 8/14/07, Sebastian Bassi wrote: > function from Biopython just return one possible sequence. I want them > all :) This returns all possible combinations:
peptidoORI="MANCNGASK"
tot=0
for x in peptidoORI:
    tot=tot+len(bt[x])
allpepts=[""]*tot
for aa in peptidoORI:
    i=0
    for x in range(tot):
        if i==len(bt[aa]):
            i=0
        else:
            pass
        allpepts[x]=allpepts[x]+bt[aa][i]
        i=i+1
>>> allpepts
['ATGGCAAATTGTAATGGAGCAAGTAAA', 'ATGGCTAACTGCAACGGTGCTAGCAAG', 'ATGGCCAATTGTAATGGCGCCTCAAAA', 'ATGGCGAACTGCAACGGGGCGTCTAAG', 'ATGGCAAATTGTAATGGAGCATCCAAA', 'ATGGCTAACTGCAACGGTGCTTCGAAG', 'ATGGCCAATTGTAATGGCGCCAGTAAA', 'ATGGCGAACTGCAACGGGGCGAGCAAG', 'ATGGCAAATTGTAATGGAGCATCAAAA', 'ATGGCTAACTGCAACGGTGCTTCTAAG', 'ATGGCCAATTGTAATGGCGCCTCCAAA', 'ATGGCGAACTGCAACGGGGCGTCGAAG', 'ATGGCAAATTGTAATGGAGCAAGTAAA', 'ATGGCTAACTGCAACGGTGCTAGCAAG', 'ATGGCCAATTGTAATGGCGCCTCAAAA', 'ATGGCGAACTGCAACGGGGCGTCTAAG', 'ATGGCAAATTGTAATGGAGCATCCAAA', 'ATGGCTAACTGCAACGGTGCTTCGAAG', 'ATGGCCAATTGTAATGGCGCCAGTAAA', 'ATGGCGAACTGCAACGGGGCGAGCAAG', 'ATGGCAAATTGTAATGGAGCATCAAAA', 'ATGGCTAACTGCAACGGTGCTTCTAAG', 'ATGGCCAATTGTAATGGCGCCTCCAAA', 'ATGGCGAACTGCAACGGGGCGTCGAAG', 'ATGGCAAATTGTAATGGAGCAAGTAAA', 'ATGGCTAACTGCAACGGTGCTAGCAAG', 'ATGGCCAATTGTAATGGCGCCTCAAAA']
From Gabino.Sanchez-Perez at Dal.Ca Tue Aug 14 23:57:40 2007 From: Gabino.Sanchez-Perez at Dal.Ca (Gabino Sanchez-Perez) Date: Tue, 14 Aug 2007 20:57:40 -0300 Subject: [BioPython] Problem to obtain Hit length parsing with NCBIXML Message-ID: <20070814205740.rd456mf5ow844o8k@my3.dal.ca>
Hi, I'm parsing blastp output in XML format using NCBIXML, but my problem is that I don't know how to get the Hit length (not the alignment length), even though it is defined in the parser:
def _end_Hit_len(self):
    self._hit.length = int(self._value)
So in my script, when I iterate
for alignment in brecord.alignments:
    for hsp in alignment.hits:
it seems that it is not defined, and the error message is that hit.length does not exist. Thanks in advance Gabino From mdehoon at c2b2.columbia.edu Wed Aug 15 00:45:26 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Tue, 14 Aug 2007 20:45:26 -0400 Subject: [BioPython] Problem to obtain Hit length parsing with NCBIXML References: <20070814205740.rd456mf5ow844o8k@my3.dal.ca> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B604@mail2.exch.c2b2.columbia.edu> This should work:
>>> for alignment in brecord.alignments:
...     print alignment.length
If it doesn't, please send us the exact code you are using, and the exact error message that you got. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: biopython-bounces at lists.open-bio.org on behalf of Gabino Sanchez-Perez Sent: Tue 8/14/2007 7:57 PM To: biopython at lists.open-bio.org Subject: [BioPython] Problem to obtain Hit length parsing with NCBIXML Hi I'm parsing a blastp output in XML format using NCBIXML, but my problem is that I don't know how to get the Hit length (not alignment length), even though is defined in the parser: def _end_Hit_len(self): self._hit.length = int(self._value) So in my script when I iterate for alignment in brecord.alignments: for hsp in alignment.hits: it seems that is not defined and the error message is that hit.length does not exist.
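To show where the two lengths come from, here is a minimal sketch that uses the standard library's ElementTree rather than NCBIXML. The XML fragment is hand-written and heavily trimmed, and its values are made up. Only the element names Hit_len and Hsp_align-len follow NCBI's BLAST XML output. Hit_len is the full length of the database sequence, which this thread identifies as what NCBIXML exposes as alignment.length. Each HSP carries its own, usually shorter, aligned length. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed fragment shaped like NCBI BLAST XML; values are illustrative only
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_def>example subject sequence</Hit_def>
          <Hit_len>350</Hit_len>
          <Hit_hsps>
            <Hsp><Hsp_align-len>120</Hsp_align-len></Hsp>
            <Hsp><Hsp_align-len>45</Hsp_align-len></Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def hit_and_hsp_lengths(xml_text):
    """Return (description, hit length, [aligned length of each HSP]) for every Hit."""
    root = ET.fromstring(xml_text)
    results = []
    for hit in root.iter("Hit"):
        hit_len = int(hit.findtext("Hit_len"))
        hsp_lens = [int(hsp.findtext("Hsp_align-len")) for hsp in hit.iter("Hsp")]
        results.append((hit.findtext("Hit_def"), hit_len, hsp_lens))
    return results


for description, hit_len, hsp_lens in hit_and_hsp_lengths(SAMPLE_XML):
    print(description, hit_len, hsp_lens)  # example subject sequence 350 [120, 45]
```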
Thanks in advance Gabino _______________________________________________ BioPython mailing list - BioPython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From winter at biotec.tu-dresden.de Wed Aug 15 07:36:47 2007 From: winter at biotec.tu-dresden.de (Christof Winter) Date: Wed, 15 Aug 2007 09:36:47 +0200 Subject: [BioPython] Problem to obtain Hit length parsing with NCBIXML In-Reply-To: <20070814205740.rd456mf5ow844o8k@my3.dal.ca> References: <20070814205740.rd456mf5ow844o8k@my3.dal.ca> Message-ID: <46C2AD0F.3050207@biotec.tu-dresden.de> Gabino Sanchez-Perez wrote: > Hi > > I'm parsing a blastp output in XML format using NCBIXML, but my problem is that > I don't know how to get the Hit length (not alignment length), even though is > defined in the parser: > > def _end_Hit_len(self): > self._hit.length = int(self._value) > > So in my script when I iterate > > for alignment in brecord.alignments: > for hsp in alignment.hits: > > it seems that is not defined and the error message is that hit.length does not > exist. I can confirm what Michiel wrote: alignment.length = total length of the unaligned hit sequence HTH, Christof From Gabino.Sanchez-Perez at Dal.Ca Wed Aug 15 12:23:26 2007 From: Gabino.Sanchez-Perez at Dal.Ca (Gabino Sanchez-Perez) Date: Wed, 15 Aug 2007 09:23:26 -0300 Subject: [BioPython] Problem to obtain Hit length parsing with NCBIXML In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B604@mail2.exch.c2b2.columbia.edu> References: <20070814205740.rd456mf5ow844o8k@my3.dal.ca> <6243BAA9F5E0D24DA41B27997D1FD14402B604@mail2.exch.c2b2.columbia.edu> Message-ID: <20070815092326.ah56e2ud8w00k4wk@my5.dal.ca> It works! I was confused because I was looking for the length of the alignment (which I now see is in hsp.sbjct) and I thought it was in alignment.length (Hit length). Thank you very much for the help Gabino > This should work: > >>>> for alignment in brecord.alignments: > ...
print alignment.length > > If it doesn't, please send us the exact code you are using, and the exact > error message that you got. > > --Michiel. > > > Michiel de Hoon > Center for Computational Biology and Bioinformatics > Columbia University > 1150 St Nicholas Avenue > New York, NY 10032 > > > > -----Original Message----- > From: biopython-bounces at lists.open-bio.org on behalf of Gabino Sanchez-Perez > Sent: Tue 8/14/2007 7:57 PM > To: biopython at lists.open-bio.org > Subject: [BioPython] Problem to obtain Hit length parsing with NCBIXML > > Hi > > I'm parsing a blastp output in XML format using NCBIXML, but my problem is > that > I don't know how to get the Hit length (not alignment length), even though is > defined in the parser: > > def _end_Hit_len(self): > self._hit.length = int(self._value) > > So in my script when I iterate > > for alignment in brecord.alignments: > for hsp in alignment.hits: > > it seems that is not defined and the error message is that hit.length does > not > exist. > > Thanks in advance > > Gabino From douglas.kojetin at gmail.com Wed Aug 15 18:30:41 2007 From: douglas.kojetin at gmail.com (Douglas Kojetin) Date: Wed, 15 Aug 2007 14:30:41 -0400 Subject: [BioPython] Bio.PDB rotate/translate Message-ID: <152CE16F-7F6D-46F3-B285-59FBB3671EED@gmail.com> Hi All, I would like to rotate a specific H and N vector in my structure file, so that it points in a different direction, but keep its location fixed in 3D space. Here is what I've been working with, but it seems to move the atoms away from their original location. I grabbed most of this from the Bio.PDB FAQ (the PDF file) and from the Bio.PDB documentation online. I'm not sure if I should be using different values for the Vector/rotation line, or in the translation array, etc.
###
atom1 = structure[model.id][chain.id][residue.id[1]]['H']
atom2 = structure[model.id][chain.id][residue.id[1]]['N']
rotation = Bio.PDB.rotaxis( math.pi/2.0, Bio.PDB.Vector(1,0,0) )
translation = Numeric.array( (1,0,0), 'f' )
atom1.transform(rotation, translation)
atom2.transform(rotation, translation)
###
Thanks in advance for the help, Doug From jdiezperezj at gmail.com Thu Aug 16 17:27:45 2007 From: jdiezperezj at gmail.com (Javier Díez) Date: Thu, 16 Aug 2007 19:27:45 +0200 Subject: [BioPython] psi mi Message-ID: Hi, Does anyone know of a Python parser for the PSI-MI format? Thanks Javi From winter at biotec.tu-dresden.de Thu Aug 16 17:53:01 2007 From: winter at biotec.tu-dresden.de (Christof Winter) Date: Thu, 16 Aug 2007 19:53:01 +0200 Subject: [BioPython] psi mi In-Reply-To: References: Message-ID: <46C48EFD.6080701@biotec.tu-dresden.de> Javier Díez wrote: > Hy, > Does anyone knows a python parser for PSI-MI format? > Thanks > Javi Hi Javier: I searched for one some time ago, and since I didn't find one, I wrote my own. It's based on ElementTree: http://effbot.org/zone/element-index.htm http://docs.python.org/lib/module-xml.etree.ElementTree.html It only parses some basic data from the PSI-MI file; if you want to have a look at it, just drop me a line. Cheers, Christof From meesters at uni-mainz.de Thu Aug 16 18:33:56 2007 From: meesters at uni-mainz.de (Christian Meesters) Date: Thu, 16 Aug 2007 20:33:56 +0200 Subject: [BioPython] Bio.PDB: possible to obtain chain identifier for atoms? Message-ID: <1187289236.19033.17.camel@cmeesters> Hi All, I'm trying to build a somewhat bigger app which needs to calculate the distance distribution function of a protein several times. For this purpose I rediscovered BioPython today. Speed is an issue, but since the protein in question isn't too big, the overhead of Python and BioPython might be tolerable. Still, in order to speed up my code I would need to access the chain identifier of atoms within a loop.
To illustrate my question, some pseudo-code:
atoms = Selection.unfold_entities(structure, 'A')
for i, a in enumerate(atoms):
    for b in atoms[i+1:]:
        # do some calculations here ...
Now, I cannot avoid the double loop, but if I could somehow check whether atoms a and b belong to the same chain (and which one), I would only have to do this awkward looping once. Does anybody know an approach? As far as I understand the code of the Atom class, there is no direct way, right? TIA Christian From meesters at uni-mainz.de Thu Aug 16 18:50:15 2007 From: meesters at uni-mainz.de (Christian Meesters) Date: Thu, 16 Aug 2007 20:50:15 +0200 Subject: [BioPython] Bio.PDB rotate/translate In-Reply-To: <152CE16F-7F6D-46F3-B285-59FBB3671EED@gmail.com> References: <152CE16F-7F6D-46F3-B285-59FBB3671EED@gmail.com> Message-ID: <1187290215.19033.34.camel@cmeesters> Hi, As you can see from my previous mail, I only just started working with BioPython again today. Still, I dare to ask a few questions back. Actually, I don't understand your question: > I would like to rotate a specific H and N vector in my structure > file I guess this is a mistake and you don't want to rotate the vector itself. I presume your structure looks like this:
H N
 \/
 c1
 |
 c2
right? And you want to rotate around the axis defined by c1-c2? Then, if I understand the 'PDB FAQ' correctly, the code before the rotaxis line should read:
c1 = residue[c1-identifier].get_vector()
c2 = residue[c2-identifier].get_vector()
rotation = rotaxis(angle, c1-c2)
Of course, your residue won't look like my sketch ... > > rotation = Bio.PDB.rotaxis( math.pi/2.0, Bio.PDB.Vector(1,0,0) ) > translation = Numeric.array( (1,0,0), 'f' ) Here, you would shift the atoms by 1 Å along 'x'. Applied to only two atoms, this would stretch the bonds. BTW, did you save the altered structure after applying the rotation/translation? If this was all wrong, please excuse a newbie answering this question.
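To keep the N atom in place and only swing the N-H bond, the rotation has to be taken about an axis that passes through the N atom. A rotation matrix on its own pivots about the origin, so the usual recipe is to shift the pivot to the origin, rotate, and shift back. The pure-Python sketch below does this with Rodrigues' rotation formula. The function name and the coordinates are hypothetical. With Bio.PDB, the rotated coordinates would still have to be written back to the atom.

```python
import math


def rotate_about_axis(point, pivot, axis, angle):
    """Rotate `point` by `angle` radians about the line through `pivot` along `axis`.

    Uses Rodrigues' rotation formula; `pivot` itself stays fixed.
    """
    norm = math.sqrt(sum(c * c for c in axis))
    k = [c / norm for c in axis]                      # unit rotation axis
    v = [p - q for p, q in zip(point, pivot)]         # shift pivot to the origin
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    k_dot_v = sum(a * b for a, b in zip(k, v))
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    rotated = [v[i] * cos_a + k_cross_v[i] * sin_a + k[i] * k_dot_v * (1 - cos_a)
               for i in range(3)]
    return [r + q for r, q in zip(rotated, pivot)]    # shift back


# Hypothetical N and H coordinates: swing H by 90 degrees about the z direction through N
n_coord = [1.0, 1.0, 1.0]
h_coord = [2.0, 1.0, 1.0]
new_h = rotate_about_axis(h_coord, n_coord, [0.0, 0.0, 1.0], math.pi / 2)
print([round(c, 6) for c in new_h])  # [1.0, 2.0, 1.0]
```

To rotate about the c1-c2 axis in the sketch above instead, pass c1 as the pivot and c2 minus c1 as the axis. The bond length to the pivot is preserved either way.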
But the mail was left unanswered for a while and it is still possible to correct me ;-). HTH Christian From meesters at uni-mainz.de Fri Aug 17 07:15:39 2007 From: meesters at uni-mainz.de (Christian Meesters) Date: Fri, 17 Aug 2007 09:15:39 +0200 Subject: [BioPython] Bio.PDB: possible to obtain chain identifier for atoms? In-Reply-To: <8e76d5310708161217s7e59e2ado6b4b514f6192c38e@mail.gmail.com> References: <1187289236.19033.17.camel@cmeesters> <8e76d5310708161217s7e59e2ado6b4b514f6192c38e@mail.gmail.com> Message-ID: <1187334939.26144.1.camel@cmeesters> Hi Katie, apparently, yes. Life can be so easy ... Thanks, Christian On Thu, 2007-08-16 at 15:17 -0400, Katie Edmonds wrote: > Hi Christian, > > are you looking for: > chain1 = a.get_parent().get_parent() > ? > > Katie > > On 8/16/07, Christian Meesters wrote: > Hi All, > > I'm trying to build a somewhat bigger app which needs to > calculate the > distance distribution function of a protein several times. For > this > purpose I re-discovered BioPython today. > Speed is an issue, but since the protein in question isn't too > big the > overhead of Python and BioPython might be tolerable. > > Still, in order to speed up my code I would need to access the > chain > identifier of atoms within a loop. To illustrate my question > some > pseudo-code: > > atoms = Selection.unfold_entities(structure, 'A') > for i, a in enumerate(atoms): > for b in atoms[i+1:]: > # do some calculations here ... > > Now, I cannot avoid the double loop, but if I could check > somehow > whether atoms a and b belong to the same chain (and which) I > would have > to do this awkward looping only once this way. > > Does anybody know an approach? As far as I understand the code > of the > Atom class, there is no direct way, right?
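Following Katie's hint, each atom's chain can be looked up once: a.get_parent().get_parent() gives the chain, whose id is available through get_id(). A single pass over atom pairs can then sort the distances into intra-chain and inter-chain groups. The minimal sketch below assumes the atoms have already been reduced to hypothetical (chain_id, (x, y, z)) tuples, so it runs without Biopython. The function name is also hypothetical.

```python
import itertools
import math
from collections import defaultdict


def distances_by_chain(atoms):
    """Split all pairwise distances into intra-chain and inter-chain groups.

    `atoms` is a list of (chain_id, (x, y, z)) tuples; with Bio.PDB the chain id
    could come from atom.get_parent().get_parent().get_id().
    """
    intra = defaultdict(list)  # chain id -> distances between atoms of that chain
    inter = defaultdict(list)  # sorted (chain id, chain id) -> distances across chains
    for (chain_a, xyz_a), (chain_b, xyz_b) in itertools.combinations(atoms, 2):
        d = math.dist(xyz_a, xyz_b)
        if chain_a == chain_b:
            intra[chain_a].append(d)
        else:
            inter[tuple(sorted((chain_a, chain_b)))].append(d)
    return intra, inter


atoms = [("A", (0.0, 0.0, 0.0)), ("A", (3.0, 4.0, 0.0)), ("B", (0.0, 0.0, 1.0))]
intra, inter = distances_by_chain(atoms)
print(dict(intra))  # {'A': [5.0]}
print({k: [round(d, 3) for d in v] for k, v in inter.items()})  # {('A', 'B'): [1.0, 5.099]}
```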
> > TIA > Christian From Michael.Robeson at Colorado.EDU Sat Aug 18 16:42:22 2007 From: Michael.Robeson at Colorado.EDU (Michael S. Robeson) Date: Sat, 18 Aug 2007 10:42:22 -0600 Subject: [BioPython] qblast Message-ID: <6FD85196-34A9-4641-B454-DBA9A589679F@colorado.edu> I have been using qblast (blastn) for a set of sequences. However, it gives me _completely_ different results than if I were to use the web interface at NCBI (also using blastn, nr). Is anyone else having this problem? Am I missing something? Can anyone help with making sure that I am using the same settings in qblast as with the NCBI web interface? Also, are there any plans for implementing megablast? -Thanks! -Mike From biopython at maubp.freeserve.co.uk Sat Aug 18 19:25:48 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 18 Aug 2007 20:25:48 +0100 Subject: [BioPython] qblast In-Reply-To: <6FD85196-34A9-4641-B454-DBA9A589679F@colorado.edu> References: <6FD85196-34A9-4641-B454-DBA9A589679F@colorado.edu> Message-ID: <46C747BC.6010002@maubp.freeserve.co.uk> Michael S. Robeson wrote: > I have been using qblast (blastn) for a set of sequences. However, it > gives me _completely_ different results than if I was to use the web > interface at NCBI (also using blastn, nr). Is anyone else having this > problem? Am I missing something? Could you give us an example? Say three sequences, with details of how you use the web interface and the code you are using in Biopython. Peter From Michael.Robeson at Colorado.EDU Sat Aug 18 20:51:24 2007 From: Michael.Robeson at Colorado.EDU (Michael S.
Robeson) Date: Sat, 18 Aug 2007 14:51:24 -0600 Subject: [BioPython] qblast In-Reply-To: <46C747BC.6010002@maubp.freeserve.co.uk> References: <6FD85196-34A9-4641-B454-DBA9A589679F@colorado.edu> <46C747BC.6010002@maubp.freeserve.co.uk> Message-ID: Sure, I am using a really simple script that I wrote (link below) to start getting familiar w/ biopython. So, any other comments would be greatly appreciated. http://ucsu.colorado.edu/~robeson/documents/blast.txt Using this sequence:
TGTGATGGATATCTGCAGAATTCGCCCTTTAAACTTCAGGGTGACCAAAA
AATCAAAATAAATGTTGAAATAATACTGGATCTCCACCACCACTAACTTC
AAAAAATGTTGTATTAAAATTTCTATCAGTTAATAACATTGTTATAGCAC
CCCCTAATACTGGTAATGATAATAATAATAATCATGCTGTTATAAATACA
GCTCAAACAAATAAAGGTAACTTAAACATACTCATACCAGGTGTTCGCAT
ATTAATAACAGTAACAATAAAATTTATTGAACCTAATATTGATGATATAC
CAGCTAAATGTAAACTAAATATTGCACATTCTATTGAACCTCCTGAATGT
GAAAATATACCAGATAATGGTGGATAAACAGTTCAACCTGTACCTGCCCC
CATCTCGACTACAGATGATCAAATTAATAAAAAAAATGATGGTACTAATA
ATCAAAAACTTATATTATTTAATCTTGGGAATGCCATATCAGGAGCTCCT
ATCATTAAAGGTAAAAATCAATTACCAAAACCACCCATTAATGCAGGCAT
AACCATAAAAAATATCATTATTAAAGCATGTGCTGTTATTAACACATTAT
ATGCTTGATGATTGTAATTTAATATTACTGCACCAGCATCTGATAATTCT
ATACGTATTAATATAGATCAAAATGTTCCTATTAAACCTGCTAAAAATGC
AAATATTAAATATAATGTTCCAATATCTTTATGATTTGTTGACCAAGGGC
GAATTCCAGCACACTGGCGGCCGTTACTAG
Using blastn. When I use the web site I set the following: Database = Others (nr etc.) w/ drop menu set to 'Nucleotide collection (nr/nt)' Optimize for = 'Somewhat similar sequences (blastn)' I assumed the above was the same as biopython's default qblast when using blastn. Basically, my script seemed to work fine until NCBI changed the layout of their site (I'd check every so often with a set of samples). I also tried to change my code with some of the new biopython code. But it is very hard to find good up-to-date tutorials. Last time I looked none of them worked. Anyway, no matter what settings I use via the NCBI web interface I cannot get the same results as I do with qblast.
So, unless I am missing something, I wonder if there is something wrong with the back-end? -Mike On Aug 18, 2007, at 1:25 PM, Peter wrote: > Michael S. Robeson wrote: >> I have been using qblast (blastn) for a set of sequences. However, >> it gives me _completely_ different results than if I was to use >> the web interface at NCBI (also using blastn, nr). Is anyone else >> having this problem? Am I missing something? > > Could you give us an example? Say three of sequences with details > of how you use the web interface, and the code you are using in > Biopython. > > Peter > ---------------------------------------------- Michael S. Robeson II Ecology and Evolutionary Biology N122 Ramaley Hall Campus Box 334 University of Colorado Boulder, CO 80309 robeson at colorado.edu http://ucsu.colorado.edu/~robeson/ From biopython at maubp.freeserve.co.uk Sat Aug 18 21:55:08 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 18 Aug 2007 22:55:08 +0100 Subject: [BioPython] qblast In-Reply-To: References: <6FD85196-34A9-4641-B454-DBA9A589679F@colorado.edu> <46C747BC.6010002@maubp.freeserve.co.uk> Message-ID: <46C76ABC.9040608@maubp.freeserve.co.uk> Michael S. Robeson wrote: > Sure, I am using a really simple script that I wrote (link below) to > start getting familiar w/ biopython. So, any other comments would be > greatly appreciated. > > http://ucsu.colorado.edu/~robeson/documents/blast.txt Assuming you are using Biopython 1.43, I would suggest using Bio.SeqIO rather than Bio.Fasta for reading/writing FASTA files. See the tutorial or http://www.biopython.org/wiki/SeqIO I'm also unclear why you want to save individual BLAST HTML output pages (one file per sequence). Trying to parse these HTML files later will be a pain; use the XML output instead if you want to read the BLAST output with Biopython. > I assumed the above was the same as biopython's default qblast when > using blastn.
Basically, my script seemed worked fine until NCBI > changed the layout of their site (I'd check every so often with a set > of samples). See my other email: they should be the same if all the parameters are the same (specifically, it looks like the NCBI is using different default match/mismatch scores, which was very confusing). > I also tried to change my code with some of the new > biopython code. But it is very hard to find good up-to-date > tutorials. Last time I looked none of them worked. Can you point out any out-of-date documentation please? As far as I know, the BLAST examples in our tutorial work fine: http://biopython.org/DIST/docs/tutorial/Tutorial.html Peter From biopython at maubp.freeserve.co.uk Sat Aug 18 21:38:52 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 18 Aug 2007 22:38:52 +0100 Subject: [BioPython] qblast In-Reply-To: References: <6FD85196-34A9-4641-B454-DBA9A589679F@colorado.edu> <46C747BC.6010002@maubp.freeserve.co.uk> Message-ID: <46C766EC.6030601@maubp.freeserve.co.uk> Hi Michael, Here is a short script based on your example, which I have tested for calling qblast:
from Bio.Blast.NCBIWWW import qblast
seq_string = "TGTGATGGATATCTGCAGAATTCGCCCTTTAAACTTCAGGGTGACCAAAA" \
    + "AATCAAAATAAATGTTGAAATAATACTGGATCTCCACCACCACTAACTTC" \
    + "AAAAAATGTTGTATTAAAATTTCTATCAGTTAATAACATTGTTATAGCAC" \
    + "CCCCTAATACTGGTAATGATAATAATAATAATCATGCTGTTATAAATACA" \
    + "GCTCAAACAAATAAAGGTAACTTAAACATACTCATACCAGGTGTTCGCAT" \
    + "ATTAATAACAGTAACAATAAAATTTATTGAACCTAATATTGATGATATAC" \
    + "CAGCTAAATGTAAACTAAATATTGCACATTCTATTGAACCTCCTGAATGT" \
    + "GAAAATATACCAGATAATGGTGGATAAACAGTTCAACCTGTACCTGCCCC" \
    + "CATCTCGACTACAGATGATCAAATTAATAAAAAAAATGATGGTACTAATA" \
    + "ATCAAAAACTTATATTATTTAATCTTGGGAATGCCATATCAGGAGCTCCT" \
    + "ATCATTAAAGGTAAAAATCAATTACCAAAACCACCCATTAATGCAGGCAT" \
    + "AACCATAAAAAATATCATTATTAAAGCATGTGCTGTTATTAACACATTAT" \
    + "ATGCTTGATGATTGTAATTTAATATTACTGCACCAGCATCTGATAATTCT" \
    + "ATACGTATTAATATAGATCAAAATGTTCCTATTAAACCTGCTAAAAATGC" \
    + "AAATATTAAATATAATGTTCCAATATCTTTATGATTTGTTGACCAAGGGC" \
    + "GAATTCCAGCACACTGGCGGCCGTTACTAG"

#result_handle = qblast('blastn', 'nr', seq_string, format_type='HTML')
#output_handle = open("test.html", "w")
#output_handle.write(result_handle.read())
#output_handle.close()

result_handle = qblast('blastn', 'nr', seq_string, format_type='Text')
output_handle = open("test.txt", "w")
output_handle.write(result_handle.read())
output_handle.close()

#result_handle = qblast('blastn', 'nr', seq_string, format_type='XML')
#output_handle = open("test.xml", "w")
#output_handle.write(result_handle.read())
#output_handle.close()

print "Done"
The top hits from the script were:
gb|AY916130.1| Epidermophyton floccosum mitochondrion, complete
gb|EF180206.1| Penicillium confertum voucher 171.87 cytochrom...
gb|EF180399.1| Penicillium soppii voucher IBT 14908 cytochrom...
gb|EF180398.1| Penicillium soppii voucher IBT 3331 cytochrome...
gb|EF180397.1| Penicillium soppii voucher IBT 18220 cytochrom...
gb|EF180396.1| Penicillium soppii voucher 226.28 cytochrome o...
gb|EF180395.1| Penicillium soppii voucher 144.83 cytochrome o...
The top hits for me using online blastn for the same sequence, also on the nr database:
gb|AY129164.1| Pythium aphanidermatum cytochrome oxidase I ge...
gb|AY561976.1| Scopalina ruetzleri cytochrome oxidase subunit...
gb|EF468468.1| Phytophthora sp. H-6/02 cytochrome oxidase sub...
gb|DQ832717.1| Phytophthora sojae mitochondrion, complete genome
gb|EF468470.1| Phytophthora sp. H-8/02 cytochrome oxidase sub...
gb|EF468469.1| Phytophthora sp. H-7/02 cytochrome oxidase sub...
gb|AY129166.1| Phytophthora capsici cytochrome oxidase I gene...
i.e. Very different! I switched to using plain text output as it's easier to read by hand.
Both correctly understood the input query was 780 letters long.
Both claimed to be output from BLASTN 2.2.17.
Both claimed to be output from the same database.
There were some differences in the parameters footer, but I'm not sure why.
Using the script:
Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
Posted date: Aug 16, 2007 6:06 PM
Number of letters in database: -51,729,944
Number of sequences in database: 5,751,035
Lambda K H
1.37 0.711 1.31
Gapped
Lambda K H
1.37 0.711 1.31
Matrix: blastn matrix:1 -3
Gap Penalties: Existence: 5, Extension: 2
...
While using the web browser:
Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
Posted date: Aug 16, 2007 6:06 PM
Number of letters in database: -51,729,944
Number of sequences in database: 5,751,035
Lambda K H
0.634 0.408 0.912
Gapped
Lambda K H
0.634 0.408 0.912
Matrix: blastn matrix:2 -3
Gap Penalties: Existence: 5, Extension: 2
...
There is something funny here... does this throw any light on things? Peter From biopython at maubp.freeserve.co.uk Sat Aug 18 21:47:31 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 18 Aug 2007 22:47:31 +0100 Subject: [BioPython] qblast In-Reply-To: <46C766EC.6030601@maubp.freeserve.co.uk> References: <6FD85196-34A9-4641-B454-DBA9A589679F@colorado.edu> <46C747BC.6010002@maubp.freeserve.co.uk> <46C766EC.6030601@maubp.freeserve.co.uk> Message-ID: <46C768F3.7020608@maubp.freeserve.co.uk> Peter wrote: > I switched to using plain text output as its easier to read by hand. > > Both correctly understood the input query was 780 letters long. > Both claimed to be output from BLASTN 2.2.17 > Both claimed to be output from the same database > > There where some differences in the parameters footer - but I'm not sure > why.
Using the script: > > Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, > GSS,environmental > samples or phase 0, 1 or 2 HTGS sequences) > Posted date: Aug 16, 2007 6:06 PM > Number of letters in database: -51,729,944 > Number of sequences in database: 5,751,035 > Lambda K H > 1.37 0.711 1.31 > Gapped > Lambda K H > 1.37 0.711 1.31 > Matrix: blastn matrix:1 -3 > Gap Penalties: Existence: 5, Extension: 2 > ... > > While using the web browser: > > Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, > GSS,environmental > samples or phase 0, 1 or 2 HTGS sequences) > Posted date: Aug 16, 2007 6:06 PM > Number of letters in database: -51,729,944 > Number of sequences in database: 5,751,035 > Lambda K H > 0.634 0.408 0.912 > Gapped > Lambda K H > 0.634 0.408 0.912 > Matrix: blastn matrix:2 -3 > Gap Penalties: Existence: 5, Extension: 2 > ... Solved: when called from a script, qblast defaulted to blastn matrix:1 -3, while the new webpage instead defaults to blastn matrix:2 -3. You can check this (and change it) on the webpage by clicking on "Algorithm parameters" and, under "Scoring Parameters", changing "Match/Mismatch Scores" from "2, -3" to "1, -3". Then the results seem to agree. Does that work for you? Peter From Michael.Robeson at Colorado.EDU Sun Aug 19 00:13:10 2007 From: Michael.Robeson at Colorado.EDU (Michael S. Robeson) Date: Sat, 18 Aug 2007 18:13:10 -0600 Subject: [BioPython] qblast In-Reply-To: <46C76ABC.9040608@maubp.freeserve.co.uk> References: <6FD85196-34A9-4641-B454-DBA9A589679F@colorado.edu> <46C747BC.6010002@maubp.freeserve.co.uk> <46C76ABC.9040608@maubp.freeserve.co.uk> Message-ID: > > Assuming you are using Biopython 1.43, then I would have suggested > using Bio.SeqIO rather than Bio.Fasta for reading/writing FASTA > files. See the tutorial or http://www.biopython.org/wiki/SeqIO Okay, will do! > I'm also unclear why you want to save individual output blast HTML > pages (one file per sequence). Trying to parse these HTML files
Trying to parse these HTML files > later will be a pain - go from the XML output if you want to read > the blast output with Biopython. Actually I have another script which actually parses the blast and genbank data but this was one of the first scripts I wrote to make sure I can to the 'simple' things first. But I essectially use the same code to run blast. > > See my other email - they should be the same if all the parameters > are the same (specifically it looks like the NCBI seems to be using > different defaults for the gap penalties, which was very confusing). Yes it is. I must have thought I changed those parameters when in fact I didn't. Thanks for checking on this! > > I also tried to change my code with some of the new >> biopython code. But it is very hard to find good up-to-date >> tutorials. Last time I looked none of them worked. > > Can you point out any out of date documentation please? > As far as I know, the BLAST examples in our tutorial work fine: > http://biopython.org/DIST/docs/tutorial/Tutorial.html Okay, I must have been looking in the wrong places then. I'll go through the listed tutorials again. Is there a particular one you'd suggest? I have only looked at a few of the 'Online course notes' which is probably why. I just noticed the tutorial cookbook links. I must be blind! And I am using the latest biopython (1.43) package as installed via fink for OS X. Again, thanks! I'll play around with this more over the next day or so. I'll see if I can get my code to switch settings and see how the results are affected. -Mike From biopython at maubp.freeserve.co.uk Sun Aug 19 12:36:42 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 19 Aug 2007 13:36:42 +0100 Subject: [BioPython] qblast + documentation In-Reply-To: References: <6FD85196-34A9-4641-B454-DBA9A589679F@colorado.edu> <46C747BC.6010002@maubp.freeserve.co.uk> <46C76ABC.9040608@maubp.freeserve.co.uk> Message-ID: <46C8395A.3040301@maubp.freeserve.co.uk> Michael S. 
Robeson wrote: >>> I also tried to change my code with some of the new >>> biopython code. But it is very hard to find good up-to-date >>> tutorials. Last time I looked none of them worked. >> >> Can you point out any out of date documentation please? >> As far as I know, the BLAST examples in our tutorial work fine: >> http://biopython.org/DIST/docs/tutorial/Tutorial.html > > Okay, I must have been looking in the wrong places then. I'll go > through the listed tutorials again. Is there a particular one you'd > suggest? The main tutorial is probably best, and most up to date. http://biopython.org/DIST/docs/tutorial/Tutorial.html If you are interested in RPS-BLAST, then have a look here too: http://www.warwick.ac.uk/go/peter_cock/python/rpsblast/ > I have only looked at a few of the 'Online course notes' > which is probably why. I just noticed the tutorial cookbook links. I > must be blind! There have been changes in how we handle parsing Blast output in recent releases of Biopython (driven by changes the NCBI made). We are now recommending the use of XML output, but the Biopython code used to do this had to change a little compared to older releases. This may mean that any "third party" tutorials are a little out of date now. Peter From meesters at uni-mainz.de Mon Aug 20 15:15:39 2007 From: meesters at uni-mainz.de (Christian Meesters) Date: Mon, 20 Aug 2007 17:15:39 +0200 Subject: [BioPython] deleting and inserting 'chains' in Bio.PDB Message-ID: <1187622939.19858.18.camel@cmeesters> Hi, I'd like to delete and insert subunits in my structure object.
But the following code does not work: >>> parser = PDBParser() >>> s = parser.get_structure('name','name.pdb') >>> cs = [chain for chain in s.get_chains()] >>> cs [, , , , , ] >>> c = cs[0] >>> c <Chain id=A> >>> c.id 'A' >>> c.get_id() 'A' >>> s.detach_child('A') Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/var/lib/python-support/python2.5/Bio/PDB/Entity.py", line 70, in detach_child child=self.child_dict[id] KeyError: 'A' >>> s.detach_child("") Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/var/lib/python-support/python2.5/Bio/PDB/Entity.py", line 70, in detach_child child=self.child_dict[id] KeyError: '' Can somebody give me a hint of what I'm missing? Thanks, Christian From thamelry at binf.ku.dk Mon Aug 20 15:51:51 2007 From: thamelry at binf.ku.dk (Thomas Hamelryck) Date: Mon, 20 Aug 2007 17:51:51 +0200 Subject: [BioPython] deleting and inserting 'chains' in Bio.PDB In-Reply-To: <1187622939.19858.18.camel@cmeesters> References: <1187622939.19858.18.camel@cmeesters> Message-ID: <2d7c25310708200851v5732e2adpf1fdc49baed0cc09@mail.gmail.com> On 8/20/07, Christian Meesters wrote: > Hi, > > I'd like to delete and insert subunits in my structure object. But the > following code does not work: A chain object should be removed at the level of the model object, not at the level of the structure object. Remember: Structure<-Model<-Chain<-Residue<-Atom Cheers, -Thomas From meesters at uni-mainz.de Mon Aug 20 16:01:08 2007 From: meesters at uni-mainz.de (Christian Meesters) Date: Mon, 20 Aug 2007 18:01:08 +0200 Subject: [BioPython] deleting and inserting 'chains' in Bio.PDB In-Reply-To: <2d7c25310708200851v5732e2adpf1fdc49baed0cc09@mail.gmail.com> References: <1187622939.19858.18.camel@cmeesters> <2d7c25310708200851v5732e2adpf1fdc49baed0cc09@mail.gmail.com> Message-ID: <1187625668.19858.21.camel@cmeesters> Thanks! And thank you, Thomas, for your wonderful module! The module is actually easy to use, despite my questions ...
;-) Cheers, Christian On Mon, 2007-08-20 at 17:51 +0200, Thomas Hamelryck wrote: > On 8/20/07, Christian Meesters wrote: > > Hi, > > > > I'd like to delete and insert subunits in my structure object. But the > > following code does not work: > > A chain object should be removed at the level of the model object, not > at the level of the structure object. > > Remember: > > Structure<-Model<-Chain<-Residue<-Atom > > Cheers, > > -Thomas From jodyhey at yahoo.com Mon Aug 20 17:48:36 2007 From: jodyhey at yahoo.com (Emanuel Hey) Date: Mon, 20 Aug 2007 10:48:36 -0700 (PDT) Subject: [BioPython] Blast to genome specific databases? In-Reply-To: <46C768F3.7020608@maubp.freeserve.co.uk> Message-ID: <897102.52773.qm@web53903.mail.re2.yahoo.com> NCBI has a blast form for genome-specific databases that returns more information than does a blast to the generic nucleotide database http://www.ncbi.nlm.nih.gov/BLAST/ See e.g. for the human genome http://www.ncbi.nlm.nih.gov/genome/seq/BlastGen/BlastGen.cgi?taxid=9606 I've used NCBIWWW.qblast and used biopython for stand alone blasting, but otherwise am not that experienced. Is there a way to script a search that generates the same results as provided by these newer genome-specific interfaces? Thanks Jhey From cjfields at uiuc.edu Mon Aug 20 19:17:48 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 20 Aug 2007 14:17:48 -0500 Subject: [BioPython] Blast to genome specific databases?
In-Reply-To: <897102.52773.qm@web53903.mail.re2.yahoo.com> References: <897102.52773.qm@web53903.mail.re2.yahoo.com> Message-ID: If you follow the directions on this page: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html you should be able to set the URLAPI DATABASE parameter to one of the databases in that list; another option would be to limit the results to a specific taxid (like 9606) using the ENTREZ_QUERY parameter. Not sure how that would be done for biopython (being a bioperler myself) but I'm sure someone could chip in? chris On Aug 20, 2007, at 12:48 PM, Emanuel Hey wrote: > NCBI has a blast form for genome-specific databases > that returns more information than does a blast to the > generic nucleotide database > > http://www.ncbi.nlm.nih.gov/BLAST/ > > See e.g. for the human genome > > http://www.ncbi.nlm.nih.gov/genome/seq/BlastGen/BlastGen.cgi?taxid=9606 > > > I've used NCBIWWW.qblast and used biopython for stand > alone blasting, but otherwise am not that experienced. > > > Is there a way to script a search that generates the > same results as provided by these newer genome-specific > interfaces? > > Thanks > > Jhey > _______________________________________________ > BioPython mailing list - BioPython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From biopython at maubp.freeserve.co.uk Wed Aug 22 15:05:31 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 22 Aug 2007 16:05:31 +0100 Subject: [BioPython] Making the Seq object act more like a string Message-ID: <46CC50BB.1090902@maubp.freeserve.co.uk> Dear Biopython users, A couple of times (on bugs or the developers mailing list), Michiel de Hoon has previously suggested we could make the Seq class (Bio.Seq.Seq) a subclass of python string. I agree with him - the Seq object should act more like a string. I'm posting on the main discussion list to get some feedback on this, as any sudden changes could affect lots of people. As a simple example, although there are functions in Biopython to calculate GC percentages, for a beginner playing with Seq objects, it would be nice to be able to do things like this (where s is a Seq object): print float(s.count("G") + s.count("C")) / len(s) rather than, for example, this: print float(s.tostring().count("G") + s.tostring().count("C")) / len(s) I also don't think the current behaviour of str(seq) is helpful. To recap, here is a simple example using a simple string: >>> ss = 'ACAGGTACGATCGATCCTGTTGTACGTGCTCGTCGACTGCTAGCTCGTCGTGGTCCGATGA' >>> print repr(ss) 'ACAGGTACGATCGATCCTGTTGTACGTGCTCGTCGACTGCTAGCTCGTCGTGGTCCGATGA' >>> print str(ss) ACAGGTACGATCGATCCTGTTGTACGTGCTCGTCGACTGCTAGCTCGTCGTGGTCCGATGA >>> print ss ACAGGTACGATCGATCCTGTTGTACGTGCTCGTCGACTGCTAGCTCGTCGTGGTCCGATGA And the equivalent using a Seq object as of Biopython 1.43: >>> s = Seq("ACAGGTACGATCGATCCTGTTGTACGTGCTC"\ ...
+"GTCGACTGCTAGCTCGTCGTGGTCCGATGA") >>> print repr(s) Seq('ACAGGTACGATCGATCCTGTTGTACGTGCTCGTCGACTGCTAGCTCGTCGTGGTCCGATGA', Alphabet()) >>> print s.tostring() ACAGGTACGATCGATCCTGTTGTACGTGCTCGTCGACTGCTAGCTCGTCGTGGTCCGATGA >>> print str(s) Seq('ACAGGTACGATCGATCCTGTTGTACGTGCTCGTCGACTGCTAGCTCGTCGTGGTCCGATG ...', Alphabet()) >>> print s Seq('ACAGGTACGATCGATCCTGTTGTACGTGCTCGTCGACTGCTAGCTCGTCGTGGTCCGATG ...', Alphabet()) Note that currently doing str() on a Seq object gives a truncated version of what repr() gives. The only nice things about this is when working at the python command line, doing "print s" will only take one line even when working with a genome. And it makes it really clear you don't just have a string object. That was the motivation/background part. Feel free to chime in :) ----------------------------------------------------------------------- This next bit of the email gets a bit more technical... and might be better off on the developers mailing list. We'll see where any discussion goes. If we can agree to that making Seq inherit from the basic string is a good idea, then I would advocate a gradual transition... my thoughts are that for the next release of Biopython we: (1) Modify Seq .__str__() method to act like the existing .tostring(), i.e. return self.data I don't think changing the __str__ method will break any serious code, because as shown above, currently its like a truncated version of __repr__ so all its useful for at the moment is getting a truncated sequence for display. (2) Consider adding alphabet aware versions selected string methods to the Seq object (e.g. count, find) Adding new methods to the Seq class should have no effect on existing usage. Then, for the release afterwards: (3) actually do the class inheritance with all the horrors entailed. As part of this, we'll need to address how the __eq__ method of the Seq object should act: Looking at the sequence only, or considering the alphabet too? 
Currently this method is not implemented at all. This is part of a larger question - how to cope with multiple Seq/string operations where there is more than one alphabet. e.g. comparing/adding/joining a nucleotide Seq to a protein Seq object. I would opt for the simple solution that the alphabets must match or some sort of ValueError is raised. Alternatively, as the alphabets have a class hierarchy, we could choose the parent alphabet (e.g. the generic single letter alphabet when dealing with a DNA and Protein; or the generic single nucleotide alphabet when dealing with RNA and DNA). Any thoughts? Peter [In fact, Michiel has also suggested making the SeqRecord class a subclass of the Seq class, which raises even more questions] From sbassi at gmail.com Wed Aug 22 15:41:53 2007 From: sbassi at gmail.com (Sebastian Bassi) Date: Wed, 22 Aug 2007 12:41:53 -0300 Subject: [BioPython] Making the Seq object act more like a string In-Reply-To: <46CC50BB.1090902@maubp.freeserve.co.uk> References: <46CC50BB.1090902@maubp.freeserve.co.uk> Message-ID: On 8/22/07, Peter wrote: > A couple of times (on bugs or the developers mailing list), Michiel de > Hoon has previously suggested we could make the Seq class (Bio.Seq.Seq) > a subclass of python string. I agree with him - the Seq object should > act more like a string. I agree. Seq acting more like a str would also lower the entry level to use biopython for non OOP seasoned programmers. > As a simple example, although there are functions in Biopython to ... Here is another example (from http://www.biopython.org/wiki/SeqIO#Using_the_SEGUID_checksum): Current situation: from Bio import SeqIO from Bio.SeqUtils.CheckSum import seguid seguid_dict = SeqIO.to_dict(SeqIO.parse(open("ls_orchid.gbk"), "genbank"), lambda rec : seguid(rec.seq)) record = seguid_dict["MN/s0q9zDoCVEEc+k/IFwCNF2pY"] print record.id print record.description If seq were more like a string: ... 
seguid_dict = SeqIO.to_dict(SeqIO.parse(open("ls_orchid.gbk"), "genbank"), seguid(rec.seq)) .... This way you avoid using a lambda. -- Bioinformatics news: http://www.bioinformatica.info From biopython at maubp.freeserve.co.uk Wed Aug 22 15:53:59 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 22 Aug 2007 16:53:59 +0100 Subject: [BioPython] Making the Seq object act more like a string In-Reply-To: References: <46CC50BB.1090902@maubp.freeserve.co.uk> Message-ID: <46CC5C17.4000709@maubp.freeserve.co.uk> Sebastian Bassi wrote: > On 8/22/07, Peter wrote: >> A couple of times (on bugs or the developers mailing list), Michiel de >> Hoon has previously suggested we could make the Seq class (Bio.Seq.Seq) >> a subclass of python string. I agree with him - the Seq object should >> act more like a string. > > I agree. Seq acting more like a str would also lower the entry level > to use biopython for non OOP seasoned programmers. Good :) >> As a simple example, although there are functions in Biopython to > ... > > Here is another example (from > http://www.biopython.org/wiki/SeqIO#Using_the_SEGUID_checksum ): I added that to the wiki recently - although it is perhaps premature given your CheckSum code hasn't been officially released yet. This was my draft; moving/adding it to the tutorial is on my to-do list. > Current situation: > > from Bio import SeqIO > from Bio.SeqUtils.CheckSum import seguid > seguid_dict = SeqIO.to_dict(SeqIO.parse(open("ls_orchid.gbk"), "genbank"), > lambda rec : seguid(rec.seq)) > record = seguid_dict["MN/s0q9zDoCVEEc+k/IFwCNF2pY"] > print record.id > print record.description > > If seq were more like a string: > > ...
> seguid_dict = SeqIO.to_dict(SeqIO.parse(open("ls_orchid.gbk"), "genbank"), > seguid(rec.seq)) Nope ;) You have to give a function to the key_function argument in SeqIO.to_dict(), and in your example seguid(rec.seq) would be a string (the result of the seguid function acting on a seq object). Or at least, it would if you had a rec variable in scope. However, if SeqRecord acted more like a Seq (and therefore more like a string) then you could do this which does avoid the lambda: seguid_dict = SeqIO.to_dict(SeqIO.parse(open("ls_orchid.gbk"), \ "genbank"), seguid) Or, we could enhance your CheckSum functions to cope with a SeqRecord, a Seq or a string - right now they cope with a Seq or a string. Peter From mdehoon at c2b2.columbia.edu Thu Aug 23 06:06:27 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Thu, 23 Aug 2007 02:06:27 -0400 Subject: [BioPython] Making the Seq object act more like a string References: <46CC50BB.1090902@maubp.freeserve.co.uk> <46CC5C17.4000709@maubp.freeserve.co.uk> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B609@mail2.exch.c2b2.columbia.edu> Peter wrote: > However, if SeqRecord acted more like a Seq (and therefore more like a > string) then you could do this which does avoid the lambda: > > seguid_dict = SeqIO.to_dict(SeqIO.parse(open("ls_orchid.gbk"), \ > "genbank"), seguid) > > Or, we could enhance your CheckSum functions to cope with a > SeqRecord, a Seq or a string - right now they cope with a Seq or a string. > Nice example. The SeqRecord class is one of those classes in Biopython for which I never understood why they exist. A SeqRecord is nothing more than a Seq with some attributes attached. If we add those attributes to the Seq class directly, then we can get rid of the SeqRecord class. Then, functions such as those in CheckSum only need to cope with strings. --Michiel.
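Both ideas raised in this thread (Seq as a subclass of str, SeqRecord as a subclass of Seq) come down to the checksum example: if a record *is* a string, a function written for strings can be handed straight to to_dict as the key function, with no lambda. Below is a minimal sketch in modern Python 3. SketchSeq, SketchSeqRecord and the stand-in to_dict are hypothetical and are not Biopython's classes or API. seguid_like follows the SEGUID recipe (base64 of the SHA-1 digest of the upper-cased sequence, with padding stripped), but it is a re-implementation for illustration, not Bio.SeqUtils.CheckSum.seguid.

```python
import base64
import hashlib


class SketchSeq(str):
    """Hypothetical Seq: a plain str that also carries an alphabet label."""

    def __new__(cls, data, alphabet="generic"):
        obj = super().__new__(cls, data)
        obj.alphabet = alphabet
        return obj


class SketchSeqRecord(SketchSeq):
    """Hypothetical SeqRecord: a SketchSeq with an id and description attached."""

    def __new__(cls, data, alphabet="generic", id="<unknown id>", description=""):
        obj = super().__new__(cls, data, alphabet)
        obj.id = id
        obj.description = description
        return obj


def seguid_like(seq):
    """SEGUID-style checksum: base64 of the SHA-1 digest of the upper-cased
    sequence, with the trailing '=' padding removed. Accepts any str."""
    digest = hashlib.sha1(seq.upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def to_dict(records, key_function):
    """Stand-in for SeqIO.to_dict: key_function is *called* on every record."""
    result = {}
    for record in records:
        key = key_function(record)
        if key in result:
            raise ValueError("Duplicate key %r" % key)
        result[key] = record
    return result


records = [
    SketchSeqRecord("ACGTGGCA", "DNA", id="rec1"),
    SketchSeqRecord("TTGACCGA", "DNA", id="rec2"),
]

# Each record is itself a string, so the checksum function is passed directly.
by_checksum = to_dict(records, seguid_like)
for key, rec in by_checksum.items():
    print(rec.id, key)
```

Keeping the annotation on a subclass, rather than on the base sequence class, means helpers such as seguid_like only ever need to handle strings, while plain sequence objects stay lightweight.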
Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From Michael.Robeson at Colorado.EDU Thu Aug 23 20:13:17 2007 From: Michael.Robeson at Colorado.EDU (Michael S.
Robeson) Date: Thu, 23 Aug 2007 14:13:17 -0600 Subject: [BioPython] qblast + documentation In-Reply-To: <46C8395A.3040301@maubp.freeserve.co.uk> References: <6FD85196-34A9-4641-B454-DBA9A589679F@colorado.edu> <46C747BC.6010002@maubp.freeserve.co.uk> <46C76ABC.9040608@maubp.freeserve.co.uk> <46C8395A.3040301@maubp.freeserve.co.uk> Message-ID: <2EEE2FCD-6A73-408D-9984-D3806D717BD1@colorado.edu> Why is there no way to add other parameters, like gap penalties, to qblast, as can be done on the web site? Am I missing something? Is there code I can add? -Thanks -Mike From Michael.Robeson at Colorado.EDU Thu Aug 23 20:20:26 2007 From: Michael.Robeson at Colorado.EDU (Michael S. Robeson) Date: Thu, 23 Aug 2007 14:20:26 -0600 Subject: [BioPython] qblast + documentation Message-ID: I meant to add to the other e-mail: Since we know it has to do with the blast setting differences on the NCBI web site versus the biopython back-end. How can I get around that? I've tried a few things, but have not been successful. -Mike From mdehoon at c2b2.columbia.edu Thu Aug 23 23:01:17 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Fri, 24 Aug 2007 08:01:17 +0900 Subject: [BioPython] qblast + documentation In-Reply-To: References: Message-ID: <46CE11BD.2020307@c2b2.columbia.edu> Have you looked at the qblast function definition in http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIWWW.py?rev=1.46&cvsroot=biopython&content-type=text/vnd.viewcvs-markup I believe all the Blast parameters are there. --Michiel. Michael S. Robeson wrote: > I meant to add to the other e-mail: Since we know it has to do with > the blast setting differences on the NCBI web site versus the > biopython back-end. How can I get around that? I've tried a few > things, but have not been successful.
> > -Mike From Michael.Robeson at Colorado.EDU Fri Aug 24 01:12:30 2007 From: Michael.Robeson at Colorado.EDU (Michael S. Robeson) Date: Thu, 23 Aug 2007 19:12:30 -0600 Subject: [BioPython] qblast + documentation In-Reply-To: <46CDFFE8.2040900@maubp.freeserve.co.uk> References: <46CDFFE8.2040900@maubp.freeserve.co.uk> Message-ID: <8963513B-811D-458B-BAB8-1BE64D3CA76C@colorado.edu> > http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIWWW.py?rev=1.46&cvsroot=biopython&content-type=text/vnd.viewcvs-markup Are those new parameters? They do exist in the code I downloaded? Guess I will download again from fink. > If you are familiar with the BLAST terminology this is probably > very obvious, but you need to set the NUCL_REWARD and NUCL_PENALTY > options in the URL. Yeah, I did not mean to say it was biopython's fault, but the version of the code I have had no parameters under qblast for me to alter after we knew there was a difference with the web interface at NCBI.
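A quick way to confirm which scoring a saved plain-text report actually used is to read the Matrix and Gap Penalties lines from its footer, as in the two reports compared earlier in this thread. Here is a minimal sketch using only the standard re module. blastn_scoring is a hypothetical helper, and the regular expressions assume the footer wording shown in those reports ("Matrix: blastn matrix:1 -3", "Gap Penalties: Existence: 5, Extension: 2"), which may differ between BLAST versions.

```python
import re

# Footer lines as they appear in the blastn plain-text reports quoted above
MATRIX_RE = re.compile(r"Matrix:\s*blastn matrix:\s*(-?\d+)\s+(-?\d+)")
GAP_RE = re.compile(r"Gap Penalties:\s*Existence:\s*(\d+),\s*Extension:\s*(\d+)")


def blastn_scoring(report_text):
    """Return (match, mismatch, gap_open, gap_extend) found in a text report.

    Any value that cannot be found is returned as None.
    """
    match = mismatch = gap_open = gap_extend = None
    m = MATRIX_RE.search(report_text)
    if m:
        match, mismatch = int(m.group(1)), int(m.group(2))
    g = GAP_RE.search(report_text)
    if g:
        gap_open, gap_extend = int(g.group(1)), int(g.group(2))
    return match, mismatch, gap_open, gap_extend


script_footer = "Matrix: blastn matrix:1 -3\nGap Penalties: Existence: 5, Extension: 2"
web_footer = "Matrix: blastn matrix:2 -3\nGap Penalties: Existence: 5, Extension: 2"
print(blastn_scoring(script_footer))  # (1, -3, 5, 2)
print(blastn_scoring(web_footer))     # (2, -3, 5, 2)
```

For a saved report you would pass in the whole file contents, for example the result of reading a (hypothetical) "saved_report.txt". Comparing the two tuples shows directly that only the match score differs.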
The version I have (latest via fink, biopython version: 1.43-1001) showed this:

def qblast(program, database, sequence, ncbi_gi=None,
           descriptions=None, alignments=None, expect=None,
           matrix=None, filter=None, format_type="XML",
           hitlist_size=None, entrez_query='(none)',
           ):

and not:

def qblast(program, database, sequence,
           auto_format=None,composition_based_statistics=None,
           db_genetic_code=None,endpoints=None,entrez_query='(none)',
           expect=10.0,filter=None,gapcosts=None,genetic_code=None,
           hitlist_size=50,i_thresh=None,layout=None,lcase_mask=None,
           matrix_name=None,nucl_penalty=None,nucl_reward=None,
           other_advanced=None,perc_ident=None,phi_pattern=None,
           query_file=None,query_believe_defline=None,query_from=None,
           query_to=None,searchsp_eff=None,service=None,threshold=None,
           ungapped_alignment=None,word_size=None,
           alignments=500,alignment_view=None,descriptions=500,
           entrez_links_new_window=None,expect_low=None,expect_high=None,
           format_entrez_query=None,format_object=None,format_type='XML',
           ncbi_gi=None,results_file=None,show_overview=None
           ):

So, there must be a problem with my fink installation or something. I will try to uninstall and re-install biopython via fink. -Mike From mdehoon at c2b2.columbia.edu Fri Aug 24 02:08:29 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Thu, 23 Aug 2007 22:08:29 -0400 Subject: [BioPython] qblast + documentation References: <46CDFFE8.2040900@maubp.freeserve.co.uk> <8963513B-811D-458B-BAB8-1BE64D3CA76C@colorado.edu> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B60B@mail2.exch.c2b2.columbia.edu> > > http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIWWW.py?rev=1.46&cvsroot=biopython&content-type=text/vnd.viewcvs-markup > Are those new parameters? They do exist in the code I downloaded? > Guess I will download again from fink. It looks like these are coming from a recent improvement of Biopython. So you won't find this in Biopython release 1.43.
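To check whether a given installation already has the newer arguments, without reading NCBIWWW.py by eye, you can ask Python which parameters the installed qblast accepts. Below is a sketch using the standard inspect module (Python 3.3+). missing_arguments is a hypothetical helper. The two stub functions are trimmed copies of the signatures quoted above, standing in for the real Bio.Blast.NCBIWWW.qblast so the example runs without Biopython.

```python
import inspect


def missing_arguments(func, *names):
    """Return, in order, the names that func does not accept as parameters."""
    params = inspect.signature(func).parameters
    return [name for name in names if name not in params]


# Trimmed stand-ins for the two qblast signatures quoted above.
def qblast_143(program, database, sequence, ncbi_gi=None, descriptions=None,
               alignments=None, expect=None, matrix=None, filter=None,
               format_type="XML", hitlist_size=None, entrez_query="(none)"):
    pass


def qblast_cvs(program, database, sequence, entrez_query="(none)",
               expect=10.0, gapcosts=None, nucl_penalty=None,
               nucl_reward=None, format_type="XML"):
    pass


wanted = ("nucl_reward", "nucl_penalty", "gapcosts")
print(missing_arguments(qblast_143, *wanted))  # ['nucl_reward', 'nucl_penalty', 'gapcosts']
print(missing_arguments(qblast_cvs, *wanted))  # []
```

Against a real installation you would run `from Bio.Blast import NCBIWWW` and call `missing_arguments(NCBIWWW.qblast, "nucl_reward", "nucl_penalty")`. An empty list means the newer arguments are available.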
You can just take the new NCBIWWW.py from CVS and copy it over the NCBIWWW.py in Biopython 1.43. Maybe this is a good time for a new Biopython release? --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From biopython at maubp.freeserve.co.uk Fri Aug 24 09:35:16 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 24 Aug 2007 10:35:16 +0100 Subject: [BioPython] [Fwd: Re: qblast + documentation] Message-ID: <46CEA654.1070706@maubp.freeserve.co.uk> I managed to send this to Michael Robeson only, and not the list. In the meantime Michiel de Hoon pointed out that support for lots of parameters (including NUCL_REWARD and NUCL_PENALTY) was added after the release of Biopython 1.43, so you would have to update the file Bio/Blast/NCBIWWW.py. If you don't want to bother with CVS, the easy way to do this is to back up the original and then replace it with the download from here: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/*checkout*/biopython/Bio/Blast/NCBIWWW.py?rev=HEAD&cvsroot=biopython&content-type=text/x-python Looking at the history, that shouldn't impact anything else. Peter -------- Original Message -------- Subject: Re: [BioPython] qblast + documentation Date: Thu, 23 Aug 2007 22:45:12 +0100 From: Peter Reply-To: biopython at lists.open-bio.org To: Michael S. Robeson References: Michael S. Robeson wrote: > I meant to add to the other e-mail: Since we know it has to do with > the blast setting differences on the NCBI web site versus the > biopython back-end. How can I get around that? I've tried a few > things, but have not been successful. I'd just like to point out this isn't Biopython's fault - it's the NCBI who seem to be using different match/mismatch penalties on their GUI webserver and their QBLAST web API.
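Because qblast is a thin wrapper around the NCBI URL API, it can help to see which request parameters the web-page settings correspond to. The sketch below only builds a CMD=Put query string with the standard library and never sends it. The parameter names NUCL_REWARD, NUCL_PENALTY, GAPCOSTS and ENTREZ_QUERY come from the NCBI URL API. The endpoint is today's blast.ncbi.nlm.nih.gov address rather than the 2007 one, and both the "nt" default database and put_query itself are assumptions. The optional taxon filter uses an Entrez query such as txid9606[ORGN] (human), one way to approach the genome-specific searches asked about earlier in the thread.

```python
from urllib.parse import urlencode

# Current NCBI BLAST URL API endpoint (assumption: the 2007 address differed).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def put_query(sequence, program="blastn", database="nt",
              match=2, mismatch=-3, gap_open=None, gap_extend=None,
              taxid=None):
    """Build (but do not send) a CMD=Put request URL for the BLAST URL API."""
    params = {
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": sequence,
        # The web page's "Match/Mismatch Scores" of "2, -3" map onto these two
        "NUCL_REWARD": match,
        "NUCL_PENALTY": mismatch,
    }
    if gap_open is not None and gap_extend is not None:
        # GAPCOSTS is a single space-separated "existence extension" pair
        params["GAPCOSTS"] = "%d %d" % (gap_open, gap_extend)
    if taxid is not None:
        # Restrict hits to one organism via an Entrez query
        params["ENTREZ_QUERY"] = "txid%d[ORGN]" % taxid
    return BLAST_URL + "?" + urlencode(params)


# Reproduce the script-side defaults seen above (1/-3, gaps 5/2)
print(put_query("ACGTACGTACGT", match=1, mismatch=-3, gap_open=5, gap_extend=2))
# Web-page scoring, limited to human sequences
print(put_query("ACGTACGTACGT", taxid=9606))
```

Within Biopython you would instead pass the lowercase equivalents (nucl_reward, nucl_penalty, gapcosts, entrez_query) to qblast, all of which appear in the newer signature quoted above.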
The simple answer for how to get the two queries to agree is: whenever you do any manual queries on the website, click on "Algorithm parameters" and under "Scoring Parameters" change "Match/Mismatch Scores" from "2, -3" to "1, -3" (grin). Or, if you want to change the scoring in the qblast call instead, Biopython is providing a Python interface to this URL scheme: http://www.ncbi.nlm.nih.gov/BLAST/Doc/node5.html If you are familiar with the BLAST terminology this is probably very obvious, but you need to set the NUCL_REWARD and NUCL_PENALTY options in the URL (these are the match/mismatch scores; gap penalties are a separate option, GAPCOSTS). In Biopython, use the optional nucl_reward and nucl_penalty arguments to the Bio.Blast.NCBIWWW.qblast function:

from Bio.Blast.NCBIWWW import qblast

seq_string = "TGTGATGGATATCTGCAGAATTCGCCCTTTAAACTTCAGGGTGACCAAAA" \
           + "AATCAAAATAAATGTTGAAATAATACTGGATCTCCACCACCACTAACTTC" \
           + "AAAAAATGTTGTATTAAAATTTCTATCAGTTAATAACATTGTTATAGCAC" \
           + "CCCCTAATACTGGTAATGATAATAATAATAATCATGCTGTTATAAATACA" \
           + "GCTCAAACAAATAAAGGTAACTTAAACATACTCATACCAGGTGTTCGCAT" \
           + "ATTAATAACAGTAACAATAAAATTTATTGAACCTAATATTGATGATATAC" \
           + "CAGCTAAATGTAAACTAAATATTGCACATTCTATTGAACCTCCTGAATGT" \
           + "GAAAATATACCAGATAATGGTGGATAAACAGTTCAACCTGTACCTGCCCC" \
           + "CATCTCGACTACAGATGATCAAATTAATAAAAAAAATGATGGTACTAATA" \
           + "ATCAAAAACTTATATTATTTAATCTTGGGAATGCCATATCAGGAGCTCCT" \
           + "ATCATTAAAGGTAAAAATCAATTACCAAAACCACCCATTAATGCAGGCAT" \
           + "AACCATAAAAAATATCATTATTAAAGCATGTGCTGTTATTAACACATTAT" \
           + "ATGCTTGATGATTGTAATTTAATATTACTGCACCAGCATCTGATAATTCT" \
           + "ATACGTATTAATATAGATCAAAATGTTCCTATTAAACCTGCTAAAAATGC" \
           + "AAATATTAAATATAATGTTCCAATATCTTTATGATTTGTTGACCAAGGGC" \
           + "GAATTCCAGCACACTGGCGGCCGTTACTAG"

#filename, format = "test.html", "HTML"
#filename, format = "test.txt", "Text"
filename, format = "test.xml", "XML"

# Match/mismatch of 2/-3 mirrors the default on the NCBI web page
result_handle = qblast('blastn', 'nr', seq_string, format_type=format,
                       nucl_reward=2, nucl_penalty=-3)
output_handle = open(filename, "w")
output_handle.write(result_handle.read())
output_handle.close()
print("Done")

And this does seem to agree with
the results from doing the query by hand on their website with the default settings. Peter From biopython at maubp.freeserve.co.uk Mon Aug 27 18:48:55 2007 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 27 Aug 2007 19:48:55 +0100 Subject: [BioPython] Making the Seq object act more like a string In-Reply-To: <6243BAA9F5E0D24DA41B27997D1FD14402B609@mail2.exch.c2b2.columbia.edu> References: <46CC50BB.1090902@maubp.freeserve.co.uk> <46CC5C17.4000709@maubp.freeserve.co.uk> <6243BAA9F5E0D24DA41B27997D1FD14402B609@mail2.exch.c2b2.columbia.edu> Message-ID: <46D31C97.1070200@maubp.freeserve.co.uk> Michiel De Hoon wrote: > Peter wrote: >> However, if SeqRecord acted more like a Seq (and therefore more like a >> string) then you could do this which does avoid the lambda: >> >> seguid_dict = SeqIO.to_dict(SeqIO.parse(open("ls_orchid.gbk"), \ >> "genbank"), seguid) >> >> Or, we could enhance your CheckSum functions to cope with a >> SeqRecord, a Seq or a string - right now they cope with a Seq or a string. >> > Nice example. > > The SeqRecord class is one of those classes in Biopython for which I never > understood why they exist. A SeqRecord is nothing more than a Seq with some > attributes attached. If we add those attributes to the Seq class directly, > then we can get rid of the SeqRecord class. Then, functions such as those in > CheckSum only need to cope with strings. > > --Michiel. I think having SeqRecord subclass Seq is nicer than simply adding annotation to the Seq class. Seq objects would (still) just have a sequence and alphabet; the SeqRecord becomes a rich/annotated Seq object. I think this would be close to BioPerl's Seq and RichSeq objects. I have filed an enhancement on Bugzilla to hold any suggested patches etc (I hope to upload something later tonight): Bug 2351 - Make SeqRecord subclass Seq subclass string?
http://bugzilla.open-bio.org/show_bug.cgi?id=2351 Peter From mdehoon at c2b2.columbia.edu Thu Aug 30 02:20:01 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Wed, 29 Aug 2007 22:20:01 -0400 Subject: [BioPython] Bio.MarkupEditor Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B60D@mail2.exch.c2b2.columbia.edu> Does anybody use this module? It is not used anywhere in Biopython. If it is not being used, I suggest deprecating it. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From jimmy.musselwhite at gmail.com Thu Aug 30 03:46:51 2007 From: jimmy.musselwhite at gmail.com (Jimmy Musselwhite) Date: Wed, 29 Aug 2007 23:46:51 -0400 Subject: [BioPython] Cluster problem keeping me from getting started Message-ID: <86e5e8970708292046m55ffa549p4544d2a1ffd2c294@mail.gmail.com> Hi everyone I'm trying to start an assignment I have using BioPython but a critical functionality is not working for me. When I do python run_tests.py It "hangs" at test_Cluster ... and if I run test_Cluster.py by itself it does not complete properly. I installed Cluster 3.0 from the bonsai website and then I even re-installed PyCluster from the source there. I don't know what else could be wrong here! I really need Cluster to work. It's pretty much the point of my assignment. Thanks! From mdehoon at c2b2.columbia.edu Thu Aug 30 03:59:51 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Wed, 29 Aug 2007 23:59:51 -0400 Subject: [BioPython] Cluster problem keeping me from getting started References: <86e5e8970708292046m55ffa549p4544d2a1ffd2c294@mail.gmail.com> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B60F@mail2.exch.c2b2.columbia.edu> This was fixed in CVS.
If you pick up the new Bio.Cluster source files from here: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Cluster/?cvsroot=biopython, copy them over the corresponding files in the biopython-1.43 distribution and reinstall, the test should run to completion. If it doesn't, please let me know. --Michiel. Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 -----Original Message----- From: biopython-bounces at lists.open-bio.org on behalf of Jimmy Musselwhite Sent: Wed 8/29/2007 11:46 PM To: biopython at lists.open-bio.org Subject: [BioPython] Cluster problem keeping me from getting started Hi everyone I'm trying to start an assignment I have using BioPython but a critical functionality is not working for me. When I do python run_tests.py It "hangs" at test_Cluster ... and if I run test_Cluster.py by itself it does not complete properly. I installed Cluster 3.0 from the bonsai website and then I even re-installed PyCluster from the source there. I don't know what else could be wrong here! I really need Cluster to work. It's pretty much the point of my assignment. Thanks!
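When a test appears to hang, as test_Cluster did here, it can help to run it under a time limit so that a hang is reported separately from an ordinary failure. Below is a minimal sketch using the standard subprocess module (capture_output needs Python 3.7+). run_with_timeout is a hypothetical helper, and the 300-second limit in the example comment is arbitrary.

```python
import subprocess
import sys


def run_with_timeout(args, timeout_s):
    """Run a command and classify the outcome as 'passed', 'failed' or 'hung'."""
    try:
        result = subprocess.run(args, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising
        return "hung"
    return "passed" if result.returncode == 0 else "failed"


# Example, run from the directory containing test_Cluster.py:
# print(run_with_timeout([sys.executable, "test_Cluster.py"], 300))
```

A "hung" result before updating the Bio.Cluster files, and "passed" afterwards, would confirm that the CVS fix addressed the problem.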