From p.j.a.cock at googlemail.com Thu Dec 1 03:52:43 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 1 Dec 2011 08:52:43 +0000
Subject: [Biopython] Generator expression for SeqIO

On Wednesday, November 30, 2011, Mic wrote:
> Thank you it is working.
>

Excellent - sorry I couldn't think of a nice way to explain the syntax.

Peter

From jmtc21 at bath.ac.uk Thu Dec 1 09:09:33 2011
From: jmtc21 at bath.ac.uk (Jaime Tovar)
Date: Thu, 01 Dec 2011 14:09:33 +0000
Subject: [Biopython] GTF to GFF using BCBio
Message-ID: <4ED78A9D.4080306@bath.ac.uk>

Hello all,

I was trying to find an example of how to get a GFF3-formatted file from a GTF file using the BCBio library extensions for Biopython. The documentation only covers converting non-GFF-family formats to GFF3. I guess it is just a parameter in the GFF writer, but I have been looking unsuccessfully for the answer. Any hint will be greatly appreciated.

Best regards,

J.

From jmtc21 at bath.ac.uk Thu Dec 1 11:45:40 2011
From: jmtc21 at bath.ac.uk (Jaime Tovar)
Date: Thu, 01 Dec 2011 16:45:40 +0000
Subject: [Biopython] GTF to GFF using BCBio
In-Reply-To: <4ED78A9D.4080306@bath.ac.uk>
References: <4ED78A9D.4080306@bath.ac.uk>
Message-ID: <4ED7AF34.8050707@bath.ac.uk>

Reply from the BCBio library's author: the script at

https://github.com/chapmanb/bcbb/blob/master/gff/Scripts/gff/gff2_to_gff3.py

will convert GFF2/GTF files into GFF3.

J.

On 01/12/11 14:09, Jaime Tovar wrote:
> Hello all,
>
> I was trying to find an example of how to get a GFF3-formatted file
> from a GTF file using the BCBio library extensions for Biopython. The
> documentation only covers converting non-GFF-family formats to GFF3. I
> guess it is just a parameter in the GFF writer, but I have been looking
> unsuccessfully for the answer. Any hint will be greatly appreciated.
>
> Best regards,
>
> J.
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

From daniel.svozil at vscht.cz Tue Dec 6 06:39:35 2011
From: daniel.svozil at vscht.cz (Daniel Svozil)
Date: Tue, 6 Dec 2011 12:39:35 +0100
Subject: [Biopython] mmView - a tool for mmCIF exploration

Dear colleagues,

We would like to announce the availability of mmView, a web-based application that makes it easy to explore the structural data of biomacromolecules stored in the mmCIF (macromolecular Crystallographic Information File) format. The mmView software system is primarily intended for educational purposes, but it can also serve as an auxiliary tool for working with biomolecular structures. The mmView application is offered in two flavors: as a publicly available web server (http://ich.vscht.cz/projects/mmview/) and as an open-source stand-alone application (available from http://sourceforge.net/projects/mmview) that can be installed on the user's computer.

Petr Cech and Daniel Svozil

--
Daniel Svozil, PhD
Head of Laboratory of Informatics and Chemistry
Institute of Chemical Technology
Czech Republic
phone: +420 220 444 391
http://ich.vscht.cz/~svozil

From mictadlo at gmail.com Tue Dec 6 23:41:25 2011
From: mictadlo at gmail.com (Mic)
Date: Wed, 7 Dec 2011 14:41:25 +1000
Subject: [Biopython] Generator expression for SeqIO

No worries, it was perfect.

I have the following code. How can I combine the *header* and *seq* variables from the '*with*' statement with a generator expression?
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
from Bio.Seq import Seq
from pprint import pprint

if __name__ == '__main__':

    *with* open('input.txt') as f:
        for line in f:
            try:
                splited_line = line.split('\t')

                *header* = splited_line[0] +'_'+ splited_line[2]
                *seq* = splited_line[3]
            except IndexError:
                continue

    fasta_file = open('output.fasta', 'w')
    records = (SeqRecord(???), id=????, description="") for i in ???)

    SeqIO.write(records, fasta_file, "fasta")

Thank you in advance.

From p.j.a.cock at googlemail.com Wed Dec 7 03:26:08 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 7 Dec 2011 08:26:08 +0000
Subject: [Biopython] Generator expression for SeqIO

On Wed, Dec 7, 2011 at 4:41 AM, Mic wrote:
> No worries, it was perfect.
>
> I have the following code. How can I combine the *header* and *seq*
> variables from the '*with*' statement with a generator expression?
> [...]
> Thank you in advance.

Are you trying to parse a tabular file, with three columns
(ID, sequence, description)?

I suggest you learn about generator functions in Python.
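Here is a minimal sketch of the generator-function approach suggested here. It assumes the four-column, tab-separated layout Mic describes later in the thread (name, number, label, sequence). The parsing lives in a generator function, and a generator expression produces the FASTA entries lazily. The helper names `parse_rows` and `to_fasta` are hypothetical. This version uses only the standard library, not `SeqRecord`/`SeqIO`.

```python
import io
import sys


def parse_rows(handle):
    """Yield (header, sequence) pairs from tab-separated lines.

    Assumes four columns (e.g. test1, 0001, a1, AATTCC); the header joins
    columns 1 and 3 with '_'. Short lines are skipped, mirroring the
    IndexError handling in the loop above.
    """
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 4:
            continue  # not enough columns to build a record
        yield fields[0] + "_" + fields[2], fields[3]


def to_fasta(pairs):
    """Generator expression: lazily format each pair as a FASTA entry."""
    return (">%s\n%s\n" % (header, seq) for header, seq in pairs)


# In-memory stand-in for input.txt; the second line has too few columns
example = io.StringIO("test1\t0001\ta1\tAATTCC\nincomplete line\n")
sys.stdout.writelines(to_fasta(parse_rows(example)))
```

If you need `SeqRecord` objects instead, the same generator can be wrapped in a generator expression and passed to `SeqIO.write`, which accepts any iterable of records: `SeqIO.write((SeqRecord(Seq(seq), id=header, description="") for header, seq in parse_rows(handle)), out_handle, "fasta")`. Setting `description=""` keeps the FASTA header to just the ID.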
Peter

From jordan.r.willis at Vanderbilt.Edu Wed Dec 7 03:49:06 2011
From: jordan.r.willis at Vanderbilt.Edu (Willis, Jordan R)
Date: Wed, 7 Dec 2011 02:49:06 -0600
Subject: [Biopython] Generator expression for SeqIO
Message-ID: <1923E9B1-AA7B-49FA-8F63-F05642E948D7@vanderbilt.edu>

Does input.txt have FASTA sequences in it, and do you want to write them to a file? That is what it looks like.

from Bio import SeqIO as sio

# parse() iterates over every record; read() expects exactly one record
generator = sio.parse('input.txt', 'fasta')

for i in generator:
    print(i.description)
    print(i.seq)

will give you the header and sequence. You could of course write this to a file, but you seem to be inputting a FASTA file just to write out another one.

Jordan Willis
Ph.D Candidate, CPB
Laboratory of Dr. James Crowe and Dr. Jens Meiler
11475 MRBIV
2213 Garland Ave.
Nashville, TN 37232
Cell: 816-674-5340
Office: 615-343-8263

From mictadlo at gmail.com Wed Dec 7 08:11:24 2011
From: mictadlo at gmail.com (Mic)
Date: Wed, 7 Dec 2011 23:11:24 +1000
Subject: [Biopython] Generator expression for SeqIO
In-Reply-To: <1923E9B1-AA7B-49FA-8F63-F05642E948D7@vanderbilt.edu>
References: <1923E9B1-AA7B-49FA-8F63-F05642E948D7@vanderbilt.edu>

Thank you for the solution. My input file is not exactly a FASTA file, but it contains the information needed to build one. The file looks like this:

test1\t0001\ta1\tAATTCC

The output should look like this:

>test1_a1
AATTCC

From eric.talevich at gmail.com Wed Dec 7 11:07:57 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Wed, 7 Dec 2011 11:07:57 -0500
Subject: [Biopython] Generator expression for SeqIO

Mic,

You don't really need a generator expression here, but I recommend that you read the Python Tutorial to learn how to use them anyway.

To solve your problem, here's one solution using Biopython and a list comprehension (like a generator expression, but more your pace):

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

def row_to_seqrecord(row):
    r"""Convert a tab-delimited row to a SeqRecord.

    Row looks like:
    test1\t0001\ta1\tAATTCC

    Record looks like (conceptually):
    >test1_a1
    AATTCC
    """
    cells = [cell.strip() for cell in row.split('\t')]
    # description='' stops the FASTA writer adding "<unknown description>"
    return SeqRecord(Seq(cells[3]), id=cells[0] + '_' + cells[2],
                     description='')

with open('input.txt') as infile:
    records = [row_to_seqrecord(line) for line in infile]
SeqIO.write(records, 'output.txt', 'fasta')

But the nice thing about FASTA format is that there's almost no structure to it. Here's a simpler way to do it that doesn't use Biopython:

with open('input.txt') as infile:
    with open('output.fasta', 'w') as outfile:
        for line in infile:
            parts = [part.strip() for part in line.split('\t')]
            if len(parts) != 4:
                continue
            # Header
            outfile.write(">%s_%s\n" % (parts[0], parts[2]))
            # Sequence
            outfile.write(parts[3] + '\n')

From jsanders at oeb.harvard.edu Fri Dec 9 16:53:12 2011
From: jsanders at oeb.harvard.edu (Jon Sanders)
Date: Fri, 9 Dec 2011 16:53:12 -0500
Subject: [Biopython] help with confidence values on PhyloXML tree objects?

So I have two problems.

Problem 1: when importing my Newick-formatted trees, which were generated in PyCogent, the terminal labels and branch labels are read in as confidence values because they're numerical. So

((((41:0.01494,44:0.00014)0.604:0...
is read in with blank name='' values and 41, 44, 0.604, etc. as 'confidence' values.

Problem 2: I would like to store multiple confidence values per node, but I can't figure out how to do it.

I can get the plain old 'confidence' attribute set by:

clade.confidence = .05

but can't figure out how to add and set new confidence types. Any suggestions?

Much appreciated,

-jon

--
"If you hold a cat by the tail you learn things you cannot learn any other way."
--Mark Twain

From eric.talevich at gmail.com Fri Dec 9 18:26:46 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Fri, 9 Dec 2011 18:26:46 -0500
Subject: [Biopython] help with confidence values on PhyloXML tree objects?

Hi Jon,

On Fri, Dec 9, 2011 at 4:53 PM, Jon Sanders wrote:

> So I have two problems.
>
> Problem 1: when importing my Newick-formatted trees, which were generated
> in PyCogent, the terminal labels and branch labels are read in as
> confidence values because they're numerical. So
>
> ((((41:0.01494,44:0.00014)0.604:0...
>
> is read in with blank name='' values and 41, 44, 0.604, etc. as
> 'confidence' values.
>

Hmm, I'll take a look at the Newick parser. I think I've used numeric taxon labels before without a problem, but PyCogent wasn't involved.

It might work if you can coax PyCogent into writing the Newick files with an extra colon:

((((:41:0.01494,:44:0.00014):0.604:0...

> Problem 2: I would like to store multiple confidence values per node, but I
> can't figure out how to do it.
>
> I can get the plain old 'confidence' attribute set by:
>
> clade.confidence = .05
>
> but can't figure out how to add and set new confidence types. Any
> suggestions?
>

The confidence types are instances of the Bio.Phylo.PhyloXML.Confidence class.

In PhyloXML trees, the attribute "clade.confidence" is actually a Python property pointing to the first element of "clade.confidences", a list of Confidence objects.
It's syntax sugar to keep compatibility with Newick, which just has a numeric value there.

You can use it like this:

from Bio.Phylo import PhyloXML

# Create new Confidence instances
a_bootstrap_value = PhyloXML.Confidence(83, type="bootstrap")
# The second argument is optional
a_posterior_probability = PhyloXML.Confidence(0.99)

# Select a clade from your tree to modify
a_clade = mytree.clade[...]

# Modify the list of Confidences directly
a_clade.confidences.append(a_bootstrap_value)
a_clade.confidences.append(a_posterior_probability)

If you've assigned multiple confidence values to a clade using the PhyloXML class, the "clade.confidence" shortcut won't work anymore because it's not clear which confidence you mean. So you'll have to use e.g. clade.confidences[0] or clade.confidences[1], and save the tree in PhyloXML format to preserve the extra data.

Hope that helps.

Best regards,
Eric

From jsanders at oeb.harvard.edu Tue Dec 13 13:17:09 2011
From: jsanders at oeb.harvard.edu (Jon Sanders)
Date: Tue, 13 Dec 2011 13:17:09 -0500
Subject: [Biopython] help with confidence values on PhyloXML tree objects?

Thanks Eric! I got the hang of the PhyloXML confidence objects now, so that's straightened out.

I'm still having issues with the tree parsing. I tried throwing in extra colons with a regex, both before and after the tip/edge label, but that didn't change the behavior of the parser, and all the tip/edge labels were still imported as confidence values. Poking around some documentation on the Newick format, it seems the edge labels might be tricking the parser into thinking there are confidence values present, since there's no clear way to distinguish between them. I'll try suppressing the edge labels in PyCogent and see if I can pass a decent tree to the Biopython side for proper PhyloXML output.

Ugh.
-j

--
"If you hold a cat by the tail you learn things you cannot learn any other way."
--Mark Twain

From jsanders at oeb.harvard.edu Tue Dec 13 13:55:20 2011
From: jsanders at oeb.harvard.edu (Jon Sanders)
Date: Tue, 13 Dec 2011 13:55:20 -0500
Subject: [Biopython] help with confidence values on PhyloXML tree objects?

Update: yup, it seems to be a problem with numeric tip names.

1) getting rid of the internal edge name doesn't help
2) appending 'a' to tip names fixes it
3) this tree: (((1,2),(3,4)),5); loads the numeric tip names as branch lengths
4) this tree: (((1:0.01,2:0.01):0.01,(3:0.01,4:0.01):0.01):0.01,5:0.03); loads the numeric tip names as confidence values, and the branch lengths correctly

I might try poking around the parser too, although my python foo has little bar.

-j

--
"If you hold a cat by the tail you learn things you cannot learn any other way."
--Mark Twain

From eric.talevich at gmail.com Tue Dec 13 17:28:11 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Tue, 13 Dec 2011 17:28:11 -0500
Subject: [Biopython] help with confidence values on PhyloXML tree objects?

On Tue, Dec 13, 2011 at 1:55 PM, Jon Sanders wrote:
> Update: yup, it seems to be a problem with numeric tip names.
>
> 1) getting rid of the internal edge name doesn't help
> 2) appending 'a' to tip names fixes it
> 3) this tree: (((1,2),(3,4)),5); loads the numeric tip names as branch lengths
> 4) this tree: (((1:0.01,2:0.01):0.01,(3:0.01,4:0.01):0.01):0.01,5:0.03);
> loads the numeric tip names as confidence values, and the branch lengths
> correctly
>
> I might try poking around the parser too, although my python foo has
> little bar.
>
> -j
>

All right, sounds like you've got it working then -- thanks for sharing your investigations.

Looking back at my own work, I see that when I used numbered taxon labels earlier, I prefixed them with the letter "t"; I never actually used plain integers, so I didn't hit this issue. For now, I suppose your best bet is to temporarily add a letter prefix or suffix to the names when transferring between PyCogent and Biopython.
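Here is a minimal sketch of that workaround using only the standard library. One regular expression prefixes all-digit tip labels with "t" before the Newick string is handed to Biopython, and a second removes the prefix again afterwards. It assumes labels are unquoted and that a tip label always directly follows "(" or ",". Internal support values (which follow ")") and branch lengths (which follow ":") are left alone. The helper names `prefix_numeric_tips` and `strip_tip_prefix` are hypothetical.

```python
import re

# An all-digit tip label: preceded by "(" or "," and followed by ":", ",", ")" or ";"
NUMERIC_TIP = re.compile(r"([(,])(\d+)(?=[:,);])")
# The same position, but carrying the "t" prefix added by prefix_numeric_tips()
PREFIXED_TIP = re.compile(r"([(,])t(\d+)(?=[:,);])")


def prefix_numeric_tips(newick):
    """Rewrite tip labels like 41 as t41 so they are not parsed as numbers."""
    return NUMERIC_TIP.sub(r"\g<1>t\2", newick)


def strip_tip_prefix(newick):
    """Undo prefix_numeric_tips() on a Newick string."""
    return PREFIXED_TIP.sub(r"\1\2", newick)


tree = "(((41:0.01494,44:0.00014)0.604:0.1,7:0.2):0.3,5:0.4);"
print(prefix_numeric_tips(tree))
```

The "t" prefix matches the convention mentioned above. After parsing, each terminal clade's name would carry the prefix (e.g. "t41"), so the original label can be recovered by dropping the first character.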
According to the spec for the Nexus format (http://www.ncbi.nlm.nih.gov/pubmed/11975335), which includes Newick, all-numeric taxon names are illegal. So, I suppose the Biopython parser's behavior is technically correct -- at least for parsing the tree section of Nexus files. In any case, this behavior is surprising and at least deserves a mention in a docstring or an error message. I'll try to take a look at PyCogent's parser to see how they handle ambiguous cases like the ones you listed.

-E

From jsanders at oeb.harvard.edu Fri Dec 16 13:44:07 2011
From: jsanders at oeb.harvard.edu (Jon Sanders)
Date: Fri, 16 Dec 2011 13:44:07 -0500
Subject: [Biopython] Bug in Phylo.write('phyloxml')?

My XML trees exported in Biopython (with confidence values, thanks Eric!) don't open in most XML tree viewing programs.

The problem seems to be a spurious tag at the beginning of the tree.

If I delete this tag they open fine.

--
"If you hold a cat by the tail you learn things you cannot learn any other way."
--Mark Twain

From eric.talevich at gmail.com Sun Dec 18 00:03:51 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sun, 18 Dec 2011 00:03:51 -0500
Subject: [Biopython] Bug in Phylo.write('phyloxml')?

On Fri, Dec 16, 2011 at 1:44 PM, Jon Sanders wrote:

> My XML trees exported in Biopython (with confidence values, thanks Eric!)
> don't open in most XML tree viewing programs.
>
> The problem seems to be a spurious tag at the beginning of the
> tree.
>
> If I delete this tag they open fine.
>

Hi Jon,

Thanks for reporting this. I'll check the spec to see if an empty 'name' tag is even valid. Can you give me the name of one or two programs that are supposed to handle phyloXML, but don't like this input? Is Archaeopteryx one of them?

-Eric

From jsanders at oeb.harvard.edu Sun Dec 18 00:08:14 2011
From: jsanders at oeb.harvard.edu (Jon Sanders)
Date: Sun, 18 Dec 2011 00:08:14 -0500
Subject: [Biopython] Bug in Phylo.write('phyloxml')?

Archaeopteryx is the one program that read the trees fine. I also tried HyperTree, TreeGraph2, and Treevolution, which failed.

-j

--
"If you hold a cat by the tail you learn things you cannot learn any other way."
--Mark Twain

From eric.talevich at gmail.com Mon Dec 19 00:31:48 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sun, 18 Dec 2011 21:31:48 -0800
Subject: [Biopython] Bug in Phylo.write('phyloxml')?

Alrighty,

I think the main problem in your case is that the Newick parser creates trees with the 'name' attribute set to the empty string "" instead of None. When converting to PhyloXML, that value stays in place and gets serialized as an empty element. The author of the phyloXML spec is the author of Archaeopteryx, so all of that makes sense.
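To show why the empty string matters, here is a toy serializer built on the standard library's ElementTree. It is not Biopython's actual phyloXML writer and omits the phyloXML namespace. It writes a `<name>` child whenever the name is not None, so an empty string still produces an empty element, while None leaves it out entirely. The function name `phylogeny_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET


def phylogeny_xml(name):
    """Toy serializer: emit a <name> child unless name is None.

    Not Biopython's writer; it only mimics the "keep whatever value is
    there" behaviour described above, without the phyloXML namespace.
    """
    phylogeny = ET.Element("phylogeny", rooted="true")
    if name is not None:
        # An empty string still creates the element, just with no text
        ET.SubElement(phylogeny, "name").text = name
    ET.SubElement(phylogeny, "clade")
    return ET.tostring(phylogeny, encoding="unicode")


print(phylogeny_xml(""))    # an empty name element is written
print(phylogeny_xml(None))  # no name element at all
```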
For your case, Jon, and others having this problem: before writing a tree as phyloXML, set the tree name to None if it's not already named. if tree.name == "": tree.name = None Phylo.write(tree, 'example.xml', 'newick') For the future, I guess the best approach is to change the Newick parser to set the tree name to None instead of "" by default. Any issues with that solution? -Eric On Sat, Dec 17, 2011 at 9:08 PM, Jon Sanders wrote: > Archaeopteryx is the one program that read the trees fine. I also tried > HyperTree, TreeGraph2, and Treevolution, which failed. > > -j > > > On Sun, Dec 18, 2011 at 12:03 AM, Eric Talevich wrote: > >> On Fri, Dec 16, 2011 at 1:44 PM, Jon Sanders wrote: >> >>> My XML trees exported in biopython (with confidence values, thanks Eric!) >>> don't open in most XML tree viewing programs. >>> >>> The problem seems to be a spurious tag at the beginning of the >>> tree. >>> >>> >>> >>> >>> >>> >>> If I delete this tag they open fine. >>> >>> >> Hi Jon, >> >> Thanks for reporting this. I'll check the spec to see if an empty 'name' >> tag is even valid. Can you give me the name of one or two programs that are >> supposed to handle phyloXML, but don't like this input? Is Archaeopteryx >> one of them? >> >> -Eric >> > > > > -- > "If you hold a cat by the tail you learn things you cannot learn any other > way." > --Mark Twain > > From jsanders at oeb.harvard.edu Mon Dec 19 10:44:51 2011 From: jsanders at oeb.harvard.edu (Jon Sanders) Date: Mon, 19 Dec 2011 10:44:51 -0500 Subject: [Biopython] Bug in Phylo.write('phyloxml')? In-Reply-To: References: Message-ID: Yup, that seems to work just fine, and it's way easier and safer than writing a script to open the tree file and kill the tag, which is what I was going to do. Thanks again! 
-j On Mon, Dec 19, 2011 at 12:31 AM, Eric Talevich wrote: > Alrighty, > > I think the main problem in your case is that the Newick parser creates > trees with the 'name' attribute set to the empty string "" instead of None. > When converting to PhyloXML, that value stays in place and gets serialized > as an empty element. The author of the phyloXML spec is the author of > Archaeopteryx, so all of that makes sense. > > For your case, Jon, and others having this problem: before writing a tree > as phyloXML, set the tree name to None if it's not already named. > > if tree.name == "": > tree.name = None > Phylo.write(tree, 'example.xml', 'newick') > > > For the future, I guess the best approach is to change the Newick parser > to set the tree name to None instead of "" by default. Any issues with that > solution? > > -Eric > > > > On Sat, Dec 17, 2011 at 9:08 PM, Jon Sanders wrote: > >> Archaeopteryx is the one program that read the trees fine. I also tried >> HyperTree, TreeGraph2, and Treevolution, which failed. >> >> -j >> >> >> On Sun, Dec 18, 2011 at 12:03 AM, Eric Talevich wrote: >> >>> On Fri, Dec 16, 2011 at 1:44 PM, Jon Sanders wrote: >>> >>>> My XML trees exported in biopython (with confidence values, thanks >>>> Eric!) >>>> don't open in most XML tree viewing programs. >>>> >>>> The problem seems to be a spurious tag at the beginning of the >>>> tree. >>>> >>>> >>>> >>>> >>>> >>>> >>>> If I delete this tag they open fine. >>>> >>>> >>> Hi Jon, >>> >>> Thanks for reporting this. I'll check the spec to see if an empty 'name' >>> tag is even valid. Can you give me the name of one or two programs that are >>> supposed to handle phyloXML, but don't like this input? Is Archaeopteryx >>> one of them? >>> >>> -Eric >>> >> >> >> >> -- >> "If you hold a cat by the tail you learn things you cannot learn any >> other way." >> --Mark Twain >> >> > -- "If you hold a cat by the tail you learn things you cannot learn any other way." 
--Mark Twain From benjamin.ys.li at gmail.com Tue Dec 20 03:25:00 2011 From: benjamin.ys.li at gmail.com (Benjamin Li) Date: Tue, 20 Dec 2011 16:25:00 +0800 Subject: [Biopython] To do Blast alignment of two proteins by biopython Message-ID: Hi all, I have posted the same question on BioStar, but some Biopython users may not check BioStar, so I am also posting it to the mailing list. I am new to Biopython and I am not sure if this is a stupid question. I would like to perform the same task that can be done with the web-based BLAST: I am trying to align two proteins by providing their GIs and get the e-value. But I can only find examples of BLASTing a single protein in the Biopython tutorial. How can I align two proteins using the qblast function? I have tried the command

    result_handle = NCBIWWW.qblast("blastp", "pat", "330186", "118881")

But it returns a lot of alignments and a lot of e-values. I also found that it takes a few minutes for my computer to complete the query, whereas the web-based BLAST takes only a few seconds. Is anything wrong with my query? Merry Christmas! Benjamin Li From p.j.a.cock at googlemail.com Tue Dec 20 05:10:06 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 20 Dec 2011 10:10:06 +0000 Subject: [Biopython] To do Blast alignment of two proteins by biopython In-Reply-To: References: Message-ID: On Tue, Dec 20, 2011 at 8:25 AM, Benjamin Li wrote: > Hi all, > > I have posted the same question on BioStar, but I think some of the > Biopython users may not check Biostar so I also post it to the mail list. > > I am new to biopython and I am not sure if it is a stupid question. > > I would like to perform the same task as the one can be done on the web > based blast. > > I am trying to align two protein by providing their GIs and receive the > e-value. But I can only find examples to blast one protein in the biopython > tutorial. > > How can I align two protein using the qblast function? I don't think you can.
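One way to get an e-value for a single pair of proteins is the standalone route Peter suggests: save both sequences as FASTA and run BLAST+ `blastp` with `-subject` instead of a database. The sketch below makes these assumptions:

- BLAST+ is installed and on the PATH.
- Both sequences have already been saved to FASTA files.

It uses only the `blastp` options `-query`, `-subject` and `-outfmt 6`, and parses the default 12-column tabular output with the standard library. The function names are hypothetical.

```python
import subprocess

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def blastp_pair_command(query_fasta, subject_fasta):
    """Build a blastp command that compares one FASTA file against another."""
    return ["blastp", "-query", query_fasta, "-subject", subject_fasta,
            "-outfmt", "6"]


def parse_outfmt6(text):
    """Turn -outfmt 6 text into dicts with numeric e-value and bit score."""
    hits = []
    for line in text.splitlines():
        # Skip blank lines and any comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        hit = dict(zip(OUTFMT6_FIELDS, line.split("\t")))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        hits.append(hit)
    return hits


def align_pair(query_fasta, subject_fasta):
    """Run blastp locally and return the parsed hits (needs BLAST+ installed)."""
    result = subprocess.run(
        blastp_pair_command(query_fasta, subject_fasta),
        capture_output=True, text=True, check=True,
    )
    return parse_outfmt6(result.stdout)
```

As a usage sketch with hypothetical file names, `hits = align_pair("330186.fasta", "118881.fasta")` followed by `min(hit["evalue"] for hit in hits)` gives the best e-value. Several rows can come back when BLAST reports more than one local alignment (HSP) between the two sequences.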
The main BLAST webpage is actually very advanced, doing things like trying to guide you with parameter choices and so on. The QBLAST URL API is much simpler. http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html > I have tried the command > > result_handle = NCBIWWW.qblast("blastp", "pat", "330186","118881") > > But it returns me a lot of alignments and a lot of e-values. > > I also found that it takes a few minutes for my computer to complete the > query, where if I use the web based blast only takes a few seconds. > > Is my query anything wrong? Probably not. The Biopython qblast function uses a 3 second delay between checking for results, which may partly explain this. Also the NCBI may give priority to the website. Have you considered downloading the sequences and running standalone BLAST on your computer? Peter
From mmokrejs at fold.natur.cuni.cz Mon Dec 26 12:55:09 2011 From: mmokrejs at fold.natur.cuni.cz (Martin Mokrejs) Date: Mon, 26 Dec 2011 18:55:09 +0100 Subject: [Biopython] GO parsers in biopython Message-ID: <4EF8B4FD.6050601@fold.natur.cuni.cz> Hi, I would like to parse some OBO/OWL files in Python. I searched for some existing code and found http://biopython.org/wiki/Gene_Ontology pointing to an OWL parser from Ed Cannon (follow a link on the page listed above). Unfortunately, the code is gone? :(( I also discovered an OBO parser at http://hal.elte.hu/~nepusz/development; the sources can be fetched from http://bazaar.launchpad.net/~ntamas/+junk/go-parser/tarball/7?start_revid=7 It can open the .obo files for me, although I do not see many methods available. Finally, I found https://github.com/gotgenes/biopython/tree/a4824ceb71f3a687b3eb5e1fefd0ad3c278bf185/Bio/GO so my question is: when will this be available in a released Biopython, and what are your opinions/suggestions now? Does it offer more than the go-parser from ~ntamas? I want to cluster some sequences based on anatomical terms, so I think what I want is to be able to easily look up all parents (probably except the very root node or so) and compare whether they overlap with any parent of another sequence. Thank you for your comments, Martin P.S.: I want to parse OBO from http://www.evocontology.org/ From eric.talevich at gmail.com Mon Dec 26 15:34:28 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 26 Dec 2011 12:34:28 -0800 Subject: [Biopython] Bug in Phylo.write('phyloxml')?
In-Reply-To: References: Message-ID: Here's the fix: https://github.com/biopython/biopython/commit/85c3f5d35a0ae349cc35ea94754037f5c329b9b3 The 'name' attribute for clades already defaulted to None instead of the empty string (""), so I think it was just an oversight that the Tree 'name' attribute defaulted to "" before this patch. That should fix your problem, Jon. I'm not so concerned about the empty tag crashing other tree viewers because it's valid XML, it fits the phyloXML spec, and Archaeopteryx accepts it. This patch should keep that troublesome attribute value from arising inadvertently, though. Cheers & best wishes, Eric On Mon, Dec 19, 2011 at 7:44 AM, Jon Sanders wrote: > Yup, that seems to work just fine, and it's way easier and safer than > writing a script to open the tree file and kill the tag, which is what I > was going to do. Thanks again! > > -j > > > On Mon, Dec 19, 2011 at 12:31 AM, Eric Talevich wrote: > >> Alrighty, >> >> I think the main problem in your case is that the Newick parser creates >> trees with the 'name' attribute set to the empty string "" instead of None. >> When converting to PhyloXML, that value stays in place and gets serialized >> as an empty element. The author of the phyloXML spec is the author of >> Archaeopteryx, so all of that makes sense. >> >> For your case, Jon, and others having this problem: before writing a tree >> as phyloXML, set the tree name to None if it's not already named. >> >> if tree.name == "": >> tree.name = None >> Phylo.write(tree, 'example.xml', 'newick') >> >> >> For the future, I guess the best approach is to change the Newick parser >> to set the tree name to None instead of "" by default. Any issues with that >> solution? >> >> -Eric >> >> >> >> On Sat, Dec 17, 2011 at 9:08 PM, Jon Sanders wrote: >> >>> Archaeopteryx is the one program that read the trees fine. I also tried >>> HyperTree, TreeGraph2, and Treevolution, which failed. 
>>> >>> -j >>> >>> >>> On Sun, Dec 18, 2011 at 12:03 AM, Eric Talevich >> > wrote: >>> >>>> On Fri, Dec 16, 2011 at 1:44 PM, Jon Sanders wrote: >>>> >>>>> My XML trees exported in biopython (with confidence values, thanks >>>>> Eric!) >>>>> don't open in most XML tree viewing programs. >>>>> >>>>> The problem seems to be a spurious tag at the beginning of the >>>>> tree. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> If I delete this tag they open fine. >>>>> >>>>> >>>> Hi Jon, >>>> >>>> Thanks for reporting this. I'll check the spec to see if an empty >>>> 'name' tag is even valid. Can you give me the name of one or two programs >>>> that are supposed to handle phyloXML, but don't like this input? Is >>>> Archaeopteryx one of them? >>>> >>>> -Eric >>>> >>> >>> >>> >>> -- >>> "If you hold a cat by the tail you learn things you cannot learn any >>> other way." >>> --Mark Twain >>> >>> >> > > > -- > "If you hold a cat by the tail you learn things you cannot learn any other > way." > --Mark Twain > >
From mictadlo at gmail.com Wed Dec 7 04:41:25 2011 From: mictadlo at gmail.com (Mic) Date: Wed, 7 Dec 2011 14:41:25 +1000 Subject: [Biopython] Generator expression for SeqIO In-Reply-To: References: Message-ID: No worries, it was perfect. I have the following code and I do not know how to combine the *header* and *seq* variables from the '*with*' statement with generator expression? from Bio import SeqIO from Bio.SeqRecord import SeqRecord from Bio.Seq import Seq from pprint import pprint if __name__ == '__main__': *with* open('input.txt') as f: for line in f: try: splited_line = line.split('\t') *header* = splited_line[0] +'_'+ splited_line[2] *seq* = splited_line[3] except IndexError: continue fasta_file = open('output.fasta', 'w') records = (SeqRecord(???), id=????, description="") for i in ???) SeqIO.write(records, fasta_file, "fasta") Thank you in advance. On Thu, Dec 1, 2011 at 6:52 PM, Peter Cock wrote: > > > On Wednesday, November 30, 2011, Mic wrote: > > Thank you it is working. > > > > Excellent - sorry I couldn't think of a nice way to explain the syntax. > > Peter From p.j.a.cock at googlemail.com Wed Dec 7 08:26:08 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 7 Dec 2011 08:26:08 +0000 Subject: [Biopython] Generator expression for SeqIO In-Reply-To: References: Message-ID: On Wed, Dec 7, 2011 at 4:41 AM, Mic wrote: > No worries is was perfect. > > I have the following code and I do not know how to combine the *header* and > *seq* variables from the '*with*' statement with generator expression? > > from Bio import SeqIO > from Bio.SeqRecord import SeqRecord > from Bio.Seq import Seq > from pprint import pprint > > if __name__ == '__main__': > > *with* open('input.txt') as f: > for line in f: > try: >
splited_line = line.split('\t') > > *header* = splited_line[0] +'_'+ splited_line[2] > *seq* = splited_line[3] > except IndexError: > continue > > fasta_file = open('output.fasta', 'w') > records = (SeqRecord(???), id=????, description="") for i in ???) > > SeqIO.write(records, fasta_file, "fasta") > > Thank you in advance. Are you trying to parse a tabular file, with three columns (ID, sequence, description)? I suggest you learn about generator functions in Python. Peter From jordan.r.willis at Vanderbilt.Edu Wed Dec 7 08:49:06 2011 From: jordan.r.willis at Vanderbilt.Edu (Willis, Jordan R) Date: Wed, 7 Dec 2011 02:49:06 -0600 Subject: [Biopython] Generator expression for SeqIO In-Reply-To: References: Message-ID: <1923E9B1-AA7B-49FA-8F63-F05642E948D7@vanderbilt.edu> Does the input.txt have fasta sequences in it, and you want to write them to a file? This is what it looks like.

    from Bio import SeqIO as sio

    generator = sio.parse('input.txt', 'fasta')

    for i in generator:
        print(i.description)
        print(i.seq)

will give you the header and sequence. You could of course write this to a file, but you seem to be inputting a fasta file just to write out another one. Jordan Willis Ph.D Candidate, CPB Laboratory of Dr. James Crowe and Dr. Jens Meiler 11475 MRBIV 2213 Garland Ave. Nashville, TN 37232 Cell: 816-674-5340 Office: 615-343-8263 On Dec 7, 2011, at 12:26 AM, Peter Cock wrote: On Wed, Dec 7, 2011 at 4:41 AM, Mic > wrote: No worries is was perfect. I have the following code and I do not know how to combine the *header* and *seq* variables from the '*with*' statement with generator expression?
from Bio import SeqIO from Bio.SeqRecord import SeqRecord from Bio.Seq import Seq from pprint import pprint if __name__ == '__main__': *with* open('input.txt') as f: for line in f: try: splited_line = line.split('\t') *header* = splited_line[0] +'_'+ splited_line[2] *seq* = splited_line[3] except IndexError: continue fasta_file = open('output.fasta', 'w') records = (SeqRecord(???), id=????, description="") for i in ???) SeqIO.write(records, fasta_file, "fasta") Thank you in advance. Are you trying to parse a tabular file, with three columns (ID, sequence, description)? I suggest you learn about generator functions in Python. Peter _______________________________________________ Biopython mailing list - Biopython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython From mictadlo at gmail.com Wed Dec 7 13:11:24 2011 From: mictadlo at gmail.com (Mic) Date: Wed, 7 Dec 2011 23:11:24 +1000 Subject: [Biopython] Generator expression for SeqIO In-Reply-To: <1923E9B1-AA7B-49FA-8F63-F05642E948D7@vanderbilt.edu> References: <1923E9B1-AA7B-49FA-8F63-F05642E948D7@vanderbilt.edu> Message-ID: Thank you for the solution. My input file is not exactly a FASTA file, but it contains the information to build one. The file looks like this:

    test1\t0001\ta1\tAATTCC

The output should look like:

    >test1_a1
    AATTCC

On Wed, Dec 7, 2011 at 6:49 PM, Willis, Jordan R < jordan.r.willis at vanderbilt.edu> wrote: > Does the input.txt have fasta sequences in it and you want to write them > to a file? This is what it looks like. > > from Bio import SeqIO as sio > > generator = sio.read('input.txt','fasta') > > > for i in generator: > print i.header > print i.seq > > will give you header and sequence. You could of course write this to a > file, but you seem to be inputting a fasta file just to write out another > one. > > > > > Jordan Willis > Ph.D Candidate, CPB > Laboratory of Dr. James Crowe and Dr. Jens Meiler > 11475 MRBIV > 2213 Garland Ave.
> Nashville, TN 37232 > Cell: 816-674-5340 > Office: 615-343-8263 > > > On Dec 7, 2011, at 12:26 AM, Peter Cock wrote: > > On Wed, Dec 7, 2011 at 4:41 AM, Mic wrote: > > No worries is was perfect. > > > I have the following code and I do not know how to combine the *header* and > > *seq* variables from the '*with*' statement with generator expression? > > > from Bio import SeqIO > > from Bio.SeqRecord import SeqRecord > > from Bio.Seq import Seq > > from pprint import pprint > > > if __name__ == '__main__': > > > *with* open('input.txt') as f: > > for line in f: > > try: > > splited_line = line.split('\t') > > > *header* = splited_line[0] +'_'+ splited_line[2] > > *seq* = splited_line[3] > > except IndexError: > > continue > > > fasta_file = open('output.fasta', 'w') > > records = (SeqRecord(???), id=????, description="") for i in ???) > > > SeqIO.write(records, fasta_file, "fasta") > > > Thank you in advance. > > > Are you trying to parse a tabular file, with three columns > (ID, sequence, description)? > > I suggest you learn about generator functions in Python. > > Peter > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > > > From eric.talevich at gmail.com Wed Dec 7 16:07:57 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 7 Dec 2011 11:07:57 -0500 Subject: [Biopython] Generator expression for SeqIO In-Reply-To: References: Message-ID: Mic, You don't really need a generator expression here, but I recommend that you read the Python Tutorial to learn how to use them anyway. To solve your problem, here's one solution using Biopython and a list comprehension (like a generator expression, but more your pace):

    from Bio import SeqIO
    from Bio.Seq import Seq
    from Bio.SeqRecord import SeqRecord

    def row_to_seqrecord(row):
        """Convert a tab-delimited row to a SeqRecord.

        Row looks like: test1\t0001\ta1\tAATTCC
        Record looks like (conceptually):
        >test1_a1
        AATTCC
        """
        cells = [cell.strip() for cell in row.split('\t')]
        # An empty description keeps the FASTA header to just the ID.
        return SeqRecord(Seq(cells[3]), id=cells[0] + '_' + cells[2],
                         description='')

    with open('input.txt') as infile:
        records = [row_to_seqrecord(line) for line in infile]
    SeqIO.write(records, 'output.fasta', 'fasta')

But the nice thing about FASTA format is that there's almost no structure to it. Here's a simpler way to do it that doesn't use Biopython:

    with open('input.txt') as infile:
        with open('output.fasta', 'w') as outfile:
            for line in infile:
                parts = [part.strip() for part in line.split('\t')]
                if len(parts) != 4:
                    continue
                # Header
                outfile.write(">%s_%s\n" % (parts[0], parts[2]))
                # Sequence
                outfile.write(parts[3] + '\n')

On Tue, Dec 6, 2011 at 11:41 PM, Mic wrote: > No worries is was perfect. > > I have the following code and I do not know how to combine the *header* and > *seq* variables from the '*with*' statement with generator expression? > > from Bio import SeqIO > from Bio.SeqRecord import SeqRecord > from Bio.Seq import Seq > from pprint import pprint > > if __name__ == '__main__': > > *with* open('input.txt') as f: > for line in f: > try: > splited_line = line.split('\t') > > *header* = splited_line[0] +'_'+ splited_line[2] > *seq* = splited_line[3] > except IndexError: > continue > > fasta_file = open('output.fasta', 'w') > records = (SeqRecord(???), id=????, description="") for i in ???) > > SeqIO.write(records, fasta_file, "fasta") > > Thank you in advance. > > On Thu, Dec 1, 2011 at 6:52 PM, Peter Cock >wrote: > > > > > > > On Wednesday, November 30, 2011, Mic wrote: > > > Thank you it is working. > > > > > > > Excellent - sorry I couldn't think of a nice way to explain the syntax.
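Mic asked specifically about a generator expression, so here is a minimal sketch of that variant. It assumes the four tab-separated columns from Mic's example (name, number, tag, sequence), and the function name `tab_to_fasta` is hypothetical. Unlike the list comprehension, the generator builds each record only when `SeqIO.write` asks for it. The input handle therefore has to remain open while `SeqIO.write` runs.

```python
import io

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def tab_to_fasta(in_handle, out_handle):
    """Convert 4-column tab-separated rows into FASTA records.

    Rows are assumed to be: name, number, tag, sequence.
    Rows without exactly four columns are skipped.
    """
    # Inner generator expression: split each line into columns lazily.
    rows = (line.rstrip("\n").split("\t") for line in in_handle)
    # Outer generator expression: one SeqRecord per valid row.
    records = (
        SeqRecord(Seq(cols[3].strip()), id=cols[0] + "_" + cols[2],
                  description="")
        for cols in rows
        if len(cols) == 4
    )
    # SeqIO.write consumes the generator and returns the number of records.
    return SeqIO.write(records, out_handle, "fasta")


sample = io.StringIO("test1\t0001\ta1\tAATTCC\n")
output = io.StringIO()
count = tab_to_fasta(sample, output)
print(output.getvalue())
```

Setting `description=""` limits the FASTA header to `>test1_a1`. Without it, SeqRecord's default description text would be appended to the header.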
> > > > Peter > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From jsanders at oeb.harvard.edu Fri Dec 9 21:53:12 2011 From: jsanders at oeb.harvard.edu (Jon Sanders) Date: Fri, 9 Dec 2011 16:53:12 -0500 Subject: [Biopython] help with confidence values on PhyloXML tree objects? Message-ID: So I have two problems. Problem 1: when importing my newick-formatted trees, which were generated in PyCogent, the terminal labels and branch labels are read in as confidence values because they're numerical. So

    ((((41:0.01494,44:0.00014)0.604:0...

is read in with blank name='' values and 41, 44, 0.605, etc. as 'confidence' values. Problem 2: I would like to store multiple confidence values per node, but I can't figure out how to do it. I can get the plain old 'confidence' attribute set by:

    clade.confidence = .05

but can't figure out how to add and set new confidence types. Any suggestions? Much appreciated, -jon -- "If you hold a cat by the tail you learn things you cannot learn any other way." --Mark Twain From eric.talevich at gmail.com Fri Dec 9 23:26:46 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 9 Dec 2011 18:26:46 -0500 Subject: [Biopython] help with confidence values on PhyloXML tree objects? In-Reply-To: References: Message-ID: Hi Jon, On Fri, Dec 9, 2011 at 4:53 PM, Jon Sanders wrote: > So I have two problems. > > > Problem 1: when importing my newick-formatted trees, which were generated > in PyCogent, the terminal labels and branch labels are read in as > confidence values because they're numerical. So > > ((((41:0.01494,44:0.00014)0.604:0... > > is read in with blank name='' values and 41, 44, 0.605, etc. as > 'confidence' values. > Hmm, I'll take a look at the Newick parser. I think I've used numeric taxon labels before without a problem, but PyCogent wasn't involved.
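Later in this thread, adding a letter to the numeric tip names turned out to avoid the problem, and Eric suggests a temporary prefix as a workaround. Here is a minimal standard-library sketch of that pre-processing step, assuming unquoted Newick labels. It prefixes all-digit tip labels (those directly after `(` or `,`) and leaves labels after `)` untouched, since those are internal-node labels such as support values. The function names and the default prefix `t` are hypothetical, and quoted labels are not handled.

```python
import re

# A tip label follows "(" or "," and is followed by ":", "," or ")".
# Labels after ")" are internal-node labels and are not matched.
_NUMERIC_TIP = re.compile(r"([(,])(\d+)(?=[:,)])")


def prefix_numeric_tips(newick, prefix="t"):
    """Add a letter prefix to all-digit tip labels in a Newick string."""
    return _NUMERIC_TIP.sub(lambda m: m.group(1) + prefix + m.group(2), newick)


def strip_tip_prefix(name, prefix="t"):
    """Undo prefix_numeric_tips for one label after parsing."""
    if name and name.startswith(prefix) and name[len(prefix):].isdigit():
        return name[len(prefix):]
    return name


print(prefix_numeric_tips("((41:0.01494,44:0.00014)0.604:0.1,5:0.2);"))
# -> ((t41:0.01494,t44:0.00014)0.604:0.1,t5:0.2);
```

Once the prefixed string has been parsed with Biopython, you can apply `strip_tip_prefix` to each terminal clade's name to restore the original numbers.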
It might work if you can coax PyCogent into writing the Newick files with an extra colon:

    ((((:41:0.01494,:44:0.00014):0.604:0...

> Problem 2: I would like to store multiple confidence values per node, but I > can't figure out how to do it. > > I can get the plain old 'confidence' attribute set by: > > clade.confidence = .05 > > but can't figure out how to add and set new confidence types. Any > suggestions? > The confidence types are instances of the Bio.Phylo.PhyloXML.Confidence class. In PhyloXML trees, the attribute "clade.confidence" is actually a Python property pointing to the first element of "clade.confidences", a list of Confidence objects. It's syntax sugar to keep compatibility with Newick, which just has a numeric value there. You can use it like this:

    from Bio.Phylo import PhyloXML

    # Create new Confidence instances
    a_bootstrap_value = PhyloXML.Confidence(83, type="bootstrap")
    # The second argument is optional
    a_posterior_probability = PhyloXML.Confidence(0.99)

    # Select a clade from your tree to modify
    a_clade = mytree.clade[...]

    # Modify the list of Confidences directly
    a_clade.confidences.append(a_bootstrap_value)
    a_clade.confidences.append(a_posterior_probability)

If you've assigned multiple confidence values to a clade, using the PhyloXML class, then the "clade.confidence" shortcut won't work anymore because it's not clear which confidence you mean. So you'll have to use e.g. clade.confidences[0] or clade.confidences[1], and save the tree in PhyloXML format to preserve the extra data. Hope that helps. Best regards, Eric From jsanders at oeb.harvard.edu Tue Dec 13 18:17:09 2011 From: jsanders at oeb.harvard.edu (Jon Sanders) Date: Tue, 13 Dec 2011 13:17:09 -0500 Subject: [Biopython] help with confidence values on PhyloXML tree objects? In-Reply-To: References: Message-ID: Thanks Eric! I got the hang of the PhyloXML confidence objects now, so that's straightened out. Still having issues with the tree parsing.
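Putting the pieces of Eric's explanation together gives a minimal end-to-end sketch. It starts from a small made-up Newick tree and converts it with `as_phyloxml()` so that clades carry a `confidences` list. It then attaches a bootstrap value and a probability to the common ancestor of two tips, and writes phyloXML to a temporary file. The tree, tip names and confidence values are purely illustrative.

```python
import os
import tempfile
from io import StringIO

from Bio import Phylo
from Bio.Phylo import PhyloXML

# Illustrative Newick tree, converted so clades can hold several confidences.
newick_tree = Phylo.read(StringIO("((A:0.1,B:0.2):0.05,C:0.3);"), "newick")
tree = newick_tree.as_phyloxml()

# Pick the clade to annotate: here, the common ancestor of tips A and B.
clade = tree.common_ancestor("A", "B")
clade.confidences.append(PhyloXML.Confidence(83, type="bootstrap"))
clade.confidences.append(PhyloXML.Confidence(0.99, type="probability"))

# With two values attached, read them through the list, not clade.confidence.
for confidence in clade.confidences:
    print(confidence.type, confidence.value)

# Save as phyloXML so that both values are kept in the output file.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "annotated.xml")
    Phylo.write(tree, out_path, "phyloxml")
    with open(out_path, encoding="utf-8") as handle:
        xml_text = handle.read()
```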
I tried throwing in extra colons with a regex, both before and after the tip/edge label, but that didn't change the behavior of the parser, and all the tip/edge labels were still imported as confidence values. Poking around some documentation on the newick format, it seems like the edge labels might be tricking the parser into thinking there are confidence values present, since there's no clear way to distinguish between them. I'll try playing around with suppressing the edge labels in PyCogent and see if I can't pass a decent tree to the BioPython side for proper PhyloXML output. Ugh. -j On Fri, Dec 9, 2011 at 6:26 PM, Eric Talevich wrote: > Hi Jon, > > On Fri, Dec 9, 2011 at 4:53 PM, Jon Sanders wrote: > >> So I have two problems. >> >> >> Problem 1: when importing my newick-formatted trees, which were generated >> in PyCogent, the terminal labels and branch labels are read in as >> confidence values because they're numerical. So >> >> ((((41:0.01494,44:0.00014)0.604:0... >> >> is read in with blank name='' values and 41, 44, 0.605, etc. as >> 'confidence' values. >> > > Hmm, I'll take a look at the Newick parser. I think I've used numeric > taxon labels before without a problem, but PyCogent wasn't involved. > > It might work if you can coax PyCogent into writing the Newick files with > an extra colon: > ((((:41:0.01494,:44:0.00014):0.604:0... > > > >> Problem 2: I would like to store multiple confidence values per node, but >> I >> can't figure out how to do it. >> >> I can get the plain old 'confidence' attribute set by: >> >> clade.confidence = .05 >> >> but can't figure out how to add and set new confidence types. Any >> suggestions? >> > > The confidence types are instances of the Bio.Phylo.PhyloXML.Confidence > class. > > In PhyloXML trees, the attribute "clade.confidence" is actually a Python > property pointing to the first element of "clade.confidences", a list of > Confidence objects.
It's syntax sugar to keep compatibility with Newick, > which just has a numeric value there. > > You can use it like this: > > from Bio.Phylo import PhyloXML > > # Create new Confidence instances > a_bootstrap_value = PhyloXML.Confidence(83, type="bootstrap") > # The second argument is optional > a_posterior_probability = PhyloXML.Confidence(0.99) > > # Select a clade from your tree to modify > a_clade = mytree.clade[...] > > # Modify the list of Confidences directly > a_clade.confidences.append(a_bootstrap_value) > a_clade.confidences.append(a_posterior_probability) > > > If you've assigned multiple confidence values to a clade, using the > PhyloXML class, then the "clade.confidence" shortcut won't work anymore > because it's not clear which confidence you mean. So you'll have to use > e.g. clade.confidences[0] or clade.confidences[1], and save it the tree in > PhyloXML format to preserve the extra data. > > Hope that helps. > > Best regards, > Eric > -- "If you hold a cat by the tail you learn things you cannot learn any other way." --Mark Twain From jsanders at oeb.harvard.edu Tue Dec 13 18:55:20 2011 From: jsanders at oeb.harvard.edu (Jon Sanders) Date: Tue, 13 Dec 2011 13:55:20 -0500 Subject: [Biopython] help with confidence values on PhyloXML tree objects? In-Reply-To: References: Message-ID: Update: yup, seems to be a problem with numeric tip names.

1) getting rid of the internal edge names doesn't help
2) appending 'a' to tip names fixes it
3) this tree: (((1,2),(3,4)),5); loads the numeric tip names as branch lengths
4) this tree: (((1:0.01,2:0.01):0.01,(3:0.01,4:0.01):0.01):0.01,5:0.03); loads the numeric tip names as confidence values and the branch lengths correctly

I might try poking around the parser too, although my python foo has little bar. -j On Tue, Dec 13, 2011 at 1:17 PM, Jon Sanders wrote: > Thanks Eric! I got the hang of the PhyloXML confidence objects now, so > that's straightened out. > > Still having issues with the tree parsing.
I tried throwing in extra > colons with a regex, both before and after the tip/edge label, but that > didn't change the behavior of the parser, and all the tip/edge labels were > still imported as confidence values. Poking around some documentation on > the newick format, it seems like the edge labels might be tricking the > parser into thinking there are confidence values present, since there's no > clear way to distinguish between them. I'll try playing around with > supressing the edge labels in PyCogent and see if I can't pass a decent > tree to BioPython side for proper PhyloXML output. > > Ugh. > > -j > > > On Fri, Dec 9, 2011 at 6:26 PM, Eric Talevich wrote: > >> Hi Jon, >> >> On Fri, Dec 9, 2011 at 4:53 PM, Jon Sanders wrote: >> >>> So I have two problems. >>> >>> >>> Problem 1: when importing my newick-formatted trees, which were generated >>> in PyCogent, the terminal labels and branch labels are read in as >>> confidence values because they're numerical. So >>> >>> ((((41:0.01494,44:0.00014)0.604:0... >>> >>> is read in with blank name='' values and 41, 44, 0.605, etc. as >>> 'confidence' values. >>> >> >> Hmm, I'll take a look at the Newick parser. I think I've used numeric >> taxon labels before without a problem, but PyCogent wasn't involved. >> >> It might work if you can coax PyCogent into writing the Newick files with >> an extra colon: >> ((((:41:0.01494,:44:0.00014):0.604:0... >> >> >> >>> Problem 2: I would like to store multiple confidence values per node, >>> but I >>> can't figure out how to do it. >>> >>> I can get the plain old 'confidence' attribute set by: >>> >>> clade.confidence = .05 >>> >>> but can't figure out how to add and set new confidence types. Any >>> suggestions? >>> >> >> The confidence types are instances of the Bio.Phylo.PhyloXML.Confidence >> class. 
>> >> In PhyloXML trees, the attribute "clade.confidence" is actually a Python >> property pointing to the first element of "clade.confidences", a list of >> Confidence objects. It's syntax sugar to keep compatibility with Newick, >> which just has a numeric value there. >> >> You can use it like this: >> >> from Bio.Phylo import PhyloXML >> >> # Create new Confidence instances >> a_bootstrap_value = PhyloXML.Confidence(83, type="bootstrap") >> # The second argument is optional >> a_posterior_probability = PhyloXML.Confidence(0.99) >> >> # Select a clade from your tree to modify >> a_clade = mytree.clade[...] >> >> # Modify the list of Confidences directly >> a_clade.confidences.append(a_bootstrap_value) >> a_clade.confidences.append(a_posterior_probability) >> >> >> If you've assigned multiple confidence values to a clade, using the >> PhyloXML class, then the "clade.confidence" shortcut won't work anymore >> because it's not clear which confidence you mean. So you'll have to use >> e.g. clade.confidences[0] or clade.confidences[1], and save the tree in >> PhyloXML format to preserve the extra data. >> >> Hope that helps. >> >> Best regards, >> Eric >> > > > > -- > "If you hold a cat by the tail you learn things you cannot learn any other > way." > --Mark Twain > > -- "If you hold a cat by the tail you learn things you cannot learn any other way." --Mark Twain From eric.talevich at gmail.com Tue Dec 13 22:28:11 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Tue, 13 Dec 2011 17:28:11 -0500 Subject: [Biopython] help with confidence values on PhyloXML tree objects? In-Reply-To: References: Message-ID: On Tue, Dec 13, 2011 at 1:55 PM, Jon Sanders wrote: > Update: yup, seems to be a problem with numeric tip names.
> > 1) getting rid of internal edge name doesn't help > 2) appending 'a' to tip names fixes it > 3) this tree: (((1,2),(3,4)),5); loads the numeric tip names as branch > lengths > 4) this tree: (((1:0.01,2:0.01):0.01,(3:0.01,4:0.01):0.01):0.01,5:0.03); > loads the numeric tip names as confidence values and branch lengths > correctly > > I might try poking around the parser too, although my python foo has > little bar. > > -j > All right, sounds like you've got it working then -- thanks for sharing your investigations. Looking back at my own work, I see that when I used numbered taxon labels earlier, I prefixed them with the letter "t"; I never actually used plain integers, so I didn't hit this issue. For now, I suppose your best bet is to temporarily add a letter prefix or suffix to the names when transferring between PyCogent and Biopython. According to the spec for the Nexus format (http://www.ncbi.nlm.nih.gov/pubmed/11975335), which includes Newick, all-numeric taxon names are illegal. So, I suppose the Biopython parser's behavior is technically correct -- at least for parsing the tree section of Nexus files. In any case, this behavior is surprising and at least deserves a mention in a docstring or an error message. I'll try to take a look at PyCogent's parser to see how they handle ambiguous cases like the ones you listed. -E On Fri, Dec 9, 2011 at 6:26 PM, Eric Talevich wrote: > >> Hi Jon, >> >> On Fri, Dec 9, 2011 at 4:53 PM, Jon Sanders wrote: >> >>> So I have two problems. >>> >>> >>> Problem 1: when importing my newick-formatted trees, which were generated >>> in PyCogent, the terminal labels and branch labels are read in as >>> confidence values because they're numerical. So >>> >>> ((((41:0.01494,44:0.00014)0.604:0... >>> >>> is read in with blank name='' values and 41, 44, 0.604, etc. as >>> 'confidence' values. >>> >> >> Hmm, I'll take a look at the Newick parser.
I think I've used numeric >> taxon labels before without a problem, but PyCogent wasn't involved. >> >> It might work if you can coax PyCogent into writing the Newick files with >> an extra colon: >> ((((:41:0.01494,:44:0.00014):0.604:0... >> >> >> >>> Problem 2: I would like to store multiple confidence values per node, >>> but I >>> can't figure out how to do it. >>> >>> I can get the plain old 'confidence' attribute set by: >>> >>> clade.confidence = .05 >>> >>> but can't figure out how to add and set new confidence types. Any >>> suggestions? >>> >> >> The confidence types are instances of the Bio.Phylo.PhyloXML.Confidence >> class. >> >> In PhyloXML trees, the attribute "clade.confidence" is actually a Python >> property pointing to the first element of "clade.confidences", a list of >> Confidence objects. It's syntax sugar to keep compatibility with Newick, >> which just has a numeric value there. >> >> You can use it like this: >> >> from Bio.Phylo import PhyloXML >> >> # Create new Confidence instances >> a_bootstrap_value = PhyloXML.Confidence(83, type="bootstrap") >> # The second argument is optional >> a_posterior_probability = PhyloXML.Confidence(0.99) >> >> # Select a clade from your tree to modify >> a_clade = mytree.clade[...] >> >> # Modify the list of Confidences directly >> a_clade.confidences.append(a_bootstrap_value) >> a_clade.confidences.append(a_posterior_probability) >> >> >> If you've assigned multiple confidence values to a clade, using the >> PhyloXML class, then the "clade.confidence" shortcut won't work anymore >> because it's not clear which confidence you mean. So you'll have to use >> e.g. clade.confidences[0] or clade.confidences[1], and save the tree in >> PhyloXML format to preserve the extra data. >> >> Hope that helps.
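Eric's suggestion of temporarily prefixing all-numeric tip names could look like the minimal sketch below, assuming plain unquoted Newick such as Jon's test trees. The function names and the default prefix "t" are hypothetical. Pick a prefix that no existing taxon name already starts with, because `strip_tip_prefix` would also turn a genuine name like "t5" into "5". Newer Biopython releases may already read numeric tip labels as names, so it is worth checking whether your version still needs this.

```python
import re

# An all-digit label in a tip position: right after "(" or "," and followed
# by ":", ",", ")" or ";". Internal labels such as ")0.604" and branch
# lengths such as ":0.01" are not matched.
_NUMERIC_TIP = re.compile(r"([(,]\s*)(\d+)(?=\s*[:,);])")


def prefix_numeric_tips(newick, prefix="t"):
    """Return the Newick string with all-digit tip names prefixed (41 -> t41)."""
    return _NUMERIC_TIP.sub(lambda m: m.group(1) + prefix + m.group(2), newick)


def strip_tip_prefix(name, prefix="t"):
    """Undo prefix_numeric_tips for one name; other names pass through."""
    if name and name.startswith(prefix) and name[len(prefix):].isdigit():
        return name[len(prefix):]
    return name


def read_numeric_newick(newick, prefix="t"):
    """Parse with Biopython (must be installed), then restore the tip names."""
    from io import StringIO

    from Bio import Phylo

    tree = Phylo.read(StringIO(prefix_numeric_tips(newick, prefix)), "newick")
    for tip in tree.get_terminals():
        tip.name = strip_tip_prefix(tip.name, prefix)
    return tree
```

For Jon's third test tree, `prefix_numeric_tips("(((1,2),(3,4)),5);")` yields `"(((t1,t2),(t3,t4)),t5);"`, which the parser can no longer mistake for numeric values.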
>> >> Best regards, >> Eric >> > > From jsanders at oeb.harvard.edu Fri Dec 16 18:44:07 2011 From: jsanders at oeb.harvard.edu (Jon Sanders) Date: Fri, 16 Dec 2011 13:44:07 -0500 Subject: [Biopython] Bug in Phylo.write('phyloxml')? Message-ID: My XML trees exported in biopython (with confidence values, thanks Eric!) don't open in most XML tree viewing programs. The problem seems to be a spurious tag at the beginning of the tree. If I delete this tag they open fine. -- "If you hold a cat by the tail you learn things you cannot learn any other way." --Mark Twain From eric.talevich at gmail.com Sun Dec 18 05:03:51 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Sun, 18 Dec 2011 00:03:51 -0500 Subject: [Biopython] Bug in Phylo.write('phyloxml')? In-Reply-To: References: Message-ID: On Fri, Dec 16, 2011 at 1:44 PM, Jon Sanders wrote: > My XML trees exported in biopython (with confidence values, thanks Eric!) > don't open in most XML tree viewing programs. > > The problem seems to be a spurious tag at the beginning of the > tree. > > > > > > > If I delete this tag they open fine. > > Hi Jon, Thanks for reporting this. I'll check the spec to see if an empty 'name' tag is even valid. Can you give me the name of one or two programs that are supposed to handle phyloXML, but don't like this input? Is Archaeopteryx one of them? -Eric From jsanders at oeb.harvard.edu Sun Dec 18 05:08:14 2011 From: jsanders at oeb.harvard.edu (Jon Sanders) Date: Sun, 18 Dec 2011 00:08:14 -0500 Subject: [Biopython] Bug in Phylo.write('phyloxml')? In-Reply-To: References: Message-ID: Archaeopteryx is the one program that read the trees fine. I also tried HyperTree, TreeGraph2, and Treevolution, which failed. -j On Sun, Dec 18, 2011 at 12:03 AM, Eric Talevich wrote: > On Fri, Dec 16, 2011 at 1:44 PM, Jon Sanders wrote: > >> My XML trees exported in biopython (with confidence values, thanks Eric!) >> don't open in most XML tree viewing programs. 
>> >> The problem seems to be a spurious tag at the beginning of the >> tree. >> >> >> >> >> >> >> If I delete this tag they open fine. >> >> > Hi Jon, > > Thanks for reporting this. I'll check the spec to see if an empty 'name' > tag is even valid. Can you give me the name of one or two programs that are > supposed to handle phyloXML, but don't like this input? Is Archaeopteryx > one of them? > > -Eric > -- "If you hold a cat by the tail you learn things you cannot learn any other way." --Mark Twain From eric.talevich at gmail.com Mon Dec 19 05:31:48 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Sun, 18 Dec 2011 21:31:48 -0800 Subject: [Biopython] Bug in Phylo.write('phyloxml')? In-Reply-To: References: Message-ID: Alrighty, I think the main problem in your case is that the Newick parser creates trees with the 'name' attribute set to the empty string "" instead of None. When converting to PhyloXML, that value stays in place and gets serialized as an empty element. The author of the phyloXML spec is the author of Archaeopteryx, so all of that makes sense. For your case, Jon, and others having this problem: before writing a tree as phyloXML, set the tree name to None if it's not already named. if tree.name == "": tree.name = None Phylo.write(tree, 'example.xml', 'phyloxml') For the future, I guess the best approach is to change the Newick parser to set the tree name to None instead of "" by default. Any issues with that solution? -Eric On Sat, Dec 17, 2011 at 9:08 PM, Jon Sanders wrote: > Archaeopteryx is the one program that read the trees fine. I also tried > HyperTree, TreeGraph2, and Treevolution, which failed. > > -j > > > On Sun, Dec 18, 2011 at 12:03 AM, Eric Talevich wrote: > >> On Fri, Dec 16, 2011 at 1:44 PM, Jon Sanders wrote: >> >>> My XML trees exported in biopython (with confidence values, thanks Eric!) >>> don't open in most XML tree viewing programs. >>> >>> The problem seems to be a spurious tag at the beginning of the >>> tree.
>>> >>> >>> >>> >>> >>> >>> If I delete this tag they open fine. >>> >>> >> Hi Jon, >> >> Thanks for reporting this. I'll check the spec to see if an empty 'name' >> tag is even valid. Can you give me the name of one or two programs that are >> supposed to handle phyloXML, but don't like this input? Is Archaeopteryx >> one of them? >> >> -Eric >> > > > > -- > "If you hold a cat by the tail you learn things you cannot learn any other > way." > --Mark Twain > > From jsanders at oeb.harvard.edu Mon Dec 19 15:44:51 2011 From: jsanders at oeb.harvard.edu (Jon Sanders) Date: Mon, 19 Dec 2011 10:44:51 -0500 Subject: [Biopython] Bug in Phylo.write('phyloxml')? In-Reply-To: References: Message-ID: Yup, that seems to work just fine, and it's way easier and safer than writing a script to open the tree file and kill the tag, which is what I was going to do. Thanks again! -j On Mon, Dec 19, 2011 at 12:31 AM, Eric Talevich wrote: > Alrighty, > > I think the main problem in your case is that the Newick parser creates > trees with the 'name' attribute set to the empty string "" instead of None. > When converting to PhyloXML, that value stays in place and gets serialized > as an empty element. The author of the phyloXML spec is the author of > Archaeopteryx, so all of that makes sense. > > For your case, Jon, and others having this problem: before writing a tree > as phyloXML, set the tree name to None if it's not already named. > > if tree.name == "": > tree.name = None > Phylo.write(tree, 'example.xml', 'phyloxml') > > > For the future, I guess the best approach is to change the Newick parser > to set the tree name to None instead of "" by default. Any issues with that > solution? > > -Eric > > > > On Sat, Dec 17, 2011 at 9:08 PM, Jon Sanders wrote: > >> Archaeopteryx is the one program that read the trees fine. I also tried >> HyperTree, TreeGraph2, and Treevolution, which failed.
>> >> -j >> >> >> On Sun, Dec 18, 2011 at 12:03 AM, Eric Talevich wrote: >> >>> On Fri, Dec 16, 2011 at 1:44 PM, Jon Sanders wrote: >>> >>>> My XML trees exported in biopython (with confidence values, thanks >>>> Eric!) >>>> don't open in most XML tree viewing programs. >>>> >>>> The problem seems to be a spurious tag at the beginning of the >>>> tree. >>>> >>>> >>>> >>>> >>>> >>>> >>>> If I delete this tag they open fine. >>>> >>>> >>> Hi Jon, >>> >>> Thanks for reporting this. I'll check the spec to see if an empty 'name' >>> tag is even valid. Can you give me the name of one or two programs that are >>> supposed to handle phyloXML, but don't like this input? Is Archaeopteryx >>> one of them? >>> >>> -Eric >>> >> >> >> >> -- >> "If you hold a cat by the tail you learn things you cannot learn any other way." >> --Mark Twain >> >> > -- "If you hold a cat by the tail you learn things you cannot learn any other way." --Mark Twain From benjamin.ys.li at gmail.com Tue Dec 20 08:25:00 2011 From: benjamin.ys.li at gmail.com (Benjamin Li) Date: Tue, 20 Dec 2011 16:25:00 +0800 Subject: [Biopython] To do Blast alignment of two proteins by biopython Message-ID: Hi all, I have posted the same question on BioStar, but I think some Biopython users may not check BioStar, so I am also posting it to the mailing list. I am new to Biopython and I am not sure if this is a stupid question. I would like to perform the same task that can be done with the web-based BLAST. I am trying to align two proteins by providing their GIs and get the e-value. But I can only find examples of BLASTing one protein in the Biopython tutorial. How can I align two proteins using the qblast function? I have tried the command result_handle = NCBIWWW.qblast("blastp", "pat", "330186","118881") But it returns a lot of alignments and a lot of e-values. I also found that it takes a few minutes for my computer to complete the query, whereas the web-based BLAST takes only a few seconds.
Is there anything wrong with my query? Merry Christmas! Benjamin Li From p.j.a.cock at googlemail.com Tue Dec 20 10:10:06 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 20 Dec 2011 10:10:06 +0000 Subject: [Biopython] To do Blast alignment of two proteins by biopython In-Reply-To: References: Message-ID: On Tue, Dec 20, 2011 at 8:25 AM, Benjamin Li wrote: > Hi all, > > I have posted the same question on BioStar, but I think some > Biopython users may not check BioStar, so I am also posting it to the mailing list. > > I am new to Biopython and I am not sure if this is a stupid question. > > I would like to perform the same task that can be done with the web-based > BLAST. > > I am trying to align two proteins by providing their GIs and get the > e-value. But I can only find examples of BLASTing one protein in the Biopython > tutorial. > > How can I align two proteins using the qblast function? I don't think you can. The main BLAST webpage is actually very advanced, doing things like trying to guide you with parameter choices and so on. The QBLAST URL API is much simpler. http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html > I have tried the command > > result_handle = NCBIWWW.qblast("blastp", "pat", "330186","118881") > > But it returns a lot of alignments and a lot of e-values. > > I also found that it takes a few minutes for my computer to complete the > query, whereas the web-based BLAST takes only a few seconds. > > Is there anything wrong with my query? Probably not. The Biopython qblast function uses a 3-second delay between checks for results, which may partly explain this. Also the NCBI may give priority to the website. Have you considered downloading the sequences and running standalone BLAST on your computer?
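A minimal sketch of the standalone route Peter suggests, assuming NCBI BLAST+ is installed and both proteins have already been saved as FASTA files (fetching them by GI with `Bio.Entrez.efetch(db="protein", id=..., rettype="fasta", retmode="text")` is one option). The blastp `-subject` option compares the query directly against the second file without building a database, and `-outfmt 6` prints one tab-separated line per HSP with the e-value in the 11th column. The function names and file names are hypothetical.

```python
import subprocess


def blastp_two_sequences(query_fasta, subject_fasta, blastp="blastp"):
    """Align one protein FASTA file against another with BLAST+ blastp.

    Requires the blastp executable on the PATH; no BLAST database is needed
    because -subject searches the second file directly.
    """
    cmd = [blastp, "-query", query_fasta, "-subject", subject_fasta,
           "-outfmt", "6"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_tabular_evalues(result.stdout)


def parse_tabular_evalues(tabular_text):
    """Return (query id, subject id, e-value) for each line of -outfmt 6 text.

    Default -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore, so the e-value is index 10.
    """
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and any comment lines
        fields = line.split("\t")
        hits.append((fields[0], fields[1], float(fields[10])))
    return hits
```

Calling `blastp_two_sequences("330186.fasta", "118881.fasta")` would then return one (query, subject, e-value) tuple per HSP, rather than the many database hits a qblast search against "pat" produces.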
Peter From mmokrejs at fold.natur.cuni.cz Mon Dec 26 17:55:09 2011 From: mmokrejs at fold.natur.cuni.cz (Martin Mokrejs) Date: Mon, 26 Dec 2011 18:55:09 +0100 Subject: [Biopython] GO parsers in biopython Message-ID: <4EF8B4FD.6050601@fold.natur.cuni.cz> Hi, I would like to parse some OBO/OWL files in python. I searched for some existing code and found http://biopython.org/wiki/Gene_Ontology pointing to some OWL parser from Ed Cannon (follow a link on the page listed above). Unfortunately, the code is gone?
:(( I also discovered an OBO parser at http://hal.elte.hu/~nepusz/development; the sources can be fetched from http://bazaar.launchpad.net/~ntamas/+junk/go-parser/tarball/7?start_revid=7 It can open the .obo files for me, although I do not see many methods available. Finally, I found https://github.com/gotgenes/biopython/tree/a4824ceb71f3a687b3eb5e1fefd0ad3c278bf185/Bio/GO so my question is: when will this be available in a released Biopython, and what are your opinions/suggestions now? Does it offer more than the go-parser from ~ntamas? I want to cluster some sequences based on anatomical terms, so I think what I want is to be able to easily look up all parents of a term (probably except the very root node or so) and check whether they overlap with any parent of another sequence. Thank you for your comments, Martin P.S.: I want to parse OBO from http://www.evocontology.org/ From eric.talevich at gmail.com Mon Dec 26 20:34:28 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 26 Dec 2011 12:34:28 -0800 Subject: [Biopython] Bug in Phylo.write('phyloxml')? In-Reply-To: References: Message-ID: Here's the fix: https://github.com/biopython/biopython/commit/85c3f5d35a0ae349cc35ea94754037f5c329b9b3 The 'name' attribute for clades already defaulted to None instead of the empty string (""), so I think it was just an oversight that the Tree 'name' attribute defaulted to "" before this patch. That should fix your problem, Jon. I'm not so concerned about the empty tag crashing other tree viewers because it's valid XML, it fits the phyloXML spec, and Archaeopteryx accepts it. This patch should keep that troublesome attribute value from arising inadvertently, though. Cheers & best wishes, Eric On Mon, Dec 19, 2011 at 7:44 AM, Jon Sanders wrote: > Yup, that seems to work just fine, and it's way easier and safer than > writing a script to open the tree file and kill the tag, which is what I > was going to do. Thanks again!
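For Biopython releases that predate the patch above, Eric's workaround can be folded into a whole-file conversion. This sketch assumes Biopython is installed; `normalize_tree_name` and `newick_to_phyloxml` are hypothetical helper names. On releases whose Newick parser already defaults the tree name to None, the normalization step simply does nothing.

```python
def normalize_tree_name(name):
    """Map the empty string to None; any other value passes through unchanged."""
    return None if name == "" else name


def newick_to_phyloxml(newick_path, xml_path):
    """Convert every tree in a Newick file to phyloXML (requires Biopython).

    Clearing empty tree names first keeps older releases from writing an
    empty <name> element that some viewers reject.
    """
    from Bio import Phylo

    trees = list(Phylo.parse(newick_path, "newick"))
    for tree in trees:
        tree.name = normalize_tree_name(tree.name)
    # Phylo.write converts plain trees to phyloXML Phylogeny objects itself
    return Phylo.write(trees, xml_path, "phyloxml")
```

`Phylo.convert` performs the same format conversion in a single call, but it gives no opportunity to adjust the tree names before writing.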
> > -j > > > On Mon, Dec 19, 2011 at 12:31 AM, Eric Talevich wrote: > >> Alrighty, >> >> I think the main problem in your case is that the Newick parser creates >> trees with the 'name' attribute set to the empty string "" instead of None. >> When converting to PhyloXML, that value stays in place and gets serialized >> as an empty element. The author of the phyloXML spec is the author of >> Archaeopteryx, so all of that makes sense. >> >> For your case, Jon, and others having this problem: before writing a tree >> as phyloXML, set the tree name to None if it's not already named. >> >> if tree.name == "": >> tree.name = None >> Phylo.write(tree, 'example.xml', 'phyloxml') >> >> >> For the future, I guess the best approach is to change the Newick parser >> to set the tree name to None instead of "" by default. Any issues with that >> solution? >> >> -Eric >> >> >> >> On Sat, Dec 17, 2011 at 9:08 PM, Jon Sanders wrote: >> >>> Archaeopteryx is the one program that read the trees fine. I also tried >>> HyperTree, TreeGraph2, and Treevolution, which failed. >>> >>> -j >>> >>> >>> On Sun, Dec 18, 2011 at 12:03 AM, Eric Talevich >> > wrote: >>> >>>> On Fri, Dec 16, 2011 at 1:44 PM, Jon Sanders wrote: >>>> >>>>> My XML trees exported in biopython (with confidence values, thanks >>>>> Eric!) >>>>> don't open in most XML tree viewing programs. >>>>> >>>>> The problem seems to be a spurious tag at the beginning of the >>>>> tree. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> If I delete this tag they open fine. >>>>> >>>>> >>>> Hi Jon, >>>> >>>> Thanks for reporting this. I'll check the spec to see if an empty >>>> 'name' tag is even valid. Can you give me the name of one or two programs >>>> that are supposed to handle phyloXML, but don't like this input? Is >>>> Archaeopteryx one of them? >>>> >>>> -Eric >>>> >>> >>> >>> >>> -- >>> "If you hold a cat by the tail you learn things you cannot learn any >>> other way."
>>> --Mark Twain >>> >>> >> > > > -- > "If you hold a cat by the tail you learn things you cannot learn any other > way." > --Mark Twain > >