From rozziite at gmail.com Fri Jul 3 11:01:39 2009 From: rozziite at gmail.com (Diana Jaunzeikare) Date: Fri, 3 Jul 2009 11:01:39 -0400 Subject: [BioRuby] GSOC: BioRuby PhyloXML: Validating XML Message-ID: <4057d3bf0907030801u67b0f2fdse43cd947eb8271ca@mail.gmail.com> Hi all, I have chosen to validate the input xml file at the initialization step of the parser using libxml validator against specified schema. It is quick (1 second on my machine for tree of life xml) and it solves a lot of problems. In my parser, I don't have to worry anymore about what if user gives invalid xml file, and don't have to do error checking for that, thus reducing the parsing overhead. I have a question, if the libxml validator finds something wrong with the xml file (and in general), where errors should go? Should exception be raised, printed on stdout, or on error output? Another question is where should phyloxml.xsd schema file go? Is lib/bio/db/phyloxml.xsd fine? (the same place where phyloxml_parser.rb and phyloxml_elements.rb are). Using the validator I understood that in xml elements have to go in specified order. (like name element of phylogeny should go before the clade element of phylogeny). (Correct me if I am wrong). If thats the case, it will allow me to simplify some code. Have a good 4th July weekend! Diana From czmasek at burnham.org Fri Jul 3 13:33:01 2009 From: czmasek at burnham.org (Christian M Zmasek) Date: Fri, 3 Jul 2009 10:33:01 -0700 Subject: [BioRuby] GSOC: BioRuby PhyloXML: Validating XML In-Reply-To: <4057d3bf0907030801u67b0f2fdse43cd947eb8271ca@mail.gmail.com> References: <4057d3bf0907030801u67b0f2fdse43cd947eb8271ca@mail.gmail.com> Message-ID: <4A4E40CD.9050700@burnham.org> Hi, Diana: > I have a question, if the libxml validator finds something wrong with > the xml file (and in general), where errors should go? Should > exception be raised, printed on stdout, or on error output? I strongly recommend that a exception be raised. 
It is the responsibility of the parsers clients to deal with the exception. > Another question is where should phyloxml.xsd schema file go? Is > lib/bio/db/phyloxml.xsd fine? (the same place where phyloxml_parser.rb > and phyloxml_elements.rb are). What about not placing it anywhere and just using the one at: http://www.phyloxml.org/1.00/phyloxml.xsd > > Using the validator I understood that in xml elements have to go in > specified order. (like name element of phylogeny should go before the > clade element of phylogeny). (Correct me if I am wrong). If thats the > case, it will allow me to simplify some code. Yes, the order of elements is defined by the xsd. I never understood how the designers of xml/xsd came to the conclusion that this was useful. > > Have a good 4th July weekend! Same to you! Thanks for the continued good work! Christian From rozziite at gmail.com Fri Jul 3 13:40:12 2009 From: rozziite at gmail.com (Diana Jaunzeikare) Date: Fri, 3 Jul 2009 13:40:12 -0400 Subject: [BioRuby] GSOC: BioRuby PhyloXML: Validating XML In-Reply-To: <4A4E40CD.9050700@burnham.org> References: <4057d3bf0907030801u67b0f2fdse43cd947eb8271ca@mail.gmail.com> <4A4E40CD.9050700@burnham.org> Message-ID: <4057d3bf0907031040y2a68996cvc99a88eab4e18d56@mail.gmail.com> On Fri, Jul 3, 2009 at 1:33 PM, Christian M Zmasek wrote: > Hi, Diana: > > > I have a question, if the libxml validator finds something wrong with the >> xml file (and in general), where errors should go? Should exception be >> raised, printed on stdout, or on error output? >> > I strongly recommend that a exception be raised. It is the responsibility > of the parsers clients to deal with the exception. > > > > Another question is where should phyloxml.xsd schema file go? Is >> lib/bio/db/phyloxml.xsd fine? (the same place where phyloxml_parser.rb and >> phyloxml_elements.rb are). 
>>
> What about not placing it anywhere and just using the one at:
> http://www.phyloxml.org/1.00/phyloxml.xsd

I was considering it, but then that means that the parser is dependent on the computer being online and accessing it through the internet. If that's fine, then I can do that.

>> Using the validator I understood that in xml elements have to go in
>> specified order. (like name element of phylogeny should go before the clade
>> element of phylogeny). (Correct me if I am wrong). If thats the case, it
>> will allow me to simplify some code.
>
> Yes, the order of elements is defined by the xsd. I never understood how the
> designers of xml/xsd came to the conclusion that this was useful.

I find it really useful. It simplifies parsing. :)

>> Have a good 4th July weekend!
>
> Same to you! Thanks for the continued good work!
>
> Christian

From tomoakin at kenroku.kanazawa-u.ac.jp Sat Jul 4 09:59:09 2009
From: tomoakin at kenroku.kanazawa-u.ac.jp (Tomoaki NISHIYAMA)
Date: Sat, 4 Jul 2009 22:59:09 +0900
Subject: [BioRuby] SIM4 parser
Message-ID: <5510B566-E723-4AEE-8DEC-63BE1ABD9F19@kenroku.kanazawa-u.ac.jp>

Hi,

I am now trying to parse a lot of SIM4 outputs. First, as I did not like to create a file for each output, I inserted "SIM4\n" as a separator like BLAST, and modified the parser to use DELIMITER. Since the delimiter SIM4 was arbitrarily selected by myself and is not standard, the above modification perhaps will not go to the formal bioruby distribution.

This change worked fine, but I found that the parsing of the alignment often fails. The problem seems to sit in the individual parser. One of the reasons was related to alignments like:

    450     .    :    .    :    .    :    .    :    .    :
    447 CTCCCTCAGCGGCCTCTATTTTCAAGGGCTTCCGCATTACAG
        ||||||||||||||||||||||||||||||||||||||||||<<<...<<
   2846 CTCCCTCAGCGGCCTCTATTTTCAAGGGCTTCCGCATTACAGCTG...TA

    500     .    :    .    :    .    :    .    :    .    :
    489  TCTGGGCAGGAGACGGCATGGAAGGGCGAGCTGGGGATGAAGCAACCAA
        <|||||||||||||||||||||||||||||||||||||||||||||||||
   3081 CTCTGGGCAGGAGACGGCATGGAAGGGCGAGCTGGGGATGAAGCAACCAA

This can be corrected with the following modifications: fix the space after the number to one space (\d+\s* -> \d+\s) and remove only the newline character at the end of line (strip -> chomp)

@@ -343,8 +343,8 @@
 dat.each do |str|
   a = str.split(/\r?\n/)
   a.shift
-  if /^(\s*\d+\s*)(.+)$/ =~ a[0] then
-    range = ($1.length)..($1.length + $2.strip.length - 1)
+  if /^(\s*\d+\s)(.+)$/ =~ a[0] then
+    range = ($1.length)..($1.length + $2.chomp.length - 1)
     a.collect! { |x| x[range] }
     s1 << a.shift
     ml << a.shift

so that the space represented at the end and beginning of the line will not be lost.

The other one, yet to be resolved, is related to discontiguous matches that are not considered a proper intron, as in the following example:

180-534 (6091-6445) 99% ==
551-580 (7776-7804) 96%
...
    550
    533 GA
        ||
   6444 GA

      0     .    :    .    :    .    :
    551 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAG
        |||||||||||||||||||||||||||||-
   7776 AAAAAAAAAAAAAAAAAAAAAAAAAAAAA

I don't find a simple way to modify the current code to handle this situation. A way to resolve this may be to check if the start address matches the address that was specified in the previous section stating the ranges of the matches. I'm considering implementing it this way. What do you think?
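[Editorial sketch, not part of the original message.] The effect of the two changes in the patch above can be checked on a single alignment block. Only the two regular expressions and the strip/chomp swap come from the diff; the helper name `column_range` and the shortened sample lines are assumptions made for illustration.

```ruby
# Patterns taken from the diff; everything else here is illustrative.
OLD_PREFIX = /^(\s*\d+\s*)(.+)$/
NEW_PREFIX = /^(\s*\d+\s)(.+)$/

# Returns the column range covered by the sequence part of an alignment
# line, or nil when the line does not start with a position number.
# `trim` is :strip (old behaviour) or :chomp (patched behaviour).
def column_range(line, pattern, trim)
  m = pattern.match(line)
  return nil unless m
  prefix = m[1]
  body = m[2].public_send(trim)
  prefix.length..(prefix.length + body.length - 1)
end

# Shortened stand-ins for a block whose query line starts one column late.
query   = "    489  TCTGG"
midline = "        <|||||"

old_range = column_range(query, OLD_PREFIX, :strip)
new_range = column_range(query, NEW_PREFIX, :chomp)

puts midline[old_range]   # the leading "<" intron marker is cut off
puts midline[new_range]   # the marker is kept
```

The same helper shows the trailing-space case: with `:strip`, a query line padded with spaces yields a range that stops before the intron marks at the end of the midline.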
-- Tomoaki NISHIYAMA Advanced Science Research Center, Kanazawa University, 13-1 Takara-machi, Kanazawa, 920-0934, Japan From hlapp at gmx.net Sat Jul 4 04:48:05 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 4 Jul 2009 10:48:05 +0200 Subject: [BioRuby] [Wg-phyloinformatics] GSOC: BioRuby PhyloXML: Validating XML In-Reply-To: <4057d3bf0907031040y2a68996cvc99a88eab4e18d56@mail.gmail.com> References: <4057d3bf0907030801u67b0f2fdse43cd947eb8271ca@mail.gmail.com> <4A4E40CD.9050700@burnham.org> <4057d3bf0907031040y2a68996cvc99a88eab4e18d56@mail.gmail.com> Message-ID: <34998194-96CF-4EAB-B925-9F3DC1DD8A3F@gmx.net> On Jul 3, 2009, at 7:40 PM, Diana Jaunzeikare wrote: > [...] > Another question is where should phyloxml.xsd schema file go? Is lib/ > bio/db/phyloxml.xsd fine? (the same place where phyloxml_parser.rb > and phyloxml_elements.rb are). > What about not placing it anywhere and just using the one at: http://www.phyloxml.org/1.00/phyloxml.xsd > > I was considering it, but then that means that parser is dependent > on the computer being online and accessing it through internet. If > thats fine, then I can do that. I agree, you wouldn't want that as a requirement. (Also, if you download it from there on-the-fly, you'd incur a further overhead, and need to provide ways to specify the necessary parameters for a proxy if the user is behind a firewall.) Aside from that, it may be worth thinking about the question whether you want to reject the entire file with an exception if a single element (tree, or annotation) fails to validate, as opposed to accepting all records that validate and raise an exception on the one that doesn't. The latter is typically how stream parsers of various formats will behave (except that they'll stop and abort the stream upon encountering a record that is invalid), but it may not apply all that well to XML parsing. Just thought I'd raise the question. 
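[Editorial sketch, not from the thread.] The two policies contrasted above (reject the whole file, or accept records until the first invalid one) can be written down in a few lines of plain Ruby. `InvalidRecordError`, `all_or_nothing`, `each_valid_record`, and the lambda validator are hypothetical names standing in for a real schema check; nothing here uses libxml or BioRuby.

```ruby
# Hypothetical error type raised when a record fails validation.
class InvalidRecordError < StandardError; end

# Whole-file policy: check every record first, reject everything on failure.
def all_or_nothing(records, validator)
  records.each_with_index do |rec, i|
    raise InvalidRecordError, "record #{i} is invalid" unless validator.call(rec)
  end
  records
end

# Stream policy: hand out records while they validate, then raise on the
# first one that does not. Records already yielded stay accepted.
def each_valid_record(records, validator)
  return enum_for(:each_valid_record, records, validator) unless block_given?
  records.each_with_index do |rec, i|
    raise InvalidRecordError, "record #{i} is invalid" unless validator.call(rec)
    yield rec
  end
end

# Stand-in validator: a "tree" counts as valid when it has a name.
has_name = ->(rec) { rec.key?(:name) }
trees = [{ name: "a" }, { name: "b" }, {}, { name: "d" }]

accepted = []
begin
  each_valid_record(trees, has_name) { |t| accepted << t[:name] }
rescue InvalidRecordError => e
  warn "stopped: #{e.message}"
end
p accepted
```

In both variants the error reaches the caller as an exception, so the client decides whether to abort or to keep what was read so far.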
-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From tomoakin at kenroku.kanazawa-u.ac.jp Sun Jul 5 07:28:33 2009
From: tomoakin at kenroku.kanazawa-u.ac.jp (Tomoaki NISHIYAMA)
Date: Sun, 5 Jul 2009 20:28:33 +0900
Subject: [BioRuby] SIM4 parser
In-Reply-To: <5510B566-E723-4AEE-8DEC-63BE1ABD9F19@kenroku.kanazawa-u.ac.jp>
References: <5510B566-E723-4AEE-8DEC-63BE1ABD9F19@kenroku.kanazawa-u.ac.jp>
Message-ID: <0C3F8576-899A-426E-869A-C9DCF8F47868@kenroku.kanazawa-u.ac.jp>

Hi,

> A way to resolve this may be to check if the start address matches the
> address that was specified in the previous section stating the ranges
> of the matches. I'm considering implementing it this way.

Working code is now available, and a diff relative to 1.3.0 is attached. The code was changed to parse the alignment only after the SegmentPairs are prepared.

During this work, I also noticed that the semantics of the structure might be misunderstood:

1. The mark after the match, either "->", "<-", "--", or "==", does not represent the direction of the exon, but indicates the presumed direction of the intron following the exon. "--" is used when part of the intervening sequence and midline is shown, and "==" is for cases without information for the intervening sequence. I do not understand how these patterns are determined by SIM4, but "->" and "<-" can be estimated based on the GU-AG rule.

Since these directions are essentially assigned to the introns rather than exons, it might be inappropriate to assign these strings to the exon. There are actually rare cases in which introns in different directions are deduced: in such a case, assuming that the direction of the exon is the same as that of its 3' intron rather than its 5' intron is not desired. So it seems arguable that directions for exons should be deprecated.

Given the current state of the parser, I bet there are few people using bioruby to parse sim4 alignment output, and changing the interface is acceptable this time.

-------------- next part --------------

--
Tomoaki NISHIYAMA

Advanced Science Research Center,
Kanazawa University,
13-1 Takara-machi,
Kanazawa, 920-0934, Japan

From chen_li3 at yahoo.com Sun Jul 5 14:39:43 2009
From: chen_li3 at yahoo.com (chen li)
Date: Sun, 5 Jul 2009 11:39:43 -0700 (PDT)
Subject: [BioRuby] how to set up a local BLAST and run it
Message-ID: <336766.99689.qm@web36808.mail.mud.yahoo.com>

Hi all,

I want to run a local BLAST against an EST database from NCBI. I can't find the tutorial on BioRuby for it. I wonder if anyone out there knows how to set it up within BioRuby.

Thank you very much,

Li

From kwicher at gmail.com Mon Jul 6 12:57:17 2009
From: kwicher at gmail.com (Krzysztof B. Wicher)
Date: Mon, 6 Jul 2009 17:57:17 +0100
Subject: [BioRuby] how to set up a local BLAST and run it
Message-ID:

Hi,

I am not sure at which point you are, but this is what I have done:
- download the blast executable
- download the database in fasta format
- format the database using formatdb

When you are done, you are ready to run blast locally. What I do next is simply execute the blast program from within a Ruby script and parse the output file. If you need more details, I can send you the example script.

Cheers
K

From john.woods at marcottelab.org Tue Jul 7 18:20:27 2009
From: john.woods at marcottelab.org (John O. Woods)
Date: Tue, 7 Jul 2009 17:20:27 -0500
Subject: [BioRuby] FlyBase/Chado
Message-ID: <91656c3f0907071520t56d13795l56aab2542fe832d3@mail.gmail.com>

A few months back I wrote some Perl scripts to extract some data from FlyBase's Chado DB. Then I discovered Ruby. Fast-forward to now, and I'm working on a Rails app that will index certain kinds of data (gene-phenotype linkages, mostly). I would want it to download the data from FlyBase's postgresql database and stick it in my local MySQL db.

Is there a BioRuby module written for Chado or perhaps even for FlyBase?

If not, where would I start if I wanted to write one? I'm a bit of a ruby-newbie, but I'd like to contribute something if possible.

Am I better off just using my Perl-generated flat-files?

I did look through the rdoc stuff on the website, but couldn't find anything about Chado.
Cheers,
John
--
The University of Texas at Austin

From kwicher at gmail.com Wed Jul 8 14:14:25 2009
From: kwicher at gmail.com (Krzysztof B. Wicher)
Date: Wed, 8 Jul 2009 19:14:25 +0100
Subject: [BioRuby] FlyBase/Chado
Message-ID:

I was looking for something like that as well and I have not found anything. I was going to start writing a module myself to query postgresql using e.g. Sequel ... unfortunately, I never had time to do it.

Sorry for not being more helpful
K

From rozziite at gmail.com Sat Jul 11 22:18:12 2009
From: rozziite at gmail.com (Diana Jaunzeikare)
Date: Sat, 11 Jul 2009 22:18:12 -0400
Subject: [BioRuby] Bio::Tree#children IndexError when called on a root node which does not have children.
Message-ID: <4057d3bf0907111918u44d5e5d1v9b5e0ea6241e8f22@mail.gmail.com>

Hi all,

While I was writing the parser for PhyloXML I discovered the following behavior in the Bio::Tree class.

The Bio::Tree#children method gives an IndexError if it is called on a root node which does not have children.

irb(main):002:0> tree = Bio::Tree.new
=> #, @options={}>
irb(main):004:0> node = Bio::Tree::Node.new
=> (Node:b7c43f80)
irb(main):005:0> node.name = "node1"
=> "node1"
irb(main):006:0> tree.root = node
=> (Node:"node1")
irb(main):007:0> tree.children(tree.root)
IndexError: node1 not found
        from /usr/local/lib/site_ruby/1.8/bio/tree.rb:591:in `path'
        from /usr/local/lib/site_ruby/1.8/bio/tree.rb:640:in `children'
        from (irb):7
irb(main):008:0>

If the children method is called on a node other than the root (which does not have children), then it correctly gives an empty array:

irb(main):008:0> node2 = Bio::Tree::Node.new
=> (Node:b7c3b088)
irb(main):009:0> node2.name = "node2"
=> "node2"
irb(main):010:0> tree.add_node(node2)
=> #{}}, @relations=[], @label={}, @undirected=true, @index={}>, @options={}>
irb(main):011:0> tree.add_edge(tree.root, node2)
=>
irb(main):012:0> tree.children(node2)
=> []
irb(main):013:0>

If the root node has children, then everything is fine:

irb(main):013:0> tree.children(tree.root)
=> [(Node:"node2")]

Diana

From ngoto at gen-info.osaka-u.ac.jp Sun Jul 12 04:46:26 2009
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Sun, 12 Jul 2009 17:46:26 +0900
Subject: [BioRuby] Bio::Tree#children IndexError when called on a root node which does not have children.
In-Reply-To: <4057d3bf0907111918u44d5e5d1v9b5e0ea6241e8f22@mail.gmail.com>
References: <4057d3bf0907111918u44d5e5d1v9b5e0ea6241e8f22@mail.gmail.com>
Message-ID: <20090712084627.00DDE1CBC3BC@idnmail.gen-info.osaka-u.ac.jp>

Hi Diana,

On Sat, 11 Jul 2009 22:18:12 -0400
Diana Jaunzeikare wrote:

> Hi all,
>
> While i was writing parser for Phyloxml I discovered such behavior in
> Bio::Tree class.
>
> Bio::Tree#children method gives IndexError if it is called on a root node
> which does not have children.
>
> irb(main):002:0> tree = Bio::Tree.new
> => # @graph={}, @relations=[], @label={}, @undirected=true, @index={}>,
> @options={}>
> irb(main):004:0> node = Bio::Tree::Node.new
> => (Node:b7c43f80)
> irb(main):005:0> node.name = "node1"
> => "node1"
> irb(main):006:0> tree.root = node
> => (Node:"node1")
> irb(main):007:0> tree.children(tree.root)
> IndexError: node1 not found
> from /usr/local/lib/site_ruby/1.8/bio/tree.rb:591:in `path'
> from /usr/local/lib/site_ruby/1.8/bio/tree.rb:640:in `children'
> from (irb):7
> irb(main):008:0>

The error shows that "tree.root" does not exist in the tree.

Currently, Bio::Tree#root=(node) does not check whether the specified node exists in the tree or not, and it changes only the internal pointer to the root. In addition, it does not modify the tree except the pointer to the root.

In this case, the node should be added to the tree.

Before "tree.root = node" or "tree.children(tree.root)",
  tree.add_node(node)
is needed.

The latter case works because Bio::Tree#add_edge automatically adds nodes if the nodes do not exist in the tree.
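[Editorial sketch, not part of the original message.] The behaviour described above can be mimicked with a toy class. `ToyTree` is a hypothetical stand-in written for this note, not Bio::Tree's implementation; it only copies the three rules stated in the message: `root=` stores a pointer and nothing else, `children` fails when a node is not in the tree, and `add_edge` adds missing nodes.

```ruby
# Toy model of the rules described in the message; NOT BioRuby code.
class ToyTree
  # root= only stores the pointer; it does not add the node to the tree.
  attr_accessor :root

  def initialize
    @adjacent = {} # node => neighbouring nodes (undirected)
  end

  def add_node(node)
    @adjacent[node] ||= []
    self
  end

  # Adds both endpoints first if they are not in the tree yet.
  def add_edge(a, b)
    add_node(a)
    add_node(b)
    @adjacent[a] << b
    @adjacent[b] << a
    self
  end

  # Neighbours of node, minus the neighbour that lies towards the root.
  def children(node)
    [root, node].each do |n|
      raise IndexError, "#{n} not found" unless @adjacent.key?(n)
    end
    parent = parent_of(node)
    @adjacent[node].reject { |n| n == parent }
  end

  private

  # Breadth-first search from the root; returns nil for the root itself.
  def parent_of(node)
    parents = { root => nil }
    queue = [root]
    until queue.empty?
      current = queue.shift
      return parents[current] if current == node
      @adjacent[current].each do |n|
        next if parents.key?(n)
        parents[n] = current
        queue << n
      end
    end
    nil
  end
end

tree = ToyTree.new
tree.root = :node1
begin
  tree.children(tree.root)
rescue IndexError => e
  puts "IndexError: #{e.message}"
end

tree.add_node(:node1)            # the step recommended in the message
p tree.children(tree.root)
tree.add_edge(tree.root, :node2)
p tree.children(tree.root)
```

With the real library, the corresponding fix is the one given in the message: call `tree.add_node(node)` before relying on `tree.root`.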
> If the children method is called on other than root node (which does not
> have children), then it correctly gives empty array:
>
> irb(main):008:0> node2 = Bio::Tree::Node.new
> => (Node:b7c3b088)
> irb(main):009:0> node2.name = "node2"
> => "node2"
> irb(main):010:0> tree.add_node(node2)
> => # @pathway=#{}},
> @relations=[], @label={}, @undirected=true, @index={}>, @options={}>
> irb(main):011:0> tree.add_edge(tree.root, node2)
> =>
> irb(main):012:0> tree.children(node2)
> => []
> irb(main):013:0>
>
> If the root node has children, then everything is fine:
>
> irb(main):013:0> tree.children(tree.root)
> => [(Node:"node2")]
>
> Diana

--
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org

From rozziite at gmail.com Sun Jul 12 18:11:56 2009
From: rozziite at gmail.com (Diana Jaunzeikare)
Date: Sun, 12 Jul 2009 18:11:56 -0400
Subject: [BioRuby] Bio::Tree#children IndexError when called on a root node which does not have children.
In-Reply-To: <20090712084627.00DDE1CBC3BC@idnmail.gen-info.osaka-u.ac.jp>
References: <4057d3bf0907111918u44d5e5d1v9b5e0ea6241e8f22@mail.gmail.com> <20090712084627.00DDE1CBC3BC@idnmail.gen-info.osaka-u.ac.jp>
Message-ID: <4057d3bf0907121511k526a573fqf075bd9ca5ad7f25@mail.gmail.com>

Thanks for explanation, this makes sense now.

Diana

From donttrustben at gmail.com Sun Jul 19 23:52:09 2009
From: donttrustben at gmail.com (Ben Woodcroft)
Date: Mon, 20 Jul 2009 13:52:09 +1000
Subject: [BioRuby] newick gsub(' ','_')
Message-ID:

Hello.

I'm attempting to put spaces in the leaf nodes of a phylogenetic tree, and the bioruby newick writer is replacing them with underscores - not my desired behaviour.

I believe the gsub I'm talking about is on line 58 of
http://github.com/bioruby/bioruby/blob/97b9284109c9a4431b92eab208509e1df6069b4b/lib/bio/db/newick.rb

If a leaf node name has spaces in it, should it then be surrounded with a single quote? Am I not understanding something about the newick format?

Thanks in advance,
ben

--
FYI: My email addresses at unimelb, uq and gmail all redirect to the same place.
From tomoakin at kenroku.kanazawa-u.ac.jp Mon Jul 20 01:19:30 2009 From: tomoakin at kenroku.kanazawa-u.ac.jp (Tomoaki NISHIYAMA) Date: Mon, 20 Jul 2009 14:19:30 +0900 Subject: [BioRuby] newick gsub(' ','_') In-Reply-To: References: Message-ID: <0644F0E9-E667-4202-BE6A-49B057E818B9@kenroku.kanazawa-u.ac.jp> Hi, According to the specification of NEWICK at http://evolution.genetics.washington.edu/phylip/newick_doc.html SPACE in quoted string and underscore are regarded to be identical. In the note it reads "Underscore characters in unquoted labels are converted to blanks. " OTU label in MacClade and PAUP behaves similarly. So, surrounding with single quote or replacing space with underscore are both conforming representation. -- Tomoaki NISHIYAMA Advanced Science Research Center, Kanazawa University, 13-1 Takara-machi, Kanazawa, 920-0934, Japan From ngoto at gen-info.osaka-u.ac.jp Mon Jul 20 03:50:09 2009 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa Goto) Date: Mon, 20 Jul 2009 16:50:09 +0900 Subject: [BioRuby] newick gsub(' ','_') In-Reply-To: <0644F0E9-E667-4202-BE6A-49B057E818B9@kenroku.kanazawa-u.ac.jp> References: <0644F0E9-E667-4202-BE6A-49B057E818B9@kenroku.kanazawa-u.ac.jp> Message-ID: <20090720164535.1830.EEF6E030@gen-info.osaka-u.ac.jp> Hi, > Hi, > > According to the specification of NEWICK at > > http://evolution.genetics.washington.edu/phylip/newick_doc.html > > SPACE in quoted string and underscore are regarded to be > identical. > > In the note it reads > "Underscore characters in unquoted labels are converted to blanks. " > > OTU label in MacClade and PAUP behaves similarly. > So, surrounding with single quote or replacing space with underscore > are both conforming representation. Newick formatter in BioRuby converts spaces in a label if the label can be treated as "unquoted labels" i.e. it consists of only alphabets, numbers and/or spaces. I believe the behavior is right, although I know some software ignore the underscore rule. 
When parsing Newick format, giving :parser => :naive option to Bio::Newick.new() can prevent any label character conversion, but no option for the output, because I think genarating broken format is generally a bad thing. Note that the behavior has been changed in BioRuby 1.2.0. Before 1.1.x, it did not care anything about label characters. > -- > Tomoaki NISHIYAMA > > Advanced Science Research Center, > Kanazawa University, > 13-1 Takara-machi, > Kanazawa, 920-0934, Japan > Thank you. Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org From donttrustben at gmail.com Mon Jul 20 21:19:21 2009 From: donttrustben at gmail.com (Ben Woodcroft) Date: Tue, 21 Jul 2009 11:19:21 +1000 Subject: [BioRuby] newick gsub(' ','_') In-Reply-To: <20090720164535.1830.EEF6E030@gen-info.osaka-u.ac.jp> References: <0644F0E9-E667-4202-BE6A-49B057E818B9@kenroku.kanazawa-u.ac.jp> <20090720164535.1830.EEF6E030@gen-info.osaka-u.ac.jp> Message-ID: Hi. 2009/7/20 Naohisa Goto > > I believe the behavior is right, although I know some software > ignore the underscore rule. When parsing Newick format, giving > :parser => :naive option to Bio::Newick.new() can prevent any > label character conversion, but no option for the output, because > I think genarating broken format is generally a bad thing. OK, thanks - that makes sense. The particular program I'm using, figtree, understands underscores as spaces, and so I never really had any problem in the first place. But to be academic I don't actually agree that the specification says blanks should be converted underscores in otherwise unquoted strings - I think quoting them is equally valid (and possibly supported by more programs), but maybe that's just me. Happy to accept the community judgement on that one. Would it make sense to allow quoting to be forced by the user? I don't see anything in the specification that is against that, so long as everything inside is properly escaped. 
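[Editorial sketch, not part of the original message.] The labelling rule discussed in this thread, with the forced-quoting option asked about above, might look as follows. `newick_label` and `parse_newick_label` are hypothetical helpers written for this note, not BioRuby's formatter or parser; the character class follows the description given earlier in the thread (letters, digits and spaces may stay unquoted), and a single quote inside a quoted label is doubled, as the Newick description prescribes.

```ruby
# Hypothetical helper: format one label for Newick output.
# Labels made only of letters, digits and spaces are written unquoted,
# with spaces turned into underscores; anything else is single-quoted.
def newick_label(label, force_quote: false)
  if !force_quote && label.match?(/\A[A-Za-z0-9 ]+\z/)
    label.tr(" ", "_")
  else
    "'" + label.gsub("'", "''") + "'"
  end
end

# Hypothetical helper: read a label back.
# Quoted labels are taken literally (doubled quotes collapse to one);
# underscores in unquoted labels become blanks.
def parse_newick_label(text)
  if text.length >= 2 && text.start_with?("'") && text.end_with?("'")
    text[1..-2].gsub("''", "'")
  else
    text.tr("_", " ")
  end
end

puts newick_label("Homo sapiens")
puts newick_label("Homo sapiens", force_quote: true)
puts newick_label("E. coli's")
puts parse_newick_label(newick_label("Homo sapiens"))
```

Both output forms read back to the same label in a parser that applies the underscore rule, which is the argument made in the thread for treating them as equivalent.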
Thanks, ben From yannick.wurm at unil.ch Mon Jul 20 12:43:07 2009 From: yannick.wurm at unil.ch (Yannick Wurm) Date: Mon, 20 Jul 2009 18:43:07 +0200 Subject: [BioRuby] paper In-Reply-To: References: Message-ID: <93E492C4-F9C4-419D-941F-82376BF7DBAF@unil.ch> Hello, congrats on to Jan Aerts and Andy Law on getting some (much needed) visibility for ruby in BMC Bioinformatics! http://www.biomedcentral.com/1471-2105/10/221 cheers, yannick From ngoto at gen-info.osaka-u.ac.jp Mon Jul 20 23:25:10 2009 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Tue, 21 Jul 2009 12:25:10 +0900 Subject: [BioRuby] newick gsub(' ','_') In-Reply-To: References: <0644F0E9-E667-4202-BE6A-49B057E818B9@kenroku.kanazawa-u.ac.jp> <20090720164535.1830.EEF6E030@gen-info.osaka-u.ac.jp> Message-ID: <20090721032512.426371CBC441@idnmail.gen-info.osaka-u.ac.jp> Hi, On Tue, 21 Jul 2009 11:19:21 +1000 Ben Woodcroft wrote: > OK, thanks - that makes sense. The particular program I'm using, figtree, > understands underscores as spaces, and so I never really had any problem in > the first place. > > But to be academic I don't actually agree that the specification says blanks > should be converted underscores in otherwise unquoted strings - I think > quoting them is equally valid (and possibly supported by more programs), but > maybe that's just me. Happy to accept the community judgement on that one. > > Would it make sense to allow quoting to be forced by the user? I don't see > anything in the specification that is against that, so long as everything > inside is properly escaped. Why unquoted labels are preferred is that naive programs that cannot understand quotes might have problems with quotes and speces. Providing unquoted labels as many as possible may reduce such problems, though it isn't perfect. Programs that can understand quoted labels should also be aware of the underscore rule in unquoted labels, and theoretically no problem with unquoted labels. 
However, if you know any strange programs that can parse quoted labels but can not understand underscores in unquoted labels, please tell us. Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org From donttrustben at gmail.com Tue Jul 21 02:28:48 2009 From: donttrustben at gmail.com (Ben Woodcroft) Date: Tue, 21 Jul 2009 16:28:48 +1000 Subject: [BioRuby] newick gsub(' ','_') In-Reply-To: <20090721032512.426371CBC441@idnmail.gen-info.osaka-u.ac.jp> References: <0644F0E9-E667-4202-BE6A-49B057E818B9@kenroku.kanazawa-u.ac.jp> <20090720164535.1830.EEF6E030@gen-info.osaka-u.ac.jp> <20090721032512.426371CBC441@idnmail.gen-info.osaka-u.ac.jp> Message-ID: Ok, if I come across something I'll tell you. Thanks, ben 2009/7/21 Naohisa GOTO > Hi, > > On Tue, 21 Jul 2009 11:19:21 +1000 > Ben Woodcroft wrote: > > > OK, thanks - that makes sense. The particular program I'm using, figtree, > > understands underscores as spaces, and so I never really had any problem > in > > the first place. > > > > But to be academic I don't actually agree that the specification says > blanks > > should be converted underscores in otherwise unquoted strings - I think > > quoting them is equally valid (and possibly supported by more programs), > but > > maybe that's just me. Happy to accept the community judgement on that > one. > > > > Would it make sense to allow quoting to be forced by the user? I don't > see > > anything in the specification that is against that, so long as everything > > inside is properly escaped. > > Why unquoted labels are preferred is that naive programs > that cannot understand quotes might have problems with > quotes and speces. Providing unquoted labels as many as > possible may reduce such problems, though it isn't perfect. > > Programs that can understand quoted labels should also be > aware of the underscore rule in unquoted labels, and > theoretically no problem with unquoted labels. 
> However,
> if you know any strange programs that can parse quoted labels
> but can not understand underscores in unquoted labels,
> please tell us.
>
>
> Naohisa Goto
> ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby
>

--
FYI: My email addresses at unimelb, uq and gmail all redirect to the same
place.

From jan.aerts at gmail.com Tue Jul 21 04:08:43 2009
From: jan.aerts at gmail.com (Jan Aerts)
Date: Tue, 21 Jul 2009 09:08:43 +0100
Subject: [BioRuby] paper
In-Reply-To: <93E492C4-F9C4-419D-941F-82376BF7DBAF@unil.ch>
References: <93E492C4-F9C4-419D-941F-82376BF7DBAF@unil.ch>
Message-ID: <4c7507a70907210108q564255a1h20dbefd65adbe40d@mail.gmail.com>

Thanks!

jan.

2009/7/20 Yannick Wurm

> Hello,
>
> congrats to Jan Aerts and Andy Law on getting some (much needed)
> visibility for ruby in BMC Bioinformatics!
> http://www.biomedcentral.com/1471-2105/10/221
> cheers,
> yannick
>
>
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby
>

From czmasek at burnham.org Tue Jul 21 14:34:03 2009
From: czmasek at burnham.org (Christian M Zmasek)
Date: Tue, 21 Jul 2009 11:34:03 -0700
Subject: [BioRuby] newick gsub(' ','_')
In-Reply-To: <20090721032512.426371CBC441@idnmail.gen-info.osaka-u.ac.jp>
References: <0644F0E9-E667-4202-BE6A-49B057E818B9@kenroku.kanazawa-u.ac.jp> <20090720164535.1830.EEF6E030@gen-info.osaka-u.ac.jp> <20090721032512.426371CBC441@idnmail.gen-info.osaka-u.ac.jp>
Message-ID: <4A660A1B.3060108@burnham.org>

> Why unquoted labels are preferred is that naive programs
> that cannot understand quotes might have problems with
> quotes and spaces. Providing unquoted labels as many as
> possible may reduce such problems, though it isn't perfect.

Indeed!
Many programs related to phylogenetic analysis are quite picky when it
comes to names. Having spaces and/or quotes in names is likely to lead
to compatibility problems in some (many?) programs and is therefore
best avoided.

Christian Zmasek
http://monochrome-effect.net/

From pmr at ebi.ac.uk Mon Jul 27 04:55:43 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Mon, 27 Jul 2009 09:55:43 +0100
Subject: [BioRuby] Open-bio cross-project issues
In-Reply-To: <320fb6e00907240632h53600e73s63590a8deb4e8ffe@mail.gmail.com>
References: <320fb6e00907240632h53600e73s63590a8deb4e8ffe@mail.gmail.com>
Message-ID: <4A6D6B8F.9060108@ebi.ac.uk>

Peter C. wrote (to bioperl-l, biopython-l, emboss-dev):

> Hi all,
>
> Peter Rice kindly said he will look into an OBF cross project mailing
> list, but in the meantime this has been cross posted to the Biopython,
> BioPerl, and EMBOSS development lists.

There is a list already for this purpose - open-bio-l

I think we will also need a cross-project wiki space on the OBF site.
Is there something already used by other projects or should we set
something up?

I am cross-posting this to other OBF project lists to encourage
developers interested in combining efforts to address common problems.
This started with FASTQ short read formats, and open-bio-l (a low
volume list) has also seen discussion of common test data sets.

Please sign up to open-bio-l (if you are not there already) and post
suggestions for cross-project issues there.

The list subscription page is:
http://lists.open-bio.org/mailman/listinfo/open-bio-l

Please feel free to forward this to any other projects I may have
missed (I picked the obvious addresses from the lists.open-bio.org
server)

regards,

Peter Rice

From rozziite at gmail.com Thu Jul 30 18:04:59 2009
From: rozziite at gmail.com (Diana Jaunzeikare)
Date: Thu, 30 Jul 2009 18:04:59 -0400
Subject: [BioRuby] how to convert sequence to fasta format with header information?
Message-ID: <4057d3bf0907301504i21d74c8dk6cccf6833476dcd6@mail.gmail.com>

Hi all,

I want to retrieve a sequence from a pdb file and save it in fasta
format where the *header holds the pdb entry id*. This is how I did it:

file = File.new('1OOP.pdb').gets(nil)
structure = Bio::PDB.new(file)
seq = structure.seqres['A']
puts seq.to_fasta("1OOP", 70)

It works and produces the result I want:

#>1OOP
#GPPGEVMGRAIARVADTIGSGPVNSESIPALTAAETGHTSQVVPSDTMQTRHVKNYHSRSESTVENFLCR
#SACVFYTTYENHDSDGDNFAYWVINTRQVAQLRRKLEMFTYARFDLELTFVITSTQEQPTVRGQDAPVLT
#HQIMYVPPGGPVPTKVNSYSWQTSTNPSVFWTEGSAPPRMSVPFIGIGNAYSMFYDGWARFDKQGTYGIS
#TLNNMGTLYMRHVNDGGPGPIVSTVRIYFKPKHVKTWVPRPPRLCQYQKAGNVNFEPTGVTEGRTDITTM
#KTT

However, according to the documentation Bio::Sequence::Common#to_fasta
is a deprecated method and it suggests using Bio::Sequence#output, but
when I modify the code to

puts seq.output(:fasta)

it gives an error that the method is not defined. Also I don't see a
way to define the header.

What should I use in place of the deprecated to_fasta method?

Thanks,

Diana

From rozziite at gmail.com Fri Jul 3 15:01:39 2009
From: rozziite at gmail.com (Diana Jaunzeikare)
Date: Fri, 3 Jul 2009 11:01:39 -0400
Subject: [BioRuby] GSOC: BioRuby PhyloXML: Validating XML
Message-ID: <4057d3bf0907030801u67b0f2fdse43cd947eb8271ca@mail.gmail.com>

Hi all,

I have chosen to validate the input xml file at the initialization step
of the parser using libxml validator against specified schema. It is
quick (1 second on my machine for tree of life xml) and it solves a lot
of problems. In my parser, I don't have to worry anymore about what if
user gives invalid xml file, and don't have to do error checking for
that, thus reducing the parsing overhead.

I have a question, if the libxml validator finds something wrong with
the xml file (and in general), where errors should go? Should exception
be raised, printed on stdout, or on error output?

Another question is where should phyloxml.xsd schema file go? Is
lib/bio/db/phyloxml.xsd fine?
(the same place where phyloxml_parser.rb and phyloxml_elements.rb are). Using the validator I understood that in xml elements have to go in specified order. (like name element of phylogeny should go before the clade element of phylogeny). (Correct me if I am wrong). If thats the case, it will allow me to simplify some code. Have a good 4th July weekend! Diana From czmasek at burnham.org Fri Jul 3 17:33:01 2009 From: czmasek at burnham.org (Christian M Zmasek) Date: Fri, 3 Jul 2009 10:33:01 -0700 Subject: [BioRuby] GSOC: BioRuby PhyloXML: Validating XML In-Reply-To: <4057d3bf0907030801u67b0f2fdse43cd947eb8271ca@mail.gmail.com> References: <4057d3bf0907030801u67b0f2fdse43cd947eb8271ca@mail.gmail.com> Message-ID: <4A4E40CD.9050700@burnham.org> Hi, Diana: > I have a question, if the libxml validator finds something wrong with > the xml file (and in general), where errors should go? Should > exception be raised, printed on stdout, or on error output? I strongly recommend that a exception be raised. It is the responsibility of the parsers clients to deal with the exception. > Another question is where should phyloxml.xsd schema file go? Is > lib/bio/db/phyloxml.xsd fine? (the same place where phyloxml_parser.rb > and phyloxml_elements.rb are). What about not placing it anywhere and just using the one at: http://www.phyloxml.org/1.00/phyloxml.xsd > > Using the validator I understood that in xml elements have to go in > specified order. (like name element of phylogeny should go before the > clade element of phylogeny). (Correct me if I am wrong). If thats the > case, it will allow me to simplify some code. Yes, the order of elements is defined by the xsd. I never understood how the designers of xml/xsd came to the conclusion that this was useful. > > Have a good 4th July weekend! Same to you! Thanks for the continued good work! 
Christian From rozziite at gmail.com Fri Jul 3 17:40:12 2009 From: rozziite at gmail.com (Diana Jaunzeikare) Date: Fri, 3 Jul 2009 13:40:12 -0400 Subject: [BioRuby] GSOC: BioRuby PhyloXML: Validating XML In-Reply-To: <4A4E40CD.9050700@burnham.org> References: <4057d3bf0907030801u67b0f2fdse43cd947eb8271ca@mail.gmail.com> <4A4E40CD.9050700@burnham.org> Message-ID: <4057d3bf0907031040y2a68996cvc99a88eab4e18d56@mail.gmail.com> On Fri, Jul 3, 2009 at 1:33 PM, Christian M Zmasek wrote: > Hi, Diana: > > > I have a question, if the libxml validator finds something wrong with the >> xml file (and in general), where errors should go? Should exception be >> raised, printed on stdout, or on error output? >> > I strongly recommend that a exception be raised. It is the responsibility > of the parsers clients to deal with the exception. > > > > Another question is where should phyloxml.xsd schema file go? Is >> lib/bio/db/phyloxml.xsd fine? (the same place where phyloxml_parser.rb and >> phyloxml_elements.rb are). >> > What about not placing it anywhere and just using the one at: > http://www.phyloxml.org/1.00/phyloxml.xsd I was considering it, but then that means that parser is dependent on the computer being online and accessing it through internet. If thats fine, then I can do that. > Using the validator I understood that in xml elements have to go in > specified order. (like name element of phylogeny should go before the clade > element of phylogeny). (Correct me if I am wrong). If thats the case, it > will allow me to simplify some code. > Yes, the order of elements is defined by the xsd. I never understood how the > designers of xml/xsd came to the conclusion that this was useful. > I find it really useful. It simplifies parsing. :) > > > >> Have a good 4th July weekend! >> > Same to you! Thanks for the continued good work! 
> > Christian > > > From tomoakin at kenroku.kanazawa-u.ac.jp Sat Jul 4 13:59:09 2009 From: tomoakin at kenroku.kanazawa-u.ac.jp (Tomoaki NISHIYAMA) Date: Sat, 4 Jul 2009 22:59:09 +0900 Subject: [BioRuby] SIM4 parser Message-ID: <5510B566-E723-4AEE-8DEC-63BE1ABD9F19@kenroku.kanazawa-u.ac.jp> Hi, I am now trying to parse a lot of SIM4 outputs. First, as I did not like to create a file for each output, I inserted "SIM4\n" as a separator like BLAST, and modified the parser to use DELIMITER. Since the delimiter SIM4 was arbitrarily selected by myself and is not standard the above modification perhaps will not go to the formal bioruby distribution. This change worked fine, but yet I found the parsing of alignment fails often. The problem seems to sit in the individual parser. One of the reason was related to the alignment like: 450 . : . : . : . : . : 447 CTCCCTCAGCGGCCTCTATTTTCAAGGGCTTCCGCATTACAG ||||||||||||||||||||||||||||||||||||||||||<<<...<< 2846 CTCCCTCAGCGGCCTCTATTTTCAAGGGCTTCCGCATTACAGCTG...TA 500 . : . : . : . : . : 489 TCTGGGCAGGAGACGGCATGGAAGGGCGAGCTGGGGATGAAGCAACCAA <||||||||||||||||||||||||||||||||||||||||||||||||| 3081 CTCTGGGCAGGAGACGGCATGGAAGGGCGAGCTGGGGATGAAGCAACCAA This can be corrected with the following modifications: fix the space after the number to one space (\d+\s* -> \d+\s) and remove only the newline character at the end of line (strip -> chomp) @@ -343,8 +343,8 @@ dat.each do |str| a = str.split(/\r?\n/) a.shift - if /^(\s*\d+\s*)(.+)$/ =~ a[0] then - range = ($1.length)..($1.length + $2.strip.length - 1) + if /^(\s*\d+\s)(.+)$/ =~ a[0] then + range = ($1.length)..($1.length + $2.chomp.length - 1) a.collect! { |x| x[range] } s1 << a.shift ml << a.shift so that the space represented at the end and beginning of the line will not be lost. The other one yet to be resolved is related to discontiguous matches that is not considered a proper intron as the following example: 180-534 (6091-6445) 99% == 551-580 (7776-7804) 96% ... 
550 533 GA || 6444 GA 0 . : . : . : 551 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAG |||||||||||||||||||||||||||||- 7776 AAAAAAAAAAAAAAAAAAAAAAAAAAAAA I don't find a simple way to modify current code to handle this situation. A way to resolve may to check if the start address match the address that was specified in the previous section stating the ranges of the matches. I'm considering implementing this way. What do you think? -- Tomoaki NISHIYAMA Advanced Science Research Center, Kanazawa University, 13-1 Takara-machi, Kanazawa, 920-0934, Japan From hlapp at gmx.net Sat Jul 4 08:48:05 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 4 Jul 2009 10:48:05 +0200 Subject: [BioRuby] [Wg-phyloinformatics] GSOC: BioRuby PhyloXML: Validating XML In-Reply-To: <4057d3bf0907031040y2a68996cvc99a88eab4e18d56@mail.gmail.com> References: <4057d3bf0907030801u67b0f2fdse43cd947eb8271ca@mail.gmail.com> <4A4E40CD.9050700@burnham.org> <4057d3bf0907031040y2a68996cvc99a88eab4e18d56@mail.gmail.com> Message-ID: <34998194-96CF-4EAB-B925-9F3DC1DD8A3F@gmx.net> On Jul 3, 2009, at 7:40 PM, Diana Jaunzeikare wrote: > [...] > Another question is where should phyloxml.xsd schema file go? Is lib/ > bio/db/phyloxml.xsd fine? (the same place where phyloxml_parser.rb > and phyloxml_elements.rb are). > What about not placing it anywhere and just using the one at: http://www.phyloxml.org/1.00/phyloxml.xsd > > I was considering it, but then that means that parser is dependent > on the computer being online and accessing it through internet. If > thats fine, then I can do that. I agree, you wouldn't want that as a requirement. (Also, if you download it from there on-the-fly, you'd incur a further overhead, and need to provide ways to specify the necessary parameters for a proxy if the user is behind a firewall.) 
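
A minimal sketch of the two points discussed in this thread: keeping the schema file next to the parser so that no network access is needed, and raising an exception when validation fails so that the caller decides what to do. Every name here (SchemaValidationError, bundled_schema_path, ValidatingParser) is hypothetical and not part of BioRuby, and the validator is passed in as a plain callable so the sketch does not assume any particular libxml API.

```ruby
# Hypothetical sketch; none of these names come from BioRuby.
class SchemaValidationError < StandardError; end

# Resolve a schema file that ships in +dir+ (for example the directory
# of the parser source), so validation does not depend on being online.
def bundled_schema_path(dir, name = 'phyloxml.xsd')
  path = File.expand_path(name, dir)
  raise SchemaValidationError, "schema not found: #{path}" unless File.file?(path)
  path
end

class ValidatingParser
  attr_reader :schema_path

  # +validator+ is any callable taking (xml_path, schema_path) and
  # returning an array of error messages (empty when the file is valid).
  # A real parser would wrap its XML library's schema validation here.
  def initialize(xml_path, schema_dir, validator)
    @schema_path = bundled_schema_path(schema_dir)
    errors = validator.call(xml_path, @schema_path)
    unless errors.empty?
      # Raise instead of printing: the client code handles the failure.
      raise SchemaValidationError, "#{xml_path} is invalid: #{errors.join('; ')}"
    end
    @xml_path = xml_path
  end
end
```

A client would then wrap ValidatingParser.new in begin/rescue SchemaValidationError and choose for itself whether to log, skip the file, or abort.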
Aside from that, it may be worth thinking about the question whether
you want to reject the entire file with an exception if a single
element (tree, or annotation) fails to validate, as opposed to
accepting all records that validate and raising an exception on the one
that doesn't. The latter is typically how stream parsers of various
formats will behave (except that they'll stop and abort the stream upon
encountering a record that is invalid), but it may not apply all that
well to XML parsing. Just thought I'd raise the question.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From tomoakin at kenroku.kanazawa-u.ac.jp Sun Jul 5 11:28:33 2009
From: tomoakin at kenroku.kanazawa-u.ac.jp (Tomoaki NISHIYAMA)
Date: Sun, 5 Jul 2009 20:28:33 +0900
Subject: [BioRuby] SIM4 parser
In-Reply-To: <5510B566-E723-4AEE-8DEC-63BE1ABD9F19@kenroku.kanazawa-u.ac.jp>
References: <5510B566-E723-4AEE-8DEC-63BE1ABD9F19@kenroku.kanazawa-u.ac.jp>
Message-ID: <0C3F8576-899A-426E-869A-C9DCF8F47868@kenroku.kanazawa-u.ac.jp>

Hi,

> A way to resolve may to check if the start address match the
> address that was specified in the previous section stating the
> ranges of the matches.
> I'm considering implementing this way.

A working code is obtained and a diff relative to 1.3.0 is attached.
The code was changed to parse alignment only after the SegmentPairs are
prepared.

During this work, I also noticed that the semantics of the structure
might be misunderstood:

1. The mark after the match, either "->", "<-", "--", or "==" does not
represent the direction of the exon, but indicates the presumed
direction of the intron following the exon. "--" corresponds in case
part of the intervening sequence and midline is shown and "==" is for
cases without information for intervening sequence.
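
As an illustration of the point about the marks, here is a small sketch that parses one segment line of the kind quoted earlier in this thread (for example "180-534 (6091-6445) 99% ==") and stores the mark as a property of the following intron rather than of the exon. The names SEGMENT_LINE, Segment and parse_segment_line are hypothetical and are not the BioRuby Sim4 parser; treating the mark as optional is an assumption made here for segments that are not followed by one.

```ruby
# Hypothetical sketch; not the parser in BioRuby.
# One segment line: "from-to (from-to) identity% mark"
SEGMENT_LINE = /\A\s*(\d+)-(\d+)\s+\((\d+)-(\d+)\)\s+(\d+)%\s*(->|<-|--|==)?/

# following_intron holds "->", "<-", "--", "==" or nil (no mark present).
Segment = Struct.new(:seq1_range, :seq2_range, :percent_identity, :following_intron)

def parse_segment_line(line)
  m = SEGMENT_LINE.match(line)
  return nil unless m
  Segment.new(m[1].to_i..m[2].to_i,   # positions on the first sequence
              m[3].to_i..m[4].to_i,   # positions on the second sequence
              m[5].to_i,              # percent identity
              m[6])                   # mark, kept as intron information
end
```

With this layout a caller can ask each segment about the intron that follows it without assuming anything about the direction of the exon itself.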
I do not understand how these patterns are determined by SIM4, but "->"
and "<-" can be estimated based on the GU-AG rule.

Since these directions are essentially assigned to the introns rather
than exons, it might be inappropriate to assign these strings to the
exon. There are actually rare cases in which introns in different
directions are deduced: in such a case, assuming the direction of the
exon is the same as the 3' intron rather than the 5' intron of the exon
is not desired.

So, it seems arguable to make directions for exon deprecated. From the
current state of the parser, I bet there are few people using bioruby
to parse sim4 alignment output, and changing the interface is
acceptable this time.

-------------- next part --------------
--
Tomoaki NISHIYAMA

Advanced Science Research Center,
Kanazawa University,
13-1 Takara-machi,
Kanazawa, 920-0934, Japan

From chen_li3 at yahoo.com Sun Jul 5 18:39:43 2009
From: chen_li3 at yahoo.com (chen li)
Date: Sun, 5 Jul 2009 11:39:43 -0700 (PDT)
Subject: [BioRuby] how to set up a local BLAST and run it
Message-ID: <336766.99689.qm@web36808.mail.mud.yahoo.com>

Hi all,

I want to run a local BLAST against an EST database from NCBI. I can't
find the tutorial on Bioruby for it. I wonder if anyone out there knows
how to set it up within BioRuby.

Thank you very much,

Li

From kwicher at gmail.com Mon Jul 6 16:57:17 2009
From: kwicher at gmail.com (Krzysztof B. Wicher)
Date: Mon, 6 Jul 2009 17:57:17 +0100
Subject: [BioRuby] how to set up a local BLAST and run it
Message-ID:

Hi,
I am not sure at which point you are but this is what I have done:
- download blast executable
- download database in fasta format
- format database using formatdb
When you are done you are ready to run the blast locally.
What I do next is simply execute the blast program from within a Ruby
script and parse the output file.
If you need more details, I can send you the example script.
Cheers
K

On Mon, Jul 6, 2009 at 5:00 PM, wrote:
> Send BioRuby mailing list submissions to
>        bioruby at lists.open-bio.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>        http://lists.open-bio.org/mailman/listinfo/bioruby
> or, via email, send a message with subject or body 'help' to
>        bioruby-request at lists.open-bio.org
>
> You can reach the person managing the list at
>        bioruby-owner at lists.open-bio.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of BioRuby digest..."
>
>
> Today's Topics:
>
>   1. how to set up a local BLAST and run it (chen li)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sun, 5 Jul 2009 11:39:43 -0700 (PDT)
> From: chen li
> Subject: [BioRuby] how to set up a local BLAST and run it
> To: bioruby at lists.open-bio.org
> Message-ID: <336766.99689.qm at web36808.mail.mud.yahoo.com>
> Content-Type: text/plain; charset=us-ascii
>
>
> Hi all,
>
> I want to run a local BLAST against an EST database from NCBI. I can't
> find the tutorial on Bioruby for it. I wonder if anyone out there knows
> how to set it up within BioRuby.
>
>
> Thank you very much,
>
> Li
>
>
> ------------------------------
>
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby
>
>
> End of BioRuby Digest, Vol 46, Issue 4
> **************************************
>

From john.woods at marcottelab.org Tue Jul 7 22:20:27 2009
From: john.woods at marcottelab.org (John O. Woods)
Date: Tue, 7 Jul 2009 17:20:27 -0500
Subject: [BioRuby] FlyBase/Chado
Message-ID: <91656c3f0907071520t56d13795l56aab2542fe832d3@mail.gmail.com>

A few months back I wrote some Perl scripts to extract some data from
FlyBase's Chado DB. Then I discovered Ruby. Fast-forward to now, and I'm
working on a Rails app that will index certain kinds of data
(gene-phenotype linkages, mostly).
I would want it to download the data from FlyBase's postgresql database
and stick it in my local MySQL db.

Is there a BioRuby module written for Chado or perhaps even for FlyBase?

If not, where would I start if I wanted to write one? I'm a bit of a
ruby-newbie, but I'd like to contribute something if possible.

Am I better off just using my Perl-generated flat-files?

I did look through the rdoc stuff on the website, but couldn't find
anything about Chado.

Cheers,
John
--
The University of Texas at Austin

From kwicher at gmail.com Wed Jul 8 18:14:25 2009
From: kwicher at gmail.com (Krzysztof B. Wicher)
Date: Wed, 8 Jul 2009 19:14:25 +0100
Subject: [BioRuby] FlyBase/Chado
Message-ID:

I was looking for something like that as well and I have not found
anything. I was going to start writing a module myself to query
postgresql using e.g. Sequel ... unfortunately, I never had time to do
it.
Sorry for not being more helpful
K

> Message: 1
> Date: Tue, 7 Jul 2009 17:20:27 -0500
> From: "John O. Woods"
> Subject: [BioRuby] FlyBase/Chado
> To: bioruby at lists.open-bio.org
> Message-ID:
>        <91656c3f0907071520t56d13795l56aab2542fe832d3 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> A few months back I wrote some Perl scripts to extract some data from
> FlyBase's Chado DB. Then I discovered Ruby. Fast-forward to now, and I'm
> working on a Rails app that will index certain kinds of data (gene-phenotype
> linkages, mostly). I would want it to download the data from FlyBase's
> postgresql database and stick it in my local MySQL db.
> Is there a BioRuby module written for Chado or perhaps even for FlyBase?
>
> If not, where would I start if I wanted to write one? I'm a bit of a
> ruby-newbie, but I'd like to contribute something if possible.
>
> Am I better off just using my Perl-generated flat-files?
>
> I did look through the rdoc stuff on the website, but couldn't find anything
> about Chado.
>
> Cheers,
> John
> --
> The University of Texas at Austin
>
>
> ------------------------------
>
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby
>
>
> End of BioRuby Digest, Vol 46, Issue 6
> **************************************
>

From rozziite at gmail.com Sun Jul 12 02:18:12 2009
From: rozziite at gmail.com (Diana Jaunzeikare)
Date: Sat, 11 Jul 2009 22:18:12 -0400
Subject: [BioRuby] Bio::Tree#children IndexError when called on a root node which does not have children.
Message-ID: <4057d3bf0907111918u44d5e5d1v9b5e0ea6241e8f22@mail.gmail.com>

Hi all,

While I was writing the parser for PhyloXML I discovered the following
behavior in the Bio::Tree class.

Bio::Tree#children method gives IndexError if it is called on a root
node which does not have children.

irb(main):002:0> tree = Bio::Tree.new
=> #, @options={}>
irb(main):004:0> node = Bio::Tree::Node.new
=> (Node:b7c43f80)
irb(main):005:0> node.name = "node1"
=> "node1"
irb(main):006:0> tree.root = node
=> (Node:"node1")
irb(main):007:0> tree.children(tree.root)
IndexError: node1 not found
from /usr/local/lib/site_ruby/1.8/bio/tree.rb:591:in `path'
from /usr/local/lib/site_ruby/1.8/bio/tree.rb:640:in `children'
from (irb):7
irb(main):008:0>

If the children method is called on other than root node (which does
not have children), then it correctly gives empty array:

irb(main):008:0> node2 = Bio::Tree::Node.new
=> (Node:b7c3b088)
irb(main):009:0> node2.name = "node2"
=> "node2"
irb(main):010:0> tree.add_node(node2)
=> #{}}, @relations=[], @label={}, @undirected=true, @index={}>, @options={}>
irb(main):011:0> tree.add_edge(tree.root, node2)
=>
irb(main):012:0> tree.children(node2)
=> []
irb(main):013:0>

If the root node has children, then everything is fine:

irb(main):013:0> tree.children(tree.root)
=> [(Node:"node2")]

Diana

From ngoto at gen-info.osaka-u.ac.jp Sun Jul 12 08:46:26 2009
From: ngoto at gen-info.osaka-u.ac.jp
(Naohisa GOTO) Date: Sun, 12 Jul 2009 17:46:26 +0900 Subject: [BioRuby] Bio::Tree#children IndexError when called on a root node which does not have children. In-Reply-To: <4057d3bf0907111918u44d5e5d1v9b5e0ea6241e8f22@mail.gmail.com> References: <4057d3bf0907111918u44d5e5d1v9b5e0ea6241e8f22@mail.gmail.com> Message-ID: <20090712084627.00DDE1CBC3BC@idnmail.gen-info.osaka-u.ac.jp> Hi Diana, On Sat, 11 Jul 2009 22:18:12 -0400 Diana Jaunzeikare wrote: > Hi all, > > While i was writing parser for Phyloxml I discovered such behavior in > Bio::Tree class. > > Bio::Tree#children method gives IndexError if it is called on a root node > which does not have children. > > irb(main):002:0> tree = Bio::Tree.new > => # @graph={}, @relations=[], @label={}, @undirected=true, @index={}>, > @options={}> > irb(main):004:0> node = Bio::Tree::Node.new > => (Node:b7c43f80) > irb(main):005:0> node.name = "node1" > => "node1" > irb(main):006:0> tree.root = node > => (Node:"node1") > irb(main):007:0> tree.children(tree.root) > IndexError: node1 not found > from /usr/local/lib/site_ruby/1.8/bio/tree.rb:591:in `path' > from /usr/local/lib/site_ruby/1.8/bio/tree.rb:640:in `children' > from (irb):7 > irb(main):008:0> The error shows that "tree.root" does not exist in the tree. Currently, Bio::Tree#root=(node) does not check whether the specified node exists in the tree or not, and it changes only the internal pointer to the root. In addition, it does not modify the tree except the pointer to the root. In this case, the node should be added to the tree. Before "tree.root = node" or "tree.children(tree.root)", tree.add_node(node) is needed. Why the latter case works is that Bio::Tree#add_edge automatically adds nodes if the nodes do not exist in the tree. 
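
The explanation above can be illustrated with a toy model. TinyTree below is a hypothetical stand-in written only for this illustration; it is not Bio::Tree and does not reproduce its internals. It only mimics the three behaviours described: root= changes nothing but the root pointer, add_edge adds missing nodes automatically, and children fails for a node that was never added.

```ruby
# Toy model only: TinyTree is hypothetical and is not Bio::Tree.
class TinyTree
  attr_reader :root

  def initialize
    @adjacent = {}   # node => array of neighbouring nodes
    @root = nil
  end

  # Only the root pointer changes; the node is NOT added to the tree.
  def root=(node)
    @root = node
  end

  def add_node(node)
    @adjacent[node] ||= []
    self
  end

  # Adds either end node if it is missing, then links the two.
  def add_edge(a, b)
    add_node(a)
    add_node(b)
    @adjacent[a] << b
    @adjacent[b] << a
    self
  end

  # Children are the neighbours except the one on the path to the root.
  def children(node)
    raise IndexError, "#{node} not found" unless @adjacent.key?(node)
    @adjacent[node] - [parent(node)]
  end

  private

  # Breadth-first search from the root; returns nil for the root itself.
  def parent(node)
    queue = [@root]
    seen = { @root => nil }
    until queue.empty?
      current = queue.shift
      return seen[node] if current == node
      (@adjacent[current] || []).each do |neighbour|
        next if seen.key?(neighbour)
        seen[neighbour] = current
        queue << neighbour
      end
    end
    nil
  end
end

tree = TinyTree.new
tree.root = "node1"          # pointer set, node still unknown to the tree
begin
  tree.children(tree.root)
rescue IndexError => e
  puts e.message             # prints "node1 not found"
end
tree.add_node("node1")       # the step that was missing
p tree.children(tree.root)   # prints []
```

In terms of the irb session quoted in this thread, the corresponding change is the one stated above: call tree.add_node(node) before tree.root = node or tree.children(tree.root).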
> > > If the children method is called on other than root node (which does not > have children), then it correctly gives empty array: > > irb(main):008:0> node2 = Bio::Tree::Node.new > => (Node:b7c3b088) > irb(main):009:0> node2.name = "node2" > => "node2" > irb(main):010:0> tree.add_node(node2) > => # @pathway=#{}}, > @relations=[], @label={}, @undirected=true, @index={}>, @options={}> > irb(main):011:0> tree.add_edge(tree.root, node2) > => > irb(main):012:0> tree.children(node2) > => [] > irb(main):013:0> > > If the root node has children, then everything is fine: > > irb(main):013:0> tree.children(tree.root) > => [(Node:"node2")] > > > Diana > -- Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org From rozziite at gmail.com Sun Jul 12 22:11:56 2009 From: rozziite at gmail.com (Diana Jaunzeikare) Date: Sun, 12 Jul 2009 18:11:56 -0400 Subject: [BioRuby] Bio::Tree#children IndexError when called on a root node which does not have children. In-Reply-To: <20090712084627.00DDE1CBC3BC@idnmail.gen-info.osaka-u.ac.jp> References: <4057d3bf0907111918u44d5e5d1v9b5e0ea6241e8f22@mail.gmail.com> <20090712084627.00DDE1CBC3BC@idnmail.gen-info.osaka-u.ac.jp> Message-ID: <4057d3bf0907121511k526a573fqf075bd9ca5ad7f25@mail.gmail.com> Thanks for explanation, this makes sense now. Diana On Sun, Jul 12, 2009 at 4:46 AM, Naohisa GOTO wrote: > Hi Diana, > > On Sat, 11 Jul 2009 22:18:12 -0400 > Diana Jaunzeikare wrote: > > > Hi all, > > > > While i was writing parser for Phyloxml I discovered such behavior in > > Bio::Tree class. > > > > Bio::Tree#children method gives IndexError if it is called on a root node > > which does not have children. 
> > > > irb(main):002:0> tree = Bio::Tree.new > > => # > @graph={}, @relations=[], @label={}, @undirected=true, @index={}>, > > @options={}> > > irb(main):004:0> node = Bio::Tree::Node.new > > => (Node:b7c43f80) > > irb(main):005:0> node.name = "node1" > > => "node1" > > irb(main):006:0> tree.root = node > > => (Node:"node1") > > irb(main):007:0> tree.children(tree.root) > > IndexError: node1 not found > > from /usr/local/lib/site_ruby/1.8/bio/tree.rb:591:in `path' > > from /usr/local/lib/site_ruby/1.8/bio/tree.rb:640:in `children' > > from (irb):7 > > irb(main):008:0> > > The error shows that "tree.root" does not exist in the tree. > > Currently, Bio::Tree#root=(node) does not check whether > the specified node exists in the tree or not, and it changes > only the internal pointer to the root. In addition, it does > not modify the tree except the pointer to the root. > > In this case, the node should be added to the tree. > > Before "tree.root = node" or "tree.children(tree.root)", > tree.add_node(node) > is needed. > > Why the latter case works is that Bio::Tree#add_edge > automatically adds nodes if the nodes do not exist in the tree. 
> > > > > > > If the children method is called on other than root node (which does not > > have children), then it correctly gives empty array: > > > > irb(main):008:0> node2 = Bio::Tree::Node.new > > => (Node:b7c3b088) > > irb(main):009:0> node2.name = "node2" > > => "node2" > > irb(main):010:0> tree.add_node(node2) > > => # > @pathway=#{}}, > > @relations=[], @label={}, @undirected=true, @index={}>, @options={}> > > irb(main):011:0> tree.add_edge(tree.root, node2) > > => > > irb(main):012:0> tree.children(node2) > > => [] > > irb(main):013:0> > > > > If the root node has children, then everything is fine: > > > > irb(main):013:0> tree.children(tree.root) > > => [(Node:"node2")] > > > > > > Diana > > > > > -- > Naohisa Goto > ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > From donttrustben at gmail.com Mon Jul 20 03:52:09 2009 From: donttrustben at gmail.com (Ben Woodcroft) Date: Mon, 20 Jul 2009 13:52:09 +1000 Subject: [BioRuby] newick gsub(' ','_') Message-ID: Hello. I'm attempting to put spaces in the leaf nodes of a phylogenetic tree, and the bioruby newick writer is replacing them with underscores - not my desired behaviour. I believe the gsub I'm talking about is on line 58 of http://github.com/bioruby/bioruby/blob/97b9284109c9a4431b92eab208509e1df6069b4b/lib/bio/db/newick.rb If a leaf node name has spaces in it, should it then be surrounded with a single quote? Am I not understanding something about the newick format? Thanks in advance, ben -- FYI: My email addresses at unimelb, uq and gmail all redirect to the same place. 
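
For readers following this thread, the two spellings of a label that contains a space can be sketched as below. This is a hypothetical illustration, not the code in lib/bio/db/newick.rb: the helper names and the force_quote option are made up here, and the rule for which labels may stay unquoted (letters, digits and spaces only) is a simplifying assumption.

```ruby
# Hypothetical helpers; this is not BioRuby's Newick formatter.

# A label made only of letters, digits and spaces can stay unquoted
# if every space is written as an underscore. Returns nil otherwise.
def unquoted_label(label)
  return nil unless label =~ /\A[A-Za-z0-9 ]+\z/
  label.gsub(' ', '_')
end

# Any label can be written in single quotes; a single quote inside
# the label is written as two single quotes.
def quoted_label(label)
  "'" + label.gsub("'", "''") + "'"
end

# Prefer the unquoted form and fall back to quoting. force_quote is an
# illustrative option for callers who always want the quoted form.
def newick_label(label, force_quote: false)
  return quoted_label(label) if force_quote
  unquoted_label(label) || quoted_label(label)
end

puts newick_label("Homo sapiens")                     # Homo_sapiens
puts newick_label("Homo sapiens", force_quote: true)  # 'Homo sapiens'
```

Note that a label which already contains an underscore is quoted by this sketch, because an unquoted underscore would be read back as a blank.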
From tomoakin at kenroku.kanazawa-u.ac.jp Mon Jul 20 05:19:30 2009 From: tomoakin at kenroku.kanazawa-u.ac.jp (Tomoaki NISHIYAMA) Date: Mon, 20 Jul 2009 14:19:30 +0900 Subject: [BioRuby] newick gsub(' ','_') In-Reply-To: References: Message-ID: <0644F0E9-E667-4202-BE6A-49B057E818B9@kenroku.kanazawa-u.ac.jp> Hi, According to the specification of NEWICK at http://evolution.genetics.washington.edu/phylip/newick_doc.html SPACE in quoted string and underscore are regarded to be identical. In the note it reads "Underscore characters in unquoted labels are converted to blanks. " OTU label in MacClade and PAUP behaves similarly. So, surrounding with single quote or replacing space with underscore are both conforming representation. -- Tomoaki NISHIYAMA Advanced Science Research Center, Kanazawa University, 13-1 Takara-machi, Kanazawa, 920-0934, Japan From ngoto at gen-info.osaka-u.ac.jp Mon Jul 20 07:50:09 2009 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa Goto) Date: Mon, 20 Jul 2009 16:50:09 +0900 Subject: [BioRuby] newick gsub(' ','_') In-Reply-To: <0644F0E9-E667-4202-BE6A-49B057E818B9@kenroku.kanazawa-u.ac.jp> References: <0644F0E9-E667-4202-BE6A-49B057E818B9@kenroku.kanazawa-u.ac.jp> Message-ID: <20090720164535.1830.EEF6E030@gen-info.osaka-u.ac.jp> Hi, > Hi, > > According to the specification of NEWICK at > > http://evolution.genetics.washington.edu/phylip/newick_doc.html > > SPACE in quoted string and underscore are regarded to be > identical. > > In the note it reads > "Underscore characters in unquoted labels are converted to blanks. " > > OTU label in MacClade and PAUP behaves similarly. > So, surrounding with single quote or replacing space with underscore > are both conforming representation. Newick formatter in BioRuby converts spaces in a label if the label can be treated as "unquoted labels" i.e. it consists of only alphabets, numbers and/or spaces. I believe the behavior is right, although I know some software ignore the underscore rule. 
When parsing Newick format, giving :parser => :naive option to
Bio::Newick.new() can prevent any label character conversion, but
no option for the output, because I think generating broken format
is generally a bad thing.

Note that the behavior has been changed in BioRuby 1.2.0.
Before 1.1.x, it did not care anything about label characters.

> --
> Tomoaki NISHIYAMA
>
> Advanced Science Research Center,
> Kanazawa University,
> 13-1 Takara-machi,
> Kanazawa, 920-0934, Japan
>

Thank you.

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org

From donttrustben at gmail.com Tue Jul 21 01:19:21 2009
From: donttrustben at gmail.com (Ben Woodcroft)
Date: Tue, 21 Jul 2009 11:19:21 +1000
Subject: [BioRuby] newick gsub(' ','_')
In-Reply-To: <20090720164535.1830.EEF6E030@gen-info.osaka-u.ac.jp>
References: <0644F0E9-E667-4202-BE6A-49B057E818B9@kenroku.kanazawa-u.ac.jp> <20090720164535.1830.EEF6E030@gen-info.osaka-u.ac.jp>
Message-ID:

Hi.

2009/7/20 Naohisa Goto

>
> I believe the behavior is right, although I know some software
> ignore the underscore rule. When parsing Newick format, giving
> :parser => :naive option to Bio::Newick.new() can prevent any
> label character conversion, but no option for the output, because
> I think generating broken format is generally a bad thing.

OK, thanks - that makes sense. The particular program I'm using,
figtree, understands underscores as spaces, and so I never really had
any problem in the first place.

But to be academic I don't actually agree that the specification says
blanks should be converted to underscores in otherwise unquoted
strings - I think quoting them is equally valid (and possibly supported
by more programs), but maybe that's just me. Happy to accept the
community judgement on that one.

Would it make sense to allow quoting to be forced by the user? I don't
see anything in the specification that is against that, so long as
everything inside is properly escaped.
Thanks, ben From yannick.wurm at unil.ch Mon Jul 20 16:43:07 2009 From: yannick.wurm at unil.ch (Yannick Wurm) Date: Mon, 20 Jul 2009 18:43:07 +0200 Subject: [BioRuby] paper In-Reply-To: References: Message-ID: <93E492C4-F9C4-419D-941F-82376BF7DBAF@unil.ch> Hello, congrats to Jan Aerts and Andy Law on getting some (much needed) visibility for Ruby in BMC Bioinformatics! http://www.biomedcentral.com/1471-2105/10/221 cheers, yannick From ngoto at gen-info.osaka-u.ac.jp Tue Jul 21 03:25:10 2009 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Tue, 21 Jul 2009 12:25:10 +0900 Subject: [BioRuby] newick gsub(' ','_') In-Reply-To: References: <0644F0E9-E667-4202-BE6A-49B057E818B9@kenroku.kanazawa-u.ac.jp> <20090720164535.1830.EEF6E030@gen-info.osaka-u.ac.jp> Message-ID: <20090721032512.426371CBC441@idnmail.gen-info.osaka-u.ac.jp> Hi, On Tue, 21 Jul 2009 11:19:21 +1000 Ben Woodcroft wrote: > OK, thanks - that makes sense. The particular program I'm using, FigTree, > understands underscores as spaces, and so I never really had any problem in > the first place. > > But to be academic, I don't actually agree that the specification says blanks > should be converted to underscores in otherwise unquoted strings - I think > quoting them is equally valid (and possibly supported by more programs), but > maybe that's just me. Happy to accept the community judgement on that one. > > Would it make sense to allow quoting to be forced by the user? I don't see > anything in the specification that is against that, so long as everything > inside is properly escaped. Unquoted labels are preferred because naive programs that cannot understand quotes might have problems with quotes and spaces. Providing unquoted labels wherever possible may reduce such problems, though it isn't perfect. Programs that can understand quoted labels should also be aware of the underscore rule in unquoted labels, and should theoretically have no problem with unquoted labels.
However, if you know any strange programs that can parse quoted labels but cannot understand underscores in unquoted labels, please tell us. Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org From donttrustben at gmail.com Tue Jul 21 06:28:48 2009 From: donttrustben at gmail.com (Ben Woodcroft) Date: Tue, 21 Jul 2009 16:28:48 +1000 Subject: [BioRuby] newick gsub(' ','_') In-Reply-To: <20090721032512.426371CBC441@idnmail.gen-info.osaka-u.ac.jp> References: <0644F0E9-E667-4202-BE6A-49B057E818B9@kenroku.kanazawa-u.ac.jp> <20090720164535.1830.EEF6E030@gen-info.osaka-u.ac.jp> <20090721032512.426371CBC441@idnmail.gen-info.osaka-u.ac.jp> Message-ID: Ok, if I come across something I'll tell you. Thanks, ben -- FYI: My email addresses at unimelb, uq and gmail all redirect to the same place. From jan.aerts at gmail.com Tue Jul 21 08:08:43 2009 From: jan.aerts at gmail.com (Jan Aerts) Date: Tue, 21 Jul 2009 09:08:43 +0100 Subject: [BioRuby] paper In-Reply-To: <93E492C4-F9C4-419D-941F-82376BF7DBAF@unil.ch> References: <93E492C4-F9C4-419D-941F-82376BF7DBAF@unil.ch> Message-ID: <4c7507a70907210108q564255a1h20dbefd65adbe40d@mail.gmail.com> Thanks! jan. From czmasek at burnham.org Tue Jul 21 18:34:03 2009 From: czmasek at burnham.org (Christian M Zmasek) Date: Tue, 21 Jul 2009 11:34:03 -0700 Subject: [BioRuby] newick gsub(' ','_') In-Reply-To: <20090721032512.426371CBC441@idnmail.gen-info.osaka-u.ac.jp> References: <0644F0E9-E667-4202-BE6A-49B057E818B9@kenroku.kanazawa-u.ac.jp> <20090720164535.1830.EEF6E030@gen-info.osaka-u.ac.jp> <20090721032512.426371CBC441@idnmail.gen-info.osaka-u.ac.jp> Message-ID: <4A660A1B.3060108@burnham.org> > Unquoted labels are preferred because naive programs > that cannot understand quotes might have problems with > quotes and spaces. Providing unquoted labels wherever > possible may reduce such problems, though it isn't perfect. Indeed!
Many programs related to phylogenetic analysis are quite picky when it comes to names. Having spaces and/or quotes in names is likely to lead to compatibility problems in some (many?) programs and is therefore best avoided. Christian Zmasek http://monochrome-effect.net/ From pmr at ebi.ac.uk Mon Jul 27 08:55:43 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 27 Jul 2009 09:55:43 +0100 Subject: [BioRuby] Open-bio cross-project issues In-Reply-To: <320fb6e00907240632h53600e73s63590a8deb4e8ffe@mail.gmail.com> References: <320fb6e00907240632h53600e73s63590a8deb4e8ffe@mail.gmail.com> Message-ID: <4A6D6B8F.9060108@ebi.ac.uk> Peter C. wrote (to bioperl-l, biopython-l, emboss-dev): > Hi all, > > Peter Rice kindly said he will look into an OBF cross project mailing > list, but in the meantime this has been cross posted to the Biopython, > BioPerl, and EMBOSS development lists. There is already a list for this purpose: open-bio-l. I think we will also need a cross-project wiki space on the OBF site. Is there something already used by other projects, or should we set something up? I am cross-posting this to other OBF project lists to encourage developers interested in combining efforts to address common problems. This started with FASTQ short read formats, and open-bio-l (a low volume list) has also seen discussion of common test data sets. Please sign up to open-bio-l (if you are not there already) and post suggestions for cross-project issues there. The list subscription page is: http://lists.open-bio.org/mailman/listinfo/open-bio-l Please feel free to forward this to any other projects I may have missed (I picked the obvious addresses from the lists.open-bio.org server). regards, Peter Rice From rozziite at gmail.com Thu Jul 30 22:04:59 2009 From: rozziite at gmail.com (Diana Jaunzeikare) Date: Thu, 30 Jul 2009 18:04:59 -0400 Subject: [BioRuby] how to convert sequence to fasta format with header information?
Message-ID: <4057d3bf0907301504i21d74c8dk6cccf6833476dcd6@mail.gmail.com>

Hi all,

I want to retrieve a sequence from a PDB file and save it in FASTA format, where the header holds the PDB entry ID. This is how I did it:

  file = File.new('1OOP.pdb').gets(nil)
  structure = Bio::PDB.new(file)
  seq = structure.seqres['A']
  puts seq.to_fasta("1OOP", 70)

It works and produces the result I want:

  #>1OOP
  #GPPGEVMGRAIARVADTIGSGPVNSESIPALTAAETGHTSQVVPSDTMQTRHVKNYHSRSESTVENFLCR
  #SACVFYTTYENHDSDGDNFAYWVINTRQVAQLRRKLEMFTYARFDLELTFVITSTQEQPTVRGQDAPVLT
  #HQIMYVPPGGPVPTKVNSYSWQTSTNPSVFWTEGSAPPRMSVPFIGIGNAYSMFYDGWARFDKQGTYGIS
  #TLNNMGTLYMRHVNDGGPGPIVSTVRIYFKPKHVKTWVPRPPRLCQYQKAGNVNFEPTGVTEGRTDITTM
  #KTT

However, according to the documentation, Bio::Sequence::Common#to_fasta is a deprecated method, and it suggests using Bio::Sequence#output instead. But when I modify the code to

  puts seq.output(:fasta)

it gives an error that the method is not defined. Also, I don't see a way to define the header. What should I use in place of the deprecated to_fasta method?

Thanks,
Diana
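
As a stopgap that does not depend on any particular BioRuby method, the desired output can be produced with plain Ruby string handling. The sketch below is only an illustration of the FASTA layout described in the message (a `>` header line followed by the sequence wrapped at a fixed width); `fasta_record` is a hypothetical helper name, not a BioRuby API, and it assumes the sequence object responds to `to_s`.

```ruby
# Hypothetical helper (not part of BioRuby): build one FASTA record.
# header   - text placed after ">" on the first line, e.g. a PDB entry ID
# sequence - any object whose to_s returns the residue string
# width    - maximum number of residues per line
def fasta_record(header, sequence, width = 70)
  lines = sequence.to_s.scan(/.{1,#{width}}/)  # split into width-sized chunks
  ([">#{header}"] + lines).join("\n") + "\n"
end

# The first 41 residues from the message, wrapped at 20 for brevity.
print fasta_record("1OOP", "GPPGEVMGRAIARVADTIGSGPVNSESIPALTAAETGHTSQ", 20)
```

With the objects from the message, the call would presumably be `puts fasta_record("1OOP", seq, 70)`, which keeps the header under the caller's control regardless of which formatter the library provides.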