From p.j.a.cock at googlemail.com Tue May 3 05:24:08 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 3 May 2011 10:24:08 +0100 Subject: [BioRuby] Interesting BLAST 2.2.25+ XML behaviour In-Reply-To: References: Message-ID: Hello all, I've CC'd the BioPerl, BioRuby, BioJava and Biopython development mailing lists to make sure you're aware of this, but can we continue any discussion on the cross-project open-bio-l mailing list please? I noticed that recent versions of BLAST are not using a single block for each query, which was the historical behaviour and is assumed by the Biopython BLAST XML parser. This may be a bug in BLAST. See the link below for an example. Has anyone else noticed this, and has it been reported to the NCBI yet? Thanks, Peter (Not for the first time, I wish there were a public bug tracker for BLAST, or at least a private bug tracker so we could talk about issues with an NCBI-assigned reference number.) ---------- Forwarded message ---------- From: Peter Cock Date: Wed, Apr 20, 2011 at 6:08 PM Subject: Interesting BLAST 2.2.25+ XML behaviour To: Biopython-Dev Mailing List Hi all, Have a look at this XML file from a FASTA vs FASTA search using blastp from BLAST 2.2.25+ (current release), which is a test file I created for the BLAST+ wrappers in Galaxy: https://bitbucket.org/galaxy/galaxy-central/src/8eaf07a46623/test-data/blastp_four_human_vs_rhodopsin.xml I just put it through the Biopython BLAST XML parser, and was surprised not to get four records back (since, as you might guess from the filename, there were four queries). It appears this version of BLAST+ is incrementing the iteration counter for each match... or something like that. Has anyone else noticed this? I wonder if it is accidental...
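One way a parser could tolerate the behaviour described above is to merge consecutive per-query blocks before building records. The sketch below assumes the XML has already been parsed into one record per iteration block, each carrying a query ID (BLAST XML does include an `Iteration_query-ID` element); the `IterationBlock` struct and `group_by_query` helper are hypothetical illustrations, not part of Biopython's or BioRuby's API.

```ruby
# Hypothetical record for one parsed iteration block; the field names are
# illustrative and do not correspond to any particular parser's API.
IterationBlock = Struct.new(:iter_num, :query_id, :hits)

# Merge runs of consecutive blocks that share a query ID into a single
# [query_id, hits] pair, keeping queries in the order they first appear.
def group_by_query(blocks)
  blocks.chunk_while { |a, b| a.query_id == b.query_id }
        .map { |run| [run.first.query_id, run.flat_map(&:hits)] }
end

# Example: four blocks, but only three distinct queries.
blocks = [
  IterationBlock.new(1, "Query_1", ["hit_a"]),
  IterationBlock.new(2, "Query_1", ["hit_b"]),
  IterationBlock.new(3, "Query_2", []),
  IterationBlock.new(4, "Query_3", ["hit_c"])
]

group_by_query(blocks).each do |query_id, hits|
  puts "#{query_id}: #{hits.size} hit(s)"
end
```

Grouping only consecutive blocks (rather than using `group_by` over the whole file) keeps the approach usable in a streaming parser, on the assumption that BLAST writes all blocks for one query together.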
Peter From cjfields at illinois.edu Tue May 3 09:31:55 2011 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 3 May 2011 08:31:55 -0500 Subject: [BioRuby] Interesting BLAST 2.2.25+ XML behaviour In-Reply-To: References: Message-ID: <398303E2-1195-4CC2-8B73-09C6C1117892@illinois.edu> Haven't tried this using the latest BLAST+ myself, but it doesn't surprise me too much. Also agree re: some kind of bug tracking with NCBI; I believe they have an internal one, but it would be nice to have a public interface to it. chris _______________________________________________ BioRuby Project - http://www.bioruby.org/ BioRuby mailing list BioRuby at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioruby From pjotr.public14 at thebird.nl Wed May 4 07:51:51 2011 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Wed, 4 May 2011 13:51:51 +0200 Subject: [BioRuby] Interesting BLAST 2.2.25+ XML behaviour In-Reply-To: <398303E2-1195-4CC2-8B73-09C6C1117892@illinois.edu> References: <398303E2-1195-4CC2-8B73-09C6C1117892@illinois.edu> Message-ID: <20110504115151.GA12003@thebird.nl> Something also rather odd. Can you imagine that a basic BLAST lesson would make it into PLoS Biology? PLoS Biology states: the journal features works of exceptional significance, originality, and relevance in all areas of biological science, from molecules to ecosystems, including works at the interface of other disciplines, such as chemistry, medicine, and mathematics. Well, so much for that. Check out a copy of the BLAST help page, as published in PLoS Biology: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3032543/ I checked the date. It is not April 1st. It ticks, however, the relevance box. But I am not sure about its significance. It is certainly not original. Where are we heading? Can we start submitting man pages now? Pj. From rutgeraldo at gmail.com Tue May 10 09:43:22 2011 From: rutgeraldo at gmail.com (Rutger Vos) Date: Tue, 10 May 2011 14:43:22 +0100 Subject: [BioRuby] Announcement: Registration open for Computational Phyloinformatics course Message-ID: COMPUTATIONAL PHYLOINFORMATICS August 1 2011 through August 11 2011 Bioinformatics Center of Kyoto University Application Deadline: May 31, 2011 http://academy.nescent.org/wiki/Computational_phyloinformatics Computational Phyloinformatics is an 11-day international course (August 1-11, 2011) co-organized by the Computational Biology Research Center (CBRC/AIST), the Bioinformatics Center of Kyoto University, the Database Center for Life Science (DBCLS/JST), and the National Evolutionary Synthesis Center (NESCent). This course, which will take place at Kyoto University directly following the SMBE Meeting (http://smbe2011.com/), aims to give participants practical knowledge and hands-on skills in phyloinformatics. The venue in Kyoto is completely unaffected by the unfortunate events in Fukushima and the power shortages in Tokyo. We encourage biologists from other countries to participate in the SMBE meeting and/or this special international course, in solidarity with the scientific community of Japan in their effort to return to normalcy and to help minimize any negative impacts that the earthquake may have on scientific activities in Japan. SYNOPSIS Biologists are faced with ever-larger datasets, more complex evolutionary models, and increasingly elaborate analytical methods.
Seldom is it sufficient to run a dataset with an off-the-shelf program on a desktop PC; increasingly, biologists need to write scripts to interface with internet services and databases, build analytical pipelines, customize analyses, and distribute computation over multiple processors. This course is designed for graduate students, postdocs, faculty, and researchers in phylogenetics interested in receiving practical, hands-on training in the use of Perl and SQL for workflows and applications in phyloinformatics. The course is divided into four parts: PART I: A tutorial review of Perl, including object-oriented programming and building packages. PART II: Introduction and practical use of BioPerl and Bio::Phylo (e.g. scripting for large tree inference engines, automating model testing, genomic-scale data mining and acquisition, supertree assembly, rate smoothing and branch calibration, tree traversal, etc.). PART III: Introduction and practical use of BioRuby for molecular evolution and functional genomics (e.g. scripting multiple sequence alignment, gene duplication inference, tree inference, etc.). PART IV: Introduction to SQL and database design; computing and querying nested sets and transitive closure; querying both large trees (e.g. NCBI) and large collections of trees (e.g. TreeBASE). Participants will learn how to write basic phylogenetic or comparative analysis scripts, parse NEXUS files, traverse and compute over trees, and make practical use of phylogenetic software libraries. These skills will be learned in a biological context, touching on a diverse array of topics such as analysis of large datasets, automation of supertree assembly, querying for topological patterns in large collections of trees, etc.
Participants will leave the course with a full set of installations and libraries on their computer ready to build phyloinformatic workflows for their own research projects, as well as continued access to a 50+ page wiki "textbook" containing step-by-step instructions, problem sets, and examples. INSTRUCTORS AND COURSE ORGANIZERS Christian Zmasek, Karen Cranston, Rutger A. Vos, Susumu Goto, Toshiaki Katayama, William H. Piel APPLICATION DEADLINE May 31, 2011 TUITION ¥40,000 (~$500) Participants are responsible for their own travel costs, including transportation and accommodation -- see the website for more information. International participants will benefit by combining attendance with the 2011 SMBE meeting. A limited number of travel scholarships from NESCent are available for US-based students. Preference will be given to students from under-represented minorities. SUBSIDIES AND SCHOLARSHIPS A limited number of travel scholarships from NESCent are available for US-based students. Preference will be given to students from under-represented minorities. The Asia-Pacific Bioinformatics Network (APBioNet) is happy to provide travel assistance for a limited number of students/early career researchers from the Asia-Pacific region. Applicants are requested to contact Dr Asif Khan, APBioNet Secretariat: asif -$- bic.nus.edu.sg (replace -$- with @) for details. PREREQUISITES BIOLOGY: A good understanding of phylogenetics: for example, having already taken the Workshop on Molecular Evolution (http://www.molecularevolution.org/) or equivalent coursework or experience. COMPUTING: Prior experience with Perl or careful study of the suggested reading materials in advance of the class (see web site). Participants should have some experience with basic Unix shell commands. EQUIPMENT: Participants are expected to bring their own Mac OS X or Linux computer; otherwise they will be provided with an iMac.
Participants who cannot bring their own computer and will be using a supplied iMac should consider bringing their own portable FireWire/USB drive so that they can also leave the course with a full suite of phyloinformatic software tools. -- Dr. Rutger A. Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading RG6 6BX United Kingdom Tel: +44 (0) 118 378 7535 http://www.nexml.org http://rutgervos.blogspot.com From bonnal at ingm.org Thu May 12 07:42:39 2011 From: bonnal at ingm.org (Raoul Bonnal) Date: Thu, 12 May 2011 13:42:39 +0200 Subject: [BioRuby] Update bio-ngs, bio-gem and GSoC Message-ID: <9738531A-A9E5-4BCD-BC4F-76EAC98BE86D@ingm.org> Dear All, this is an update on our activities: bio-ngs, bio-gem and GSoC. As you already know, BioRuby has one project accepted, bio-objects: http://bioruby.open-bio.org/wiki/Google_Summer_of_Code#Represent_bio-objects_and_related_information_with_images Michal is the student assigned to this project. The other candidate student, Ales, wants to work on his proposal, the BioRuby wrapper: http://bioruby.open-bio.org/wiki/Google_Summer_of_Code#BioRuby_Wrapper_for_Command_line_application To make the best use of our time, I think we can hold a weekly meeting before or after the Thursday BioRuby IRC meeting, writing directly in BioRuby's channel; of course, the students can also write to the mailing list and contact the mentors by IRC, Skype, mailing list, etc. I think this approach is also useful for other BioRuby devs, so they can stay up to date and take part in GSoC: any idea or opinion is better than nothing, so your contributions are appreciated. bio-gem: * I added the possibility to create an embedded database when you create a gem; some code has been borrowed from Rails and adapted to our needs: https://github.com/helios/bioruby-gem/tree/db_tasks I'll merge this branch (db_tasks) into master next week. Docs improved.
* rails_engines is working but not yet in master. bio-ngs & related subprojects (bwa, samtools): * the wrapper is more robust now * samtools can now be used directly from the binding https://github.com/helios/bioruby-samtools or from the wrapper. Why? Because not all the functionality has been bound; sometimes binding is too complicated (see merge), so I wrote the wrapper: https://github.com/helios/bioruby-ngs/blob/master/lib/bio/appl/ngs/samtools.rb ** samtools is no longer shipped with a precompiled library; instead, it downloads and compiles the library for the host OS during gem install ... I think we'll follow this approach whenever possible. ** Ricardo and Dan agreed with us to have a common repository; we are working in that direction... * Francesco did a great job with bwa and with implementing Homology and Ontology classes and tasks to work with homology searches (e.g. BLAST) and Gene Ontology datasets for annotation and functional analysis. All these classes work with a dedicated database to store and manipulate the data. Our abstract has been accepted for a BOSC talk. We are very happy because BioRuby will be @BOSC once again; it's a great chance to meet you and share thoughts with people from other Bio* projects. You can download the abstract here: http://dl.dropbox.com/u/16636340/bioruby-ngs_BOSC-2011-FS-TKTYM-13-04-11-Final.pdf They asked us to talk about biogem and parallel processing as well :-) As above, a great chance to share ideas and experience. We'll be @ISMB as well with 2 posters (actually we'll have a poster @BOSC too). These days I'm working on a bio-gex class to handle gene expression datasets; it uses statsample (Claudio) to handle matrices etc., mostly coming from RNA-seq NGS (priority) and from RT-PCR: https://github.com/helios/bioruby-gex. It's an experiment, but I think it could be useful; it's not an R replacement.
The repo: https://github.com/helios/bioruby-gex TODO * write a lot of tests for bio-ngs * refactor * generalize some tasks: we develop them from our day-to-day work, so inevitably we (mostly me, because Francesco is more precise than me :-) ) write biased code * .... Just a reminder for everyone: if you are going to create a new gem for BioRuby, please try to follow BioRuby's original namespace; this will help core developers integrate the plugin's code into the main repository if needed. Cheers. -- The only change to succeed is starting from a simple thing. From philipp.comans at googlemail.com Fri May 13 11:50:06 2011 From: philipp.comans at googlemail.com (Philipp Comans) Date: Fri, 13 May 2011 17:50:06 +0200 Subject: [BioRuby] Discontiguous Megablast from BioRuby Message-ID: <46DBDF1DDBBB429D9372E3D815A15F72@googlemail.com> Hi everyone, I would like to perform a discontiguous megablast against a local BLAST database using Bio::Blast. As I understand it, BioRuby 1.4.1 uses blastall from the legacy BLAST toolkit, which, unless I am mistaken, does not support dc-megablast. Is it possible to perform a dc-megablast from BioRuby? Is there a way to call blastn from the new BLAST+ toolkit? Any help would be greatly appreciated! Best, Philipp From email2ants at gmail.com Mon May 16 05:07:47 2011 From: email2ants at gmail.com (Anthony Underwood) Date: Mon, 16 May 2011 10:07:47 +0100 Subject: [BioRuby] Bio::Sequence and Bio::Sequence::NA Message-ID: <6226C785-DF3B-45BA-887A-DD80AAEBFACD@gmail.com> Here's something that has always puzzled me about BioRuby. If I start with a Bio::EMBL object and want to extract the features, I can do the following: biosequence = embl_object.to_biosequence This returns an instance of the Bio::Sequence class.
I can now access the features: features = biosequence.features However, if the sequence is nucleotide and I want to translate it, I have to do the following: biosequence.na OR biosequence = biosequence.auto This returns a Bio::Sequence::NA instance and I can now translate: protein = biosequence.translate(1,11) Why can I not now get at the features? biosequence.features #=> undefined method `features' for # I would have thought that after converting to Bio::Sequence::NA or Bio::Sequence::AA, the methods available to Bio::Sequence would still be available. Can anyone tell me what's going on here? Is there another method I should use? Thanks Anthony From ktym at hgc.jp Mon May 16 21:33:41 2011 From: ktym at hgc.jp (Toshiaki Katayama) Date: Tue, 17 May 2011 10:33:41 +0900 Subject: [BioRuby] Bio::Sequence and Bio::Sequence::NA In-Reply-To: <6226C785-DF3B-45BA-887A-DD80AAEBFACD@gmail.com> References: <6226C785-DF3B-45BA-887A-DD80AAEBFACD@gmail.com> Message-ID: Hi Anthony, Bio::Sequence is a generic container class for a sequence with features, which was introduced relatively recently for interconversion of Bio::GenBank, Bio::EMBL and Bio::SQL sequence objects (it also provides common APIs for those sequence objects). The Bio::Sequence#na and #auto methods extract the sequence from a Bio::Sequence object, so you should store the result in another variable (instead of overwriting the object reference). > biosequence = biosequence.auto seq = biosequence.auto so that you can still access biosequence.features.
% bioruby bioruby> embl = Bio::EMBL.new(open("http://togows.dbcls.jp/entry/embl/J00231").read) bioruby> embl.class ==> Bio::EMBL bioruby> embl_bioseq = embl.to_biosequence bioruby> embl_bioseq.class ==> Bio::Sequence bioruby> embl_seq = embl_bioseq.auto bioruby> embl_seq.class ==> Bio::Sequence::NA Toshiaki From email2ants at gmail.com Tue May 17 16:06:37 2011 From: email2ants at gmail.com (Anthony Underwood) Date: Tue, 17 May 2011 21:06:37 +0100 Subject: [BioRuby] Bio::Sequence and Bio::Sequence::NA In-Reply-To: References: <6226C785-DF3B-45BA-887A-DD80AAEBFACD@gmail.com> Message-ID: Dear Toshiaki, Thank you for your reply. I have just tested your code and it all worked OK.
I have found, unexpectedly, that embl_bioseq.translate works even though embl_bioseq.methods does not list translate as an available method. What extra methods does a Bio::Sequence::NA object instantiated using the auto method give? Thanks again for your advice, Anthony From ktym at hgc.jp Tue May 17 16:53:18 2011 From: ktym at hgc.jp (Toshiaki Katayama) Date: Wed, 18 May 2011 05:53:18 +0900 Subject: [BioRuby] Bio::Sequence and Bio::Sequence::NA In-Reply-To: References: <6226C785-DF3B-45BA-887A-DD80AAEBFACD@gmail.com> Message-ID: <0992ED00-7DF5-4BCE-926B-8E615E77DB36@hgc.jp> Hi Anthony, In lib/bio/sequence.rb, you can find the definition of "method_missing". # Pass any unknown method calls to the wrapped sequence object. see # http://www.rubycentral.com/book/ref_c_object.html#Object.method_missing def method_missing(sym, *args, &block) #:nodoc: begin seq.__send__(sym, *args, &block) : # The sequence object, usually Bio::Sequence::NA/AA, # but could be a simple String attr_accessor :seq This means that any method not understood by the Bio::Sequence object is simply redirected to the internal sequence object.
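The forwarding described above can be sketched in plain Ruby. The sketch also shows why `methods` does not list a forwarded method such as `translate`, and why overwriting the container variable with the extracted sequence loses `features`. `ToyNA` and `ToySequence` are hypothetical stand-ins, not BioRuby classes. The sketch forwards only when the wrapped object responds to the method, whereas the quoted BioRuby code uses `begin ... __send__` (its rescue clause is elided in the excerpt). It also adds `respond_to_missing?`, a standard Ruby hook; whether Bio::Sequence defines it has not been checked here.

```ruby
# Hypothetical stand-in for a wrapped nucleotide sequence (not Bio::Sequence::NA).
class ToyNA
  def initialize(str)
    @str = str
  end

  # Reverse complement of a DNA string (A<->T, C<->G).
  def reverse_complement
    ToyNA.new(@str.reverse.tr("ACGTacgt", "TGCAtgca"))
  end

  def to_s
    @str
  end
end

# Hypothetical container holding annotation plus a wrapped sequence; any
# method it does not define itself is forwarded to the wrapped sequence.
class ToySequence
  attr_accessor :seq, :features

  def initialize(seq, features = [])
    @seq = seq
    @features = features
  end

  def method_missing(sym, *args, &block)
    if @seq.respond_to?(sym)
      @seq.__send__(sym, *args, &block)
    else
      super # raises NoMethodError as usual
    end
  end

  # Lets respond_to? report forwarded methods; Object#methods still won't list them.
  def respond_to_missing?(sym, include_private = false)
    @seq.respond_to?(sym, include_private) || super
  end
end

bioseq = ToySequence.new(ToyNA.new("ATGC"), ["CDS 1..4"])
puts bioseq.reverse_complement                      # forwarded to ToyNA: GCAT
puts bioseq.respond_to?(:reverse_complement)        # true
puts bioseq.methods.include?(:reverse_complement)   # false

# Keep the container and the extracted sequence in separate variables,
# as suggested earlier in the thread, so the features stay reachable.
inner = bioseq.seq
puts bioseq.features.inspect                        # ["CDS 1..4"]
puts inner.respond_to?(:features)                   # false
```

Forwarding through `method_missing` keeps the container small, at the cost of reflection: tools that inspect `methods` (as Anthony did) will not see the forwarded API.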
Therefore, if the sequence object is a Bio::Sequence::NA instance, it will respond to any method implemented in the Bio::Sequence::NA class (and its mix-ins). Older versions were much simpler, but the internal code has become more complicated over time as usability and functionality were improved. Hopefully, this will be clearly documented. Cheers, Toshiaki From daijiendoh at gmail.com Wed May 18 21:04:24 2011 From: daijiendoh at gmail.com (Daiji Endoh) Date: Thu, 19 May 2011 10:04:24 +0900 Subject: [BioRuby] Disk cash on the parse genes Message-ID: Dear All, I often download whole GenBank data files from bio at mirror (such as gbbct12.seq) and parse them.
But recently, parsing the whole data has become difficult. At some step, the program takes a long time to select the nucleic acid sequences of genes or transcripts. The slow part seems to be selecting spliced or partial sequences from a long (genome) nucleic acid sequence using the feature data. Does anyone have strategies or methods for avoiding these heavy steps? Daiji Endoh Rakuno Gakuen University From R.A.Vos at reading.ac.uk Thu May 19 07:22:43 2011 From: R.A.Vos at reading.ac.uk (Rutger Vos) Date: Thu, 19 May 2011 12:22:43 +0100 Subject: [BioRuby] 10 days left to register: workshop phylogenetic pipelines, August 1-11 Message-ID: COMPUTATIONAL PHYLOINFORMATICS August 1 2011 through August 11 2011 Bioinformatics Center of Kyoto University Application Deadline: May 31, 2011 http://academy.nescent.org/wiki/Computational_phyloinformatics Computational Phyloinformatics is an 11-day international course (August 1-11, 2011) co-organized by the Computational Biology Research Center (CBRC/AIST), the Bioinformatics Center of Kyoto University, the Database Center for Life Science (DBCLS/JST), and the National Evolutionary Synthesis Center (NESCent). This course, which will take place at Kyoto University directly following the SMBE Meeting (http://smbe2011.com/), aims to give participants practical knowledge and hands-on skills in phyloinformatics. The venue in Kyoto is completely unaffected by the unfortunate events in Fukushima and the power shortages in Tokyo. We encourage biologists from other countries to participate in the SMBE meeting and/or this special international course, in solidarity with the scientific community of Japan in their effort to return to normalcy and to help minimize any negative impacts that the earthquake may have on scientific activities in Japan. SYNOPSIS Biologists are faced with ever-larger datasets, more complex evolutionary models, and increasingly elaborate analytical methods.
Seldom is it sufficient to run a dataset with an off-the-shelf program on a desktop PC; increasingly, biologists need to write scripts to interface with internet services and databases, build analytical pipelines, customize analyses, and distribute computation over multiple processors. This course is designed for graduate students, postdocs, faculty, and researchers in phylogenetics interested in receiving practical, hands-on training in the use of Perl and SQL for workflows and applications in phyloinformatics. The course is divided into four parts: PART I: A tutorial review of Perl, including object-oriented programming and building packages. PART II: Introduction and practical use of BioPerl and Bio::Phylo (e.g. scripting for large tree inference engines, automating model testing, genomic-scale data mining and acquisition, supertree assembly, rate smoothing and branch calibration, tree traversal, etc.). PART III: Introduction and practical use of BioRuby for molecular evolution and functional genomics (e.g. scripting multiple sequence alignment, gene duplication inference, tree inference, etc.). PART IV: Introduction to SQL and database design; computing and querying nested sets and transitive closure; querying both large trees (e.g. NCBI) and large collections of trees (e.g. TreeBASE). Participants will learn how to write basic phylogenetic or comparative analysis scripts, parse NEXUS files, traverse and compute over trees, and make practical use of phylogenetic software libraries. These skills will be learned in a biological context, touching on a diverse array of topics such as analysis of large datasets, automation of supertree assembly, querying for topological patterns in large collections of trees, etc.
Participants will leave the course with a full set of installations and libraries on their computer, ready to build phyloinformatic workflows for their own research projects, as well as continued access to a 50+ page wiki "textbook" containing step-by-step instructions, problem sets, and examples. INSTRUCTORS AND COURSE ORGANIZERS Christian Zmasek, Karen Cranston, Rutger A. Vos, Susumu Goto, Toshiaki Katayama, William H. Piel APPLICATION DEADLINE May 31, 2011 TUITION ¥40,000 (~$500) Participants are responsible for their own travel costs, including transportation and accommodation (see the website for more information). International participants will benefit by combining attendance with the 2011 SMBE meeting. SUBSIDIES AND SCHOLARSHIPS A limited number of travel scholarships from NESCent are available for US-based students. Preference will be given to students from under-represented minorities. The Asia-Pacific Bioinformatics Network (APBioNet) is happy to provide travel assistance for a limited number of students/early career researchers from the Asia-Pacific region. Applicants are requested to contact Dr Asif Khan, APBioNet Secretariat: asif -$- bic.nus.edu.sg (replace -$- with @) for details. PREREQUISITES BIOLOGY: A good understanding of phylogenetics, for example from having already taken the Workshop on Molecular Evolution (http://www.molecularevolution.org/) or equivalent coursework or experience. COMPUTING: Prior experience with Perl or careful study of the suggested reading materials in advance of the class (see website). Participants should have some experience with basic Unix shell commands. EQUIPMENT: Participants are expected to bring their own Mac OS X or Linux computer; otherwise, they will be provided with an iMac.
Participants who cannot bring their own computer and will be using a supplied iMac should consider bringing their own portable FireWire/USB drive so that they can also leave the course with a full suite of phyloinformatic software tools. -- Dr. Rutger A. Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading, RG6 6BX, United Kingdom Tel: +44 (0) 118 378 7535 http://rutgervos.blogspot.com From ktym at hgc.jp Fri May 20 02:31:13 2011 From: ktym at hgc.jp (Toshiaki Katayama) Date: Fri, 20 May 2011 15:31:13 +0900 Subject: [BioRuby] Disk cash on the parse genes In-Reply-To: References: Message-ID: <4DD023E9-9FBE-4A38-AE9C-A19FD4751CDF@hgc.jp> Dear Endoh-san, Thank you for pointing this problem out. I tried to parse the gbbct12.seq file with example code based on our tutorial at http://bioruby.open-bio.org/wiki/Tutorial and found that the actual problem is the repeated calling of the gb.naseq method. The method is defined as shown below; it doesn't cache the generated Bio::Sequence::NA object, so it takes a long time if called multiple times, especially for a long sequence.

bio/db/genbank/genbank.rb:

  def seq
    unless @data['SEQUENCE']
      origin
    end
    Bio::Sequence::NA.new(@data['SEQUENCE'])
  end
  alias naseq seq

When I stored the object outside of the feature-manipulation loop, it became much faster.

  % ruby gbparse.rb gbbct12.seq > gbbct12.out 2> gbbct12.err
  Parsed 16125 entries in 1645.838824 sec.

  % ruby gbparse_new.rb gbbct12.seq > gbbct12.out_new 2> gbbct12.err_new
  Parsed 16125 entries in 39.012607 sec.

Based on this observation, could you check the algorithm of your code? Regards, Toshiaki Katayama -------------- next part -------------- On 2011/05/19, at 10:04, 遠藤大二 (Daiji Endoh) wrote: > Dear All > > I often download whole genbank data from bio at mirror ( such as > gbbct12.seq ) and parse them. > But recently, parsing the whole data became to be difficult.
On some > some step, the program need a long time to select nucleic acid > sequences of genes or transcripts. It seems that selection of spliced > or partial sequences from a long (genome) nucleic acid sequence using > feature data. > > Anyone have strategies or methods avoiding these heavy steps ? > > Daiji Endoh > Rakuno Gakuen University > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From daijiendoh at gmail.com Sat May 21 01:33:58 2011 From: daijiendoh at gmail.com (遠藤大二, Daiji Endoh) Date: Sat, 21 May 2011 14:33:58 +0900 Subject: [BioRuby] Fwd: Disk cash on the parse genes In-Reply-To: <4DD023E9-9FBE-4A38-AE9C-A19FD4751CDF@hgc.jp> References: <4DD023E9-9FBE-4A38-AE9C-A19FD4751CDF@hgc.jp> Message-ID: Dear Katayama-san, I am very grateful for your suggestion. I have been struggling with this problem for 6 months. Using your code, I was able to overcome the problem. But the code stopped at one point: if the feature.position refers to another entry, such as "join(M52614.1:1..1456,5216..5823)", the code returned an error. So I added the line below.

  next if position =~ /[A-Z]+\d+\W*\d*\:/

With this line inserted, the code now works. I attached the modified code. Thanks again, Daiji Endoh ************************************************************************ Dear Endoh-san, Thank you for pointing this problem out. I tried to parse gbbct12.seq file with the example code based on our tutorial at http://bioruby.open-bio.org/wiki/Tutorial and found that the actual problem is in the multiple calling of the gb.naseq method. The method is defined as shown in below and which doesn't cache the generated Bio::Sequence::NA object, therefore, it will take long time if called multiple times, especially for a long sequence.
bio/db/genbank/genbank.rb:

  def seq
    unless @data['SEQUENCE']
      origin
    end
    Bio::Sequence::NA.new(@data['SEQUENCE'])
  end
  alias naseq seq

If I store the object outside of the loop of feature manipulation, it became much faster.

  % ruby gbparse.rb gbbct12.seq > gbbct12.out 2> gbbct12.err
  Parsed 16125 entries in 1645.838824 sec.

  % ruby gbparse_new.rb gbbct12.seq > gbbct12.out_new 2> gbbct12.err_new
  Parsed 16125 entries in 39.012607 sec.

Based on this observation, could you check the algorithm of your code? Regards, Toshiaki Katayama **************************************************************************************** On 2011/05/19, at 10:04, 遠藤大二 (Daiji Endoh) wrote: > Dear All > > I often download whole genbank data from bio at mirror ( such as > gbbct12.seq ) and parse them. > But recently, parsing the whole data became to be difficult. On some > some step, the program need a long time to select nucleic acid > sequences of genes or transcripts. It seems that selection of spliced > or partial sequences from a long (genome) nucleic acid sequence using > feature data. > > Anyone have strategies or methods avoiding these heavy steps ? > > Daiji Endoh > Rakuno Gakuen University > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby -- ?????????????????? ???? ?069-8501????????????582 Tel: 011-388-4847 Fax:011-387-5890 From ktym at hgc.jp Sun May 22 23:28:08 2011 From: ktym at hgc.jp (Toshiaki Katayama) Date: Mon, 23 May 2011 12:28:08 +0900 Subject: [BioRuby] Fwd: Disk cash on the parse genes In-Reply-To: References: <4DD023E9-9FBE-4A38-AE9C-A19FD4751CDF@hgc.jp> Message-ID: Dear Endoh-san, > Using your code, I can overcome the problem. Good! > "join(M52614.1:1..1456,5216..5823), the code returned a error.
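A minimal plain-Ruby sketch of the speed-up Katayama describes in this thread: because naseq builds a new sequence object on every call, fetching it once per entry and slicing that one copy for each feature avoids rebuilding it per feature. EntryStub, extract_slow and extract_fast are hypothetical names invented for this illustration; EntryStub only mimics the non-caching behaviour and counts rebuilds, it is not BioRuby's Bio::GenBank.

```ruby
# Hypothetical stand-in for a parsed GenBank entry. It only counts how many
# times the full sequence is rebuilt; it is NOT BioRuby's Bio::GenBank.
class EntryStub
  attr_reader :builds

  def initialize(raw_sequence)
    @raw = raw_sequence
    @builds = 0
  end

  # Mirrors the non-caching behaviour described in the thread:
  # every call constructs a brand-new sequence from the raw text.
  def naseq
    @builds += 1
    @raw.delete(" \n").downcase
  end
end

# Feature positions as 1-based inclusive [from, to] pairs.
FEATURES = [[1, 4], [5, 8], [9, 12]].freeze

# Slow pattern: naseq is rebuilt once per feature.
def extract_slow(entry)
  FEATURES.map { |from, to| entry.naseq[(from - 1)...to] }
end

# Fast pattern: build the sequence once, outside the feature loop.
def extract_fast(entry)
  naseq = entry.naseq
  FEATURES.map { |from, to| naseq[(from - 1)...to] }
end

slow_entry = EntryStub.new("ATGC ATGC\nGGCC")
fast_entry = EntryStub.new("ATGC ATGC\nGGCC")
slow = extract_slow(slow_entry)
fast = extract_fast(fast_entry)
puts "slow: #{slow.inspect} (#{slow_entry.builds} builds)"
puts "fast: #{fast.inspect} (#{fast_entry.builds} build)"
```

Both versions return the same subsequences; only the number of rebuilds differs (three versus one here). On a genome-sized entry with thousands of features, each rebuild copies the whole sequence, which is the kind of cost behind the timings reported in this thread.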
External references should be detected by the Bio::Location class, and the ID will be stored in the instance variable @xref_id. However, how to deal with it is up to users, so you need to implement some code to fetch the external entry (in this case @xref_id="M52614.1") from available services (local DB, web service, etc.) and extract the subsequence from that entry. Please take a look at pattern (G) in the documentation. http://bioruby.open-bio.org/rdoc/classes/Bio/Locations.html Unfortunately, I got an unexplained "Exception" error from NCBI when retrieving http://www.ncbi.nlm.nih.gov/nuccore?term=M52614.1, so I'll use "join(U75473.1:1..293,1..216)", found in the GenBank entry SMMFD02 (gbbct65.seq), as an example.

  # obtain a genbank record
  bioruby> entry = getobj("genbank:SMMFD02")

  or

  bioruby> entry = Bio::GenBank.new(open("http://togows.dbcls.jp/entry/ncbi-genbank/SMMFD02").read)

  # cache whole sequence as we learnt in this thread :-)
  bioruby> naseq = entry.naseq

  # pick up "gene" features only
  bioruby> genes = entry.features.select {|x| x.feature == "gene" }
  ==> [#]>]

  # example to handle external references in a given position
  bioruby> genes.each do |gene|
      locations = Bio::Locations.new(gene.position)
      locations.each do |location|
        if xref = location.xref_id
          xref_entry = open("http://togows.dbcls.jp/entry/ncbi-genbank/#{xref}").read
          location.sequence = Bio::GenBank.new(xref_entry).naseq.subseq(location.from, location.to)
        end
      end
      gene.position = locations.to_s     # (*1)
      puts naseq.splice(gene.position)   # (*2)
    end

(*1) will generate the following string:

  join(replace(U75473.1:1..293,"gtcttcttgttggtgatgttggttttggaaaaacggaagtagcgatgagagctgcttttaaagcagttaatgatgataaacaagttgctgttttggtgccaacaacagttcttgctcaacagcactataatacttttaaggagcgctttgaaaattttcctgtcaatgttgccatgatgagtcgttttaaaaccaagactgaacagtctgaaacgttaactaaattagctaagggacaggttgatatcattattggaacacatcgtctactttctaaagatgttacgtttaaa"),1..216)

(*2) will return a 293 + 216 = 509 bp sequence:
  gtcttcttgttggtgatgttggttttggaaaaacggaagtagcgatgagagctgcttttaaagcagttaatgatgataaacaagttgctgttttggtgccaacaacagttcttgctcaacagcactataatacttttaaggagcgctttgaaaattttcctgtcaatgttgccatgatgagtcgttttaaaaccaagactgaacagtctgaaacgttaactaaattagctaagggacaggttgatatcattattggaacacatcgtctactttctaaagatgttacgtttaaaggggttaaacacaaggaaacattgaaagaattaaaaactaaggttgatgtcttgaccttgacagcaactcctattccacggacattacatatgtctatgcttggtatacgagatttatcagttattgaaacacctccaagtaatcgttaccctgtccagacttatgttatggaaacaaatgcaagtgtcattcgtgaagctattatgcgtgaaatt

During this trial, I found a bug in the Bio::Sequence#splice method.

bio/sequence/common.rb:

  def splice(position)
    unless position.is_a?(Locations) then
      position = Locations.new(position)
    end
    s = ''
    position.each do |location|
      if location.sequence
        s << location.sequence
      else   # <----- (*3)
        exon = self.subseq(location.from, location.to)
        begin
          exon.complement! if location.strand < 0
        rescue NameError
        end
        s << exon
      end
    end
    return self.class.new(s)
  end
  alias splicing splice

We need to fix this else block (*3) so that it checks whether @xref_id is set. Currently, "join(U75473.1:1..293,1..216)" is treated as "join(1..293,1..216)", which is obviously wrong. Toshiaki On 2011/05/21, at 14:33, 遠藤大二 (Daiji Endoh) wrote: > Dear Katayama-san > > I am very very grateful to your suggestion. I have been struggled on > this problem for 6 months. > Using your code, I can overcome the problem. > > But, only one point the code stopped. > If the feature.position refer to the other entry such as > "join(M52614.1:1..1456,5216..5823), the code returned a error. > So I added a line below. > > next if position =~ /[A-Z]+\d+\W*\d*\:/ > > The inserting code now working. > I attached the modified code. > Thanks again, > > Daiji Endoh > ************************************************************************ > Dear Endoh-san, > > Thank you for pointing this problem out.
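One possible shape for the fix to the else branch marked (*3), sketched in plain Ruby rather than as a patch to bio/sequence/common.rb: each segment of a join(...) location that carries an "ACCESSION.VERSION:" prefix is read from that other entry, and a missing external sequence raises an error instead of silently falling back to the local sequence. The helper names (parse_segments, splice_with_xrefs) and the fetch_xref callable, which stands in for a local DB or a web service, are assumptions for illustration. Only simple ranges, optionally wrapped in complement(...), are handled; nested or order(...) locations are out of scope.

```ruby
# Plain-Ruby sketch of an xref-aware splice (not a patch to BioRuby).
# Supports join(...) of simple ranges, each optionally prefixed with
# "ACCESSION[.VERSION]:" and/or wrapped in complement(...).
COMPLEMENT = { "a" => "t", "t" => "a", "g" => "c", "c" => "g" }.freeze

# One range, e.g. "1..216", "U75473.1:1..293" or "<1..>40".
RANGE_RE = /\A(?:(?<xref>[A-Za-z][A-Za-z0-9_]*(?:\.\d+)?):)?<?(?<from>\d+)\.\.>?(?<to>\d+)\z/

# Reverse complement; unknown characters are kept as they are.
def revcomp(seq)
  seq.reverse.chars.map { |b| COMPLEMENT.fetch(b, b) }.join
end

# Turns a location string into an array of segment hashes.
def parse_segments(location)
  body = location[/\Ajoin\((.*)\)\z/, 1] || location
  body.split(",").map do |part|
    strand = 1
    if (inner = part[/\Acomplement\((.*)\)\z/, 1])
      part = inner
      strand = -1
    end
    m = RANGE_RE.match(part) or raise ArgumentError, "unsupported segment: #{part}"
    { xref: m[:xref], from: m[:from].to_i, to: m[:to].to_i, strand: strand }
  end
end

# local_seq:  this entry's own sequence, fetched once per entry.
# fetch_xref: callable returning another entry's sequence, or nil.
def splice_with_xrefs(local_seq, location, fetch_xref)
  parse_segments(location).map do |seg|
    source = seg[:xref] ? fetch_xref.call(seg[:xref]) : local_seq
    # Fail loudly instead of reading the range from the wrong sequence.
    raise KeyError, "no sequence for #{seg[:xref]}" if source.nil?
    exon = source[(seg[:from] - 1)...seg[:to]]
    seg[:strand] < 0 ? revcomp(exon) : exon
  end.join
end

# Toy data: a made-up external sequence keyed by accession.
xrefs = { "U75473.1" => "ggggccccaaaatttt" }
fetch = ->(id) { xrefs[id] }
puts splice_with_xrefs("aaacccgggt", "join(U75473.1:1..4,1..3)", fetch)
```

With this toy data, join(U75473.1:1..4,1..3) takes four bases from the external entry followed by three from the local one, mirroring the join(U75473.1:1..293,1..216) example. Raising an error, rather than skipping the feature as the `next if position =~ ...` workaround does, keeps such features visible so they can be fetched later.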
> > > > I tried to parse gbbct12.seq file with the example code based on > our tutorial at http://bioruby.open-bio.org/wiki/Tutorial and found > that the actual problem is in the multiple calling of the gb.naseq method. > > The method is defined as shown in below and which doesn't cache > the generated Bio::Sequence::NA object, therefore, it will take > long time if called multiple times, especially for a long sequence. > > bio/db/genbank/genbank.rb: > def seq > unless @data['SEQUENCE'] > origin > end > Bio::Sequence::NA.new(@data['SEQUENCE']) > end > alias naseq seq > > If I store the object outside of the loop of feature manipulation, > it became much faster. > > % ruby gbparse.rb gbbct12.seq > gbbct12.out 2> gbbct12.err > Parsed 16125 entries in 1645.838824 sec. > > % ruby gbparse_new.rb gbbct12.seq > gbbct12.out_new 2> gbbct12.err_new > Parsed 16125 entries in 39.012607 sec. > > Based on this observation, could you check the algorithm of your code? > > Regards, > Toshiaki Katayama > **************************************************************************************** > > > On 2011/05/19, at 10:04, 遠藤大二 (Daiji Endoh) wrote: > >> Dear All >> >> I often download whole genbank data from bio at mirror ( such as >> gbbct12.seq ) and parse them. >> But recently, parsing the whole data became to be difficult. On some >> some step, the program need a long time to select nucleic acid >> sequences of genes or transcripts. It seems that selection of spliced >> or partial sequences from a long (genome) nucleic acid sequence using >> feature data. >> >> Anyone have strategies or methods avoiding these heavy steps ? >> >> Daiji Endoh >> Rakuno Gakuen University >> _______________________________________________ >> BioRuby Project - http://www.bioruby.org/ >> BioRuby mailing list >> BioRuby at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioruby > > > > > > -- > ?????????????????? > ????
> ?069-8501????????????582 > Tel: 011-388-4847 > Fax:011-387-5890 > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From mictadlo at gmail.com Wed May 25 17:10:26 2011 From: mictadlo at gmail.com (Michal) Date: Thu, 26 May 2011 07:10:26 +1000 Subject: [BioRuby] BioRuby with Reia Message-ID: <4DDD7042.9050708@gmail.com> Hello, would it be possible to run BioRuby with Reia ( http://en.wikipedia.org/wiki/Reia_(programming_language) )? From bonnal at ingm.org Thu May 26 04:31:33 2011 From: bonnal at ingm.org (Raoul Bonnal) Date: Thu, 26 May 2011 10:31:33 +0200 Subject: [BioRuby] BioRuby with Reia In-Reply-To: <4DDD7042.9050708@gmail.com> References: <4DDD7042.9050708@gmail.com> Message-ID: <10EFE5AC-BBA0-4113-9878-FD92C9ECF326@ingm.org> On 25/May/2011, at 23.10, Michal wrote: > Hello, > would be possible to run BioRuby with Reia ( http://en.wikipedia.org/wiki/Reia_(programming_language) )? I don't know; what would the advantage be? -- Ra From pjotr.public14 at thebird.nl Thu May 26 06:01:50 2011 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Thu, 26 May 2011 12:01:50 +0200 Subject: [BioRuby] BioRuby with Reia In-Reply-To: <10EFE5AC-BBA0-4113-9878-FD92C9ECF326@ingm.org> References: <4DDD7042.9050708@gmail.com> <10EFE5AC-BBA0-4113-9878-FD92C9ECF326@ingm.org> Message-ID: <20110526100149.GA19867@thebird.nl> Reia is not Ruby-compatible, even if it has some similar syntax. So BioRuby won't run on Reia. Pj. On Thu, May 26, 2011 at 10:31:33AM +0200, Raoul Bonnal wrote: > > On 25/May/2011, at 23.10, Michal wrote: > > > Hello, > > would be possible to run BioRuby with Reia ( http://en.wikipedia.org/wiki/Reia_(programming_language) )? > Don't know, which is the advantage ?
> > -- > Ra > > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby > From mictadlo at gmail.com Sat May 28 19:47:49 2011 From: mictadlo at gmail.com (Michal) Date: Sun, 29 May 2011 09:47:49 +1000 Subject: [BioRuby] samtools-ruby In-Reply-To: <74EAEDFF-D8D0-42E0-93B8-51C4986CEC65@ingm.it> References: <4D454E91.1080604@gmail.com> <4D4A6459.5050205@gmail.com> <5B31A257-DDCB-4BBB-A201-B4D708E82BE0@kenroku.kanazawa-u.ac.jp> <4D4BB255.7030703@gmail.com> <4D4BE242.2030408@gmail.com> <4D4BFB7D.9070004@gmail.com> <74EAEDFF-D8D0-42E0-93B8-51C4986CEC65@ingm.it> Message-ID: <4DE189A5.2010201@gmail.com> Hello, how can I get the following pileup output with bioruby-samtools?

  coverage at base 99 = 1
      base in read EAS56_57:6:190:289:82 = A
  coverage at base 100 = 1
      base in read EAS56_57:6:190:289:82 = G
  coverage at base 101 = 1
      base in read EAS56_57:6:190:289:82 = G
  coverage at base 102 = 2
      base in read EAS56_57:6:190:289:82 = G
      base in read EAS51_64:3:190:727:308 = G

I have only found this Python code:

  import pysam

  samfile = pysam.Samfile("ex1.bam", "rb")
  for pileupcolumn in samfile.pileup('chr1', 100, 120):
      print
      print 'coverage at base %s = %s' % (pileupcolumn.pos, pileupcolumn.n)
      for pileupread in pileupcolumn.pileups:
          print '\tbase in read %s = %s' % (pileupread.alignment.qname, pileupread.alignment.seq[pileupread.qpos])
  samfile.close()

But I do not know how to do it with Ruby. Thank you in advance. On 02/04/2011 11:57 PM, Raoul Bonnal wrote: > In these days, w.e. too, I have no time for sam tools. From the next > week I could spend more time on this project and improve test, > usability and platform supports. > > > > On 04/feb/2011, at 14.13, Michal wrote: > >> Hi, >> I would be happy if would find out how to get on a particular >> position the alignment and then I could give feedback.
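The per-position output Michal asks for can be illustrated in plain Ruby for the simplest case. This is not the bioruby-samtools API: AlignedRead and pileup are hypothetical, reads are assumed to be ungapped (CIGAR all M) with 1-based leftmost positions, and the read names and sequences are made-up toy data. Real BAM input also needs CIGAR, indel and quality handling, which is what a samtools binding provides.

```ruby
# Hypothetical, simplified aligned read: ungapped (CIGAR all M) with a
# 1-based leftmost reference position. Not the bioruby-samtools API.
AlignedRead = Struct.new(:name, :pos, :seq) do
  # Last reference position covered by this read.
  def last_pos
    pos + seq.length - 1
  end

  # Base of this read aligned to the given reference position.
  def base_at(ref_pos)
    seq[ref_pos - pos]
  end
end

# Builds { ref_pos => [[read_name, base], ...] } for positions in window.
def pileup(reads, window)
  columns = Hash.new { |h, k| h[k] = [] }
  reads.each do |read|
    first = [read.pos, window.first].max
    last = [read.last_pos, window.last].min
    (first..last).each { |p| columns[p] << [read.name, read.base_at(p)] }
  end
  columns.sort.to_h
end

# Toy reads (made-up names and sequences).
reads = [
  AlignedRead.new("read_a", 99, "AGGGTTAC"),
  AlignedRead.new("read_b", 102, "GCTA")
]

pileup(reads, 99..102).each do |pos, bases|
  puts "coverage at base #{pos} = #{bases.size}"
  bases.each { |name, base| puts "\tbase in read #{name} = #{base}" }
end
```

The coverage at each position is simply the number of reads whose span contains it, which is the quantity pysam reports as pileupcolumn.n.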
>> >> Pysam http://code.google.com/p/pysam/ contains all files and tests. >> ~/Downloads/pysam-0.3.1/tests$ ls >> 00README.txt ex4.sam ex8.sam Makefile >> ex1.fa ex5.sam example.gtf.gz pysam_test.py >> ex1.sam.gz ex6.sam example.gtf.gz.tbi segfault_tests.py >> ex3.sam ex7.sam example.py tabix_test.py >> >> Maybe it would be possible to test bioruby-samtools in the same way. >> Pysam is ship out with samtools source code and maybe could be used >> it for bioruby-samtools. >> >> Thank you in advance. >> >> Michal >> >> >> On 02/04/2011 10:26 PM, Tomoaki NISHIYAMA wrote: >>> Hi, >>> >>>> What I have forgotten to do? >>> >>> Now, you are at the point I reached yesterday and >>> I don't think you have forgotten anything. >>> >>> From yesterday's mail: >>>> 1) Failure: >>>> test: BioSamtools should probably rename this file and start >>>> testing for real. (TestBioSamtools) [test/test_bio-samtools.rb:5]: >>>> hey buddy, you should probably rename this file and start testing >>>> for real >>>> >>>> Loading seems ok. >>>> I'm not sure if this is bad or ok. >>> >>> You could look at test/test_bio-samtools.rb >>> $ cat test/test_bio-samtools.rb >>> require 'helper' >>> >>> class TestBioSamtools < Test::Unit::TestCase >>> should "probably rename this file and start testing for real" do >>> flunk "hey buddy, you should probably rename this file and start >>> testing for real" >>> end >>> end >>> >>> and guess what it means. >>> >>> My guess is that this is test not implemented yet. >>> So, this error does not tell if the library function well or can not >>> used at all. >>> You might just try what you wanted to do and see if it works. >>> >>>> I understand how difficult it is to keep track and it is a good >>>> idea to ship bioruby-samtools >>>> with a working samtools version like Raoul does it. >>> >>> My view is the opposite. >>> Since it potentially has many bugs and changes rapidly, bundled >>> shipping is ineffective. 
>>> With the lack of test code, we cannot even tell which is a good >>> working version. >>> -- >>> Tomoaki NISHIYAMA >>> >>> Advanced Science Research Center, >>> Kanazawa University, >>> 13-1 Takara-machi, >>> Kanazawa, 920-0934, Japan >>> >> > > -- > R.J.P.B. > > > From pjotr.public14 at thebird.nl Wed May 4 11:51:51 2011 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Wed, 4 May 2011 13:51:51 +0200 Subject: [BioRuby] Interesting BLAST 2.2.25+ XML behaviour In-Reply-To: <398303E2-1195-4CC2-8B73-09C6C1117892@illinois.edu> References: <398303E2-1195-4CC2-8B73-09C6C1117892@illinois.edu> Message-ID: <20110504115151.GA12003@thebird.nl> Something also rather odd. Can you imagine that a basic BLAST lesson would make it into PLoS Biology? PLoS Biology states: the journal features works of exceptional significance, originality, and relevance in all areas of biological science, from molecules to ecosystems, including works at the interface of other disciplines, such as chemistry, medicine, and mathematics. Well, so much for that. Check out a copy of the BLAST help page, as published in PLoS Biology: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3032543/ I checked the date.
It is not April 1st. It ticks, however, the relevance box. But I am not sure about its significance. It is certainly not original. Where are we heading? Can we start submitting man pages now? Pj. On Tue, May 03, 2011 at 08:31:55AM -0500, Chris Fields wrote: > Haven't tried this using the latest BLAST+ myself, but it doesn't surprise me too much. Also agree re: some kind of bug tracking with NCBI; I believe they have an internal one, but it would be nice to have a public interface to it. > > chris > > On May 3, 2011, at 4:24 AM, Peter Cock wrote: > > > Hello all, > > > > I've CC'd the BioPerl, BioRuby, BioJava and Biopython development mailing > > lists to make sure you're aware of this, but can we continue any discussion > > on the cross-project open-bio-l mailing list please? > > > > I noticed that recent versions of BLAST are not using a single > > block for each query, which was the historical behaviour and assumed > > by the Biopython BLAST XML parser. This may be a bug in BLAST. > > See link below for an example. > > > > Has anyone else noticed this, and has it been reported to the NCBI yet? > > > > Thanks, > > > > Peter > > > > (Not for the first time, I wish there was a public bug tracker for BLAST, > > or at least a private bug tracker so we could talk about issues with an > > NCBI assigned reference number.) 
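As a possible workaround on the parser side, consecutive Iteration blocks that share the same Iteration_query-ID can be merged back into one record per query. This is a sketch under an assumption, namely that the affected BLAST+ 2.2.25 output splits one query's results across consecutive Iteration elements carrying the same query ID. It works on already-extracted values rather than on raw XML, and merge_iterations and the hash layout are hypothetical.

```ruby
# Each hash stands in for one already-parsed <Iteration> element:
# the Iteration_query-ID, the Iteration_query-def and its hit IDs.
# merge_iterations and this layout are hypothetical.
def merge_iterations(iterations)
  iterations.each_with_object([]) do |it, records|
    last = records.last
    if last && last[:query_id] == it[:query_id]
      # Same query as the previous block: append its hits instead of
      # starting a new record.
      last[:hits].concat(it[:hits])
    else
      # New query: copy the hit list so the input is not modified.
      records << { query_id: it[:query_id], query_def: it[:query_def], hits: it[:hits].dup }
    end
  end
end

# Toy input: Query_1 split across two consecutive blocks.
iterations = [
  { query_id: "Query_1", query_def: "query one", hits: ["hit_a"] },
  { query_id: "Query_1", query_def: "query one", hits: ["hit_b"] },
  { query_id: "Query_2", query_def: "query two", hits: [] },
  { query_id: "Query_3", query_def: "query three", hits: ["hit_c"] }
]

records = merge_iterations(iterations)
records.each { |r| puts "#{r[:query_id]}: #{r[:hits].size} hit(s)" }
```

Only adjacent blocks are merged, so a query whose blocks are interleaved with other queries would still produce several records; that case would need grouping by ID over the whole file instead.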
> > > > ---------- Forwarded message ---------- > > From: Peter Cock > > Date: Wed, Apr 20, 2011 at 6:08 PM > > Subject: Interesting BLAST 2.2.25+ XML behaviour > > To: Biopython-Dev Mailing List > > > > > > Hi all, > > > > Have a look at this XML file from a FASTA vs FASTA search > > using blastp from BLAST 2.2.25+ (current release), which > > is a test file I created for the BLAST+ wrappers in Galaxy: > > > > https://bitbucket.org/galaxy/galaxy-central/src/8eaf07a46623/test-data/blastp_four_human_vs_rhodopsin.xml > > > > I just put it though the Biopython BLAST XML parser, and > > was surprised not to get four records back (since as you > > might guess from the filename, there were four queries). > > > > It appears this version of BLAST+ is incrementing the > > iteration counter for each match... or something like that. > > > > Has anyone else noticed this? I wonder if it is accidental... > > > > Peter > > > > _______________________________________________ > > BioRuby Project - http://www.bioruby.org/ > > BioRuby mailing list > > BioRuby at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioruby > > > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby > From rutgeraldo at gmail.com Tue May 10 13:43:22 2011 From: rutgeraldo at gmail.com (Rutger Vos) Date: Tue, 10 May 2011 14:43:22 +0100 Subject: [BioRuby] Announcement: Registration open for Computational Phyloinformatics course Message-ID: COMPUTATIONAL PHYLOINFORMATICS August 1 2011 through August 11 2011 Bioinformatics Center of Kyoto University Application Deadline: May 31, 2011 http://academy.nescent.org/wiki/Computational_phyloinformatics Computational Phyloinformatics is an 11-day international course (August 1-11, 2011) co-organized by the Computational Biology Research Center (CBRC/AIST), the Bioinformatics Center of Kyoto University, the 
Database Center for Life Science (DBCLS/JST), and the National Evolutionary Synthesis Center (NESCent). This course, which will take place at Kyoto University directly following the SMBE Meeting (http://smbe2011.com/), aims to give participants practical knowledge and hands-on skills in phyloinformatics. The venue in Kyoto is completely unaffected by the unfortunate events in Fukushima and the power shortages in Tokyo. We encourage biologists from other countries to participate in the SMBE meeting and/or this special international course, in solidarity with the scientific community of Japan in their effort to return to normalcy and to help minimize any negative impacts that the earthquake may have on scientific activities in Japan. SYNOPSIS Biologists are faced with ever-larger datasets, more complex evolutionary models, and increasingly elaborate analytical methods. Seldom is it sufficient to run a dataset with an off-the-shelf program on a desktop PC; increasingly, biologists need to write scripts to interface with internet services and databases, build analytical pipelines, customize analyses, and distribute computation over multiple processors. This course is designed for graduate students, postdocs, faculty, and researchers in phylogenetics interested in receiving practical, hands-on training in the use of Perl and SQL for workflows and applications in phyloinformatics. The course is divided into four parts: PART I: A tutorial review of Perl, including object oriented programming and building packages. PART II: Introduction and practical use of BioPerl and Bio::Phylo, (e.g. scripting for large tree inference engines, automating model testing, genomic-scale data mining and acquisition, supertree assembly, rate smoothing and branch calibration, tree traversal, etc). PART III: Introduction and practical use of BioRuby for molecular evolution and functional genomics (e.g. scripting multiple sequence alignment, gene duplication inference, tree inference, etc.). 
PART IV: Introduction to SQL and database design; computing and querying nested sets and transitive closure; querying both large trees (e.g. NCBI) and large collections of trees (e.g. TreeBASE). Participants will learn how to write basic phylogenetic or comparative analysis scripts, parse NEXUS files, traverse and compute over trees, and make practical use of phylogenetic software libraries. These skills will be learned in a biological context, touching on a diverse array of topics such as analysis of large datasets, automation of supertree assembly, querying for topological patterns in large collections of trees, etc. Participants will leave the course with a full set of installations and libraries on their computer, ready to build phyloinformatic workflows for their own research projects, as well as continued access to a 50+ page wiki "textbook" containing step-by-step instructions, problem sets, and examples. INSTRUCTORS AND COURSE ORGANIZERS Christian Zmasek, Karen Cranston, Rutger A. Vos, Susumu Goto, Toshiaki Katayama, William H. Piel APPLICATION DEADLINE May 31, 2011 TUITION ¥40,000 (~$500) Participants are responsible for their own travel costs, including transportation and accommodation (see the website for more information). International participants will benefit by combining attendance with the 2011 SMBE meeting. SUBSIDIES AND SCHOLARSHIPS A limited number of travel scholarships from NESCent are available for US-based students. Preference will be given to students from under-represented minorities. The Asia-Pacific Bioinformatics Network (APBioNet) is happy to provide travel assistance for a limited number of students/early career researchers from the Asia-Pacific region.
Applicants are requested to contact Dr Asif Khan, APBioNet Secretariat: asif -$- bic.nus.edu.sg (replace -$- with @) for details. PREREQUISITES BIOLOGY: A good understanding of phylogenetics, for example from having already taken the Workshop on Molecular Evolution (http://www.molecularevolution.org/) or equivalent coursework or experience. COMPUTING: Prior experience with Perl or careful study of the suggested reading materials in advance of the class (see website). Participants should have some experience with basic Unix shell commands. EQUIPMENT: Participants are expected to bring their own Mac OS X or Linux computer; otherwise, they will be provided with an iMac. Participants who cannot bring their own computer and will be using a supplied iMac should consider bringing their own portable FireWire/USB drive so that they can also leave the course with a full suite of phyloinformatic software tools. -- Dr. Rutger A. Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading RG6 6BX United Kingdom Tel: +44 (0) 118 378 7535 http://www.nexml.org http://rutgervos.blogspot.com From bonnal at ingm.org Thu May 12 11:42:39 2011 From: bonnal at ingm.org (Raoul Bonnal) Date: Thu, 12 May 2011 13:42:39 +0200 Subject: [BioRuby] Update bio-ngs, bio-gem and GSoC Message-ID: <9738531A-A9E5-4BCD-BC4F-76EAC98BE86D@ingm.org> Dear All, this is an update on our activities: bio-ngs, bio-gem and GSoC. As you already know, BioRuby has one accepted project, bio-objects (http://bioruby.open-bio.org/wiki/Google_Summer_of_Code#Represent_bio-objects_and_related_information_with_images). Michal is the student assigned to this project.
The other candidate student, Ales, wants to work on his proposal, a BioRuby wrapper for command-line applications:
http://bioruby.open-bio.org/wiki/Google_Summer_of_Code#BioRuby_Wrapper_for_Command_line_application

To make the best use of time, I think we can fix a weekly meeting, before or after the Thursday BioRuby IRC meeting, writing directly in the bioruby channel; obviously the students can also write on the mailing list and contact the mentors by IRC, Skype, mailing list, etc. I think this approach is also useful for other BioRuby developers, so they can stay up to date and take part in GSoC: any idea or opinion is better than nothing, so your contribution is appreciated.

bio-gem:
* I added the possibility to create an embedded database when you create a gem; some code has been borrowed from Rails and adapted to our needs:
  https://github.com/helios/bioruby-gem/tree/db_tasks
  I'll merge this branch (db_tasks) into master next week. Docs improved.
* rails_engines is working but not yet in master.

bio-ngs & related subprojects (bwa, samtools):
* The wrapper is more robust now.
* samtools can now be used directly from the binding https://github.com/helios/bioruby-samtools or from the wrapper. Why? Because not all the functionality has been bound; sometimes it's too complicated (see merge). So I wrote the wrapper:
  https://github.com/helios/bioruby-ngs/blob/master/lib/bio/appl/ngs/samtools.rb
** samtools is no longer shipped with a precompiled library; it now downloads and compiles the library for the host OS during gem install. I think that, when possible, we'll follow this approach.
** Ricardo and Dan agreed with us to have a common repository; we're working in that direction.
* Francesco did a great job with bwa and with implementing Homology and Ontology classes and tasks to work with homology searches (i.e. BLAST) and Gene Ontology datasets for annotation and functional analysis. All these classes work with a dedicated database to store and manipulate the data.
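The binding-versus-wrapper trade-off described above (bind what is practical, call the executable for the rest) can be illustrated with a minimal command-line wrapper built on Ruby's standard library. This is a sketch, not the bio-ngs code: the `CommandWrapper` class and its `argv`/`run` methods are hypothetical names, and the commented-out call assumes a `samtools` executable on the PATH.

```ruby
require "open3"

# Hypothetical minimal wrapper around an external program (illustration only;
# this is not the API of bio-ngs or bioruby-samtools).
class CommandWrapper
  attr_reader :program

  def initialize(program)
    @program = program
  end

  # Build the argument vector. Passing separate arguments to Open3 avoids the
  # shell, so file names containing spaces are not split or interpreted.
  def argv(subcommand, *args)
    [program, subcommand.to_s, *args.map(&:to_s)]
  end

  # Run the program and return its standard output; raise if it fails.
  def run(subcommand, *args)
    stdout, stderr, status = Open3.capture3(*argv(subcommand, *args))
    unless status.success?
      raise "#{program} #{subcommand} failed (exit #{status.exitstatus}): #{stderr}"
    end
    stdout
  end
end

samtools = CommandWrapper.new("samtools")
p samtools.argv(:index, "reads sorted.bam")
# prints ["samtools", "index", "reads sorted.bam"]

# Requires samtools on the PATH:
# samtools.run(:index, "reads sorted.bam")
```

Keeping `argv` separate from `run` means the command line can be checked without the external tool installed, which is one reason a thin wrapper can be cheaper to maintain than a full binding for rarely used subcommands.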
Our abstract has been accepted for a BOSC talk. We are very happy because BioRuby will be at BOSC once again; it's a great chance to meet you and share thoughts with people from other Bio* projects. You can download the abstract here:
http://dl.dropbox.com/u/16636340/bioruby-ngs_BOSC-2011-FS-TKTYM-13-04-11-Final.pdf
They asked us to talk about biogem and parallel processing as well :-) As above, a great chance to share ideas and experience. We'll be at ISMB as well with 2 posters (actually we'll have a poster at BOSC too).

These days I'm working on a bio-gex class to handle gene expression datasets, mostly coming from RNA-seq NGS (priority) and from RT-PCR. It uses statsample (Claudio) to handle matrices, etc. It's an experiment, but I think it could be useful; it's not an R replacement. The repo: https://github.com/helios/bioruby-gex

TODO
* write a lot of tests for bio-ngs
* refactor
* generalize some tasks: we are developing them from our day-to-day work, so inevitably we (mostly me, because Francesco is more precise than me :-) ) write biased code
* ...

Just a reminder for everyone: if you are going to create a new gem for BioRuby, please try to follow BioRuby's original namespace. This will help core developers integrate the plugin's code into the main repository if needed.

Cheers.
--
The only chance to succeed is starting from a simple thing.

From philipp.comans at googlemail.com Fri May 13 15:50:06 2011
From: philipp.comans at googlemail.com (Philipp Comans)
Date: Fri, 13 May 2011 17:50:06 +0200
Subject: [BioRuby] Discontiguous Megablast from BioRuby
Message-ID: <46DBDF1DDBBB429D9372E3D815A15F72@googlemail.com>

Hi everyone,

I would like to perform a discontiguous megablast against a local BLAST database using Bio::Blast. As I understand it, BioRuby 1.4.1 uses blastall from the legacy BLAST toolkit. This version does not support dc-megablast, although I could be mistaken.
Is it possible to perform a dc-megablast from BioRuby? Is there a way to call blastn from the new BLAST+ toolkit?

Any help would be greatly appreciated!

Best,
Philipp

From email2ants at gmail.com Mon May 16 09:07:47 2011
From: email2ants at gmail.com (Anthony Underwood)
Date: Mon, 16 May 2011 10:07:47 +0100
Subject: [BioRuby] Bio::Sequence and Bio::Sequence::NA
Message-ID: <6226C785-DF3B-45BA-887A-DD80AAEBFACD@gmail.com>

Here's something that has always puzzled me about BioRuby.

If I start with a Bio::EMBL object and want to extract the features, I can do the following:

  biosequence = embl_object.to_biosequence

This returns an instance of the Bio::Sequence class. I can now access the features:

  features = biosequence.features

However, if the sequence is nucleotide and I want to translate it, I have to do the following:

  biosequence.na

OR

  biosequence = biosequence.auto

This returns a Bio::Sequence::NA instance and I can now translate:

  protein = biosequence.translate(1,11)

Why can I not now get at the features?

  biosequence.features #=> undefined method `features' for #

I would have thought that after converting to Bio::Sequence::NA or Bio::Sequence::AA the methods available to Bio::Sequence should still be available.

Can anyone tell me what's going on here? Is there another method I should use?

Thanks
Anthony

From ktym at hgc.jp Tue May 17 01:33:41 2011
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Tue, 17 May 2011 10:33:41 +0900
Subject: [BioRuby] Bio::Sequence and Bio::Sequence::NA
In-Reply-To: <6226C785-DF3B-45BA-887A-DD80AAEBFACD@gmail.com>
References: <6226C785-DF3B-45BA-887A-DD80AAEBFACD@gmail.com>
Message-ID:

Hi Anthony,

Bio::Sequence is a generic container class for a sequence with features. It was introduced relatively recently for interconversion of Bio::GenBank, Bio::EMBL and Bio::SQL sequence objects (and it also provides common APIs for those sequence objects).
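A toy model of the container relationship described here may help. The classes below are hypothetical stand-ins written in plain Ruby, not BioRuby's real Bio::Sequence or Bio::Sequence::NA; they only assume the forwarding behaviour that a later message in this thread attributes to Bio::Sequence#method_missing. The sketch shows why reassigning the variable to the result of `auto` loses the features, and why a forwarded method can work even though `methods` does not list it.

```ruby
# Hypothetical stand-ins (NOT BioRuby classes) used to model the behaviour
# discussed in this thread.
class ToyNA < String
  # Placeholder for a sequence-specific method such as translate:
  # here it simply counts complete codons.
  def codon_count
    length / 3
  end
end

class ToyContainer
  attr_reader :seq, :features

  def initialize(seq, features)
    @seq = seq
    @features = features
  end

  # Like the `auto` call discussed above: return the wrapped sequence object
  # itself. Whoever receives it holds a ToyNA, not the container.
  def auto
    @seq
  end

  # Forward any method the container does not define to the wrapped sequence.
  def method_missing(name, *args, &block)
    if @seq.respond_to?(name)
      @seq.__send__(name, *args, &block)
    else
      super
    end
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    @seq.respond_to?(name, include_private) || super
  end
end

container = ToyContainer.new(ToyNA.new("atggcctaa"), ["CDS 1..9"])

seq = container.auto                           # store the sequence in a new variable
puts seq.class                                 # prints ToyNA
puts container.features.inspect                # features still reachable via container

puts container.codon_count                     # prints 3 (forwarded via method_missing)
puts container.methods.include?(:codon_count)  # prints false
puts container.respond_to?(:codon_count)       # prints true

begin
  seq.features                                 # the bare sequence has no features
rescue NoMethodError
  puts "NoMethodError, as in the original question"
end
```

Because the container only forwards calls, keeping a reference to it (`container`) and using the returned object (`seq`) separately gives access to both the features and the sequence-specific methods. `Object#methods` never includes names handled by `method_missing`, which is consistent with what Anthony observed.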
Bio::Sequence#na and #auto extract the sequence from a Bio::Sequence object, so you should store it in another variable (instead of overwriting the object reference):

> biosequence = biosequence.auto

  seq = biosequence.auto

so that you can still access biosequence.features.

  % bioruby
  bioruby> embl = Bio::EMBL.new(open("http://togows.dbcls.jp/entry/embl/J00231").read)
  bioruby> embl.class
  ==> Bio::EMBL
  bioruby> embl_bioseq = embl.to_biosequence
  bioruby> embl_bioseq.class
  ==> Bio::Sequence
  bioruby> embl_seq = embl_bioseq.auto
  bioruby> embl_seq.class
  ==> Bio::Sequence::NA

Toshiaki

From email2ants at gmail.com Tue May 17 20:06:37 2011
From: email2ants at gmail.com (Anthony Underwood)
Date: Tue, 17 May 2011 21:06:37 +0100
Subject: [BioRuby] Bio::Sequence and Bio::Sequence::NA
In-Reply-To:
References: <6226C785-DF3B-45BA-887A-DD80AAEBFACD@gmail.com>
Message-ID:

Dear Toshiaki

Thank you for your reply.

I have just tested your code and all worked OK. I found, unexpectedly, that embl_bioseq.translate works even though embl_bioseq.methods does not list translate as an available method.
What extra methods does a Bio::Sequence::NA object instantiated using the auto method give?

Thanks again for your advice,
Anthony

From ktym at hgc.jp Tue May 17 20:53:18 2011
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Wed, 18 May 2011 05:53:18 +0900
Subject: [BioRuby] Bio::Sequence and Bio::Sequence::NA
In-Reply-To:
References: <6226C785-DF3B-45BA-887A-DD80AAEBFACD@gmail.com>
Message-ID: <0992ED00-7DF5-4BCE-926B-8E615E77DB36@hgc.jp>

Hi Anthony,

In lib/bio/sequence.rb, you can find the definition of "method_missing".

  # Pass any unknown method calls to the wrapped sequence object. see
  # http://www.rubycentral.com/book/ref_c_object.html#Object.method_missing
  def method_missing(sym, *args, &block) #:nodoc:
    begin
      seq.__send__(sym, *args, &block)
  :
  # The sequence object, usually Bio::Sequence::NA/AA,
  # but could be a simple String
  attr_accessor :seq

This means any method that is not understood by the Bio::Sequence object is simply redirected to the internal sequence object. Therefore, if the sequence object is a Bio::Sequence::NA instance, the Bio::Sequence object will respond to any method implemented in the Bio::Sequence::NA class (and its mix-ins).

Older versions were much simpler, but the internal code has grown more complicated over time to improve usability and functionality. Hopefully, this will be documented more clearly.

Cheers,
Toshiaki

From daijiendoh at gmail.com Thu May 19 01:04:24 2011
From: daijiendoh at gmail.com (遠藤大二, Daiji Endoh)
Date: Thu, 19 May 2011 10:04:24 +0900
Subject: [BioRuby] Disk cash on the parse genes
Message-ID:

Dear All

I often download whole GenBank data from bio at mirror (such as gbbct12.seq) and parse them.
But recently, parsing the whole data has become difficult. At some step, the program needs a long time to select the nucleic acid sequences of genes or transcripts. The bottleneck seems to be selecting spliced or partial sequences from a long (genome) nucleic acid sequence using the feature data.

Does anyone have strategies or methods for avoiding these heavy steps?

Daiji Endoh
Rakuno Gakuen University

From R.A.Vos at reading.ac.uk Thu May 19 11:22:43 2011
From: R.A.Vos at reading.ac.uk (Rutger Vos)
Date: Thu, 19 May 2011 12:22:43 +0100
Subject: [BioRuby] 10 days left to register: workshop phylogenetic pipelines, August 1-11
Message-ID:

COMPUTATIONAL PHYLOINFORMATICS
August 1 2011 through August 11 2011
Bioinformatics Center of Kyoto University
Application Deadline: May 31, 2011
http://academy.nescent.org/wiki/Computational_phyloinformatics

Computational Phyloinformatics is an 11-day international course (August 1-11, 2011) co-organized by the Computational Biology Research Center (CBRC/AIST), the Bioinformatics Center of Kyoto University, the Database Center for Life Science (DBCLS/JST), and the National Evolutionary Synthesis Center (NESCent).
This course, which will take place at Kyoto University directly following the SMBE Meeting (http://smbe2011.com/), aims to give participants practical knowledge and hands-on skills in phyloinformatics. The venue in Kyoto is completely unaffected by the unfortunate events in Fukushima and the power shortages in Tokyo. We encourage biologists from other countries to participate in the SMBE meeting and/or this special international course, in solidarity with the scientific community of Japan in its effort to return to normalcy, and to help minimize any negative impact that the earthquake may have on scientific activities in Japan.

SYNOPSIS
Biologists are faced with ever-larger datasets, more complex evolutionary models, and increasingly elaborate analytical methods. Seldom is it sufficient to run a dataset through an off-the-shelf program on a desktop PC; increasingly, biologists need to write scripts to interface with internet services and databases, build analytical pipelines, customize analyses, and distribute computation over multiple processors. This course is designed for graduate students, postdocs, faculty, and researchers in phylogenetics interested in receiving practical, hands-on training in the use of Perl and SQL for workflows and applications in phyloinformatics. The course is divided into four parts:

PART I: A tutorial review of Perl, including object-oriented programming and building packages.
PART II: Introduction to and practical use of BioPerl and Bio::Phylo (e.g. scripting for large tree inference engines, automating model testing, genomic-scale data mining and acquisition, supertree assembly, rate smoothing and branch calibration, tree traversal, etc.).
PART III: Introduction to and practical use of BioRuby for molecular evolution and functional genomics (e.g. scripting multiple sequence alignment, gene duplication inference, tree inference, etc.).
PART IV: Introduction to SQL and database design; computing and querying nested sets and transitive closure; querying both large trees (e.g. NCBI) and large collections of trees (e.g. TreeBASE).

Participants will learn how to write basic phylogenetic or comparative analysis scripts, parse NEXUS files, traverse and compute over trees, and make practical use of phylogenetic software libraries. These skills will be taught in a biological context, touching on a diverse array of topics such as analysis of large datasets, automation of supertree assembly, and querying for topological patterns in large collections of trees. Participants will leave the course with a full set of installations and libraries on their computer, ready to build phyloinformatic workflows for their own research projects, as well as continued access to a 50+ page wiki "textbook" containing step-by-step instructions, problem sets, and examples.

INSTRUCTORS AND COURSE ORGANIZERS
Christian Zmasek, Karen Cranston, Rutger A. Vos, Susumu Goto, Toshiaki Katayama, William H. Piel

APPLICATION DEADLINE
May 31, 2011

TUITION
¥40,000 (~$500)
Participants are responsible for their own travel costs, including transportation and accommodation; see the website for more information. International participants will benefit by combining attendance with the 2011 SMBE meeting.

SUBSIDIES AND SCHOLARSHIPS
A limited number of travel scholarships from NESCent are available for US-based students. Preference will be given to students from under-represented minorities. The Asia-Pacific Bioinformatics Network (APBioNet) is happy to provide travel assistance for a limited number of students/early career researchers from the Asia-Pacific region.
Applicants are requested to contact Dr Asif Khan, APBioNet Secretariat: asif -$- bic.nus.edu.sg (replace -$- with @) for details.

PREREQUISITES
BIOLOGY: A good understanding of phylogenetics, for example having already taken the Workshop on Molecular Evolution (http://www.molecularevolution.org/) or equivalent coursework or experience.
COMPUTING: Prior experience with Perl, or careful study of the suggested reading materials in advance of the class (see web site). Participants should have some experience with basic Unix shell commands.
EQUIPMENT: Participants are expected to bring their own Mac OS X or Linux computer; otherwise they will be provided with an iMac. Participants who cannot bring their own computer and will be using a supplied iMac should consider bringing a portable FireWire/USB drive so that they can also leave the course with a full suite of phyloinformatic software tools.

--
Dr. Rutger A. Vos
School of Biological Sciences
Philip Lyle Building, Level 4
University of Reading
Reading, RG6 6BX, United Kingdom
Tel: +44 (0) 118 378 7535
http://rutgervos.blogspot.com

From ktym at hgc.jp Fri May 20 06:31:13 2011
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Fri, 20 May 2011 15:31:13 +0900
Subject: [BioRuby] Disk cash on the parse genes
In-Reply-To:
References:
Message-ID: <4DD023E9-9FBE-4A38-AE9C-A19FD4751CDF@hgc.jp>

Dear Endoh-san,

Thank you for pointing this problem out.

I tried to parse the gbbct12.seq file with example code based on our tutorial at http://bioruby.open-bio.org/wiki/Tutorial and found that the actual problem is the repeated calling of the gb.naseq method.

The method is defined as shown below. It doesn't cache the generated Bio::Sequence::NA object, so it takes a long time if called multiple times, especially for a long sequence.
  bio/db/genbank/genbank.rb:
    def seq
      unless @data['SEQUENCE']
        origin
      end
      Bio::Sequence::NA.new(@data['SEQUENCE'])
    end
    alias naseq seq

When I stored the object outside of the feature-manipulation loop, it became much faster.

  % ruby gbparse.rb gbbct12.seq > gbbct12.out 2> gbbct12.err
  Parsed 16125 entries in 1645.838824 sec.

  % ruby gbparse_new.rb gbbct12.seq > gbbct12.out_new 2> gbbct12.err_new
  Parsed 16125 entries in 39.012607 sec.

Based on this observation, could you check the algorithm of your code?

Regards,
Toshiaki Katayama

From daijiendoh at gmail.com Sat May 21 05:33:58 2011
From: daijiendoh at gmail.com (遠藤大二, Daiji Endoh)
Date: Sat, 21 May 2011 14:33:58 +0900
Subject: [BioRuby] Fwd: Disk cash on the parse genes
In-Reply-To: <4DD023E9-9FBE-4A38-AE9C-A19FD4751CDF@hgc.jp>
References: <4DD023E9-9FBE-4A38-AE9C-A19FD4751CDF@hgc.jp>
Message-ID:

Dear Katayama-san

I am very, very grateful for your suggestion. I have been struggling with this problem for 6 months.
Using your code, I can overcome the problem.

But the code stopped at one point: if the feature.position refers to another entry, such as "join(M52614.1:1..1456,5216..5823)", the code returned an error.
So I added the line below.

  next if position =~ /[A-Z]+\d+\W*\d*\:/

With this line inserted, the code now works. I attached the modified code.
Thanks again,

Daiji Endoh

--
〒069-8501
Tel: 011-388-4847
Fax: 011-387-5890

From ktym at hgc.jp Mon May 23 03:28:08 2011
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Mon, 23 May 2011 12:28:08 +0900
Subject: [BioRuby] Fwd: Disk cash on the parse genes
In-Reply-To:
References: <4DD023E9-9FBE-4A38-AE9C-A19FD4751CDF@hgc.jp>
Message-ID:

Dear Endoh-san,

> Using your code, I can overcome the problem.

Good!

> "join(M52614.1:1..1456,5216..5823), the code returned a error.

An external reference should be detected by the Bio::Location class, and the ID is stored in the instance variable @xref_id. However, how to deal with it is up to the user: you need to implement some code to fetch the external entry (in this case @xref_id="M52614.1") from an available service (local DB, web service, etc.) and extract the sub-sequence from that entry. Please take a look at pattern (G) in the documentation:
http://bioruby.open-bio.org/rdoc/classes/Bio/Locations.html

Unfortunately, I got an unexplained "Exception" error from NCBI when retrieving
http://www.ncbi.nlm.nih.gov/nuccore?term=M52614.1
so, as an example, I'll use "join(U75473.1:1..293,1..216)", found in the GenBank entry SMMFD02 (gbbct65.seq).
  # obtain a genbank record
  bioruby> entry = getobj("genbank:SMMFD02")
  or
  bioruby> entry = Bio::GenBank.new(open("http://togows.dbcls.jp/entry/ncbi-genbank/SMMFD02").read)

  # cache whole sequence as we learnt in this thread :-)
  bioruby> naseq = entry.naseq

  # pick up "gene" features only
  bioruby> genes = entry.features.select {|x| x.feature == "gene" }

  # example to handle external references in a given position
  bioruby> genes.each do |gene|
    locations = Bio::Locations.new(gene.position)
    locations.each do |location|
      if xref = location.xref_id
        xref_entry = open("http://togows.dbcls.jp/entry/ncbi-genbank/#{xref}").read
        location.sequence = Bio::GenBank.new(xref_entry).naseq.subseq(location.from, location.to)
      end
    end
    gene.position = locations.to_s    # (*1)
    puts naseq.splice(gene.position)  # (*2)
  end

(*1) will generate the following string:

join(replace(U75473.1:1..293,"gtcttcttgttggtgatgttggttttggaaaaacggaagtagcgatgagagctgcttttaaagcagttaatgatgataaacaagttgctgttttggtgccaacaacagttcttgctcaacagcactataatacttttaaggagcgctttgaaaattttcctgtcaatgttgccatgatgagtcgttttaaaaccaagactgaacagtctgaaacgttaactaaattagctaagggacaggttgatatcattattggaacacatcgtctactttctaaagatgttacgtttaaa"),1..216)

(*2) will return a 293 + 216 = 509 bp sequence:

gtcttcttgttggtgatgttggttttggaaaaacggaagtagcgatgagagctgcttttaaagcagttaatgatgataaacaagttgctgttttggtgccaacaacagttcttgctcaacagcactataatacttttaaggagcgctttgaaaattttcctgtcaatgttgccatgatgagtcgttttaaaaccaagactgaacagtctgaaacgttaactaaattagctaagggacaggttgatatcattattggaacacatcgtctactttctaaagatgttacgtttaaaggggttaaacacaaggaaacattgaaagaattaaaaactaaggttgatgtcttgaccttgacagcaactcctattccacggacattacatatgtctatgcttggtatacgagatttatcagttattgaaacacctccaagtaatcgttaccctgtccagacttatgttatggaaacaaatgcaagtgtcattcgtgaagctattatgcgtgaaatt

During this trial, I found a bug in the Bio::Sequence#splice method.
  bio/sequence/common.rb:
    def splice(position)
      unless position.is_a?(Locations) then
        position = Locations.new(position)
      end
      s = ''
      position.each do |location|
        if location.sequence
          s << location.sequence
        else                # <----- (*3)
          exon = self.subseq(location.from, location.to)
          begin
            exon.complement! if location.strand < 0
          rescue NameError
          end
          s << exon
        end
      end
      return self.class.new(s)
    end
    alias splicing splice

We need to fix this else block (*3) so that it checks whether @xref_id exists. Currently, "join(U75473.1:1..293,1..216)" will be treated as "join(1..293,1..216)", which is obviously wrong.

Toshiaki

From mictadlo at gmail.com Wed May 25 21:10:26 2011
From: mictadlo at gmail.com (Michal)
Date: Thu, 26 May 2011 07:10:26 +1000
Subject: [BioRuby] BioRuby with Reia
Message-ID: <4DDD7042.9050708@gmail.com>

Hello,
would it be possible to run BioRuby with Reia ( http://en.wikipedia.org/wiki/Reia_(programming_language) )?
From bonnal at ingm.org Thu May 26 08:31:33 2011
From: bonnal at ingm.org (Raoul Bonnal)
Date: Thu, 26 May 2011 10:31:33 +0200
Subject: [BioRuby] BioRuby with Reia
In-Reply-To: <4DDD7042.9050708@gmail.com>
References: <4DDD7042.9050708@gmail.com>
Message-ID: <10EFE5AC-BBA0-4113-9878-FD92C9ECF326@ingm.org>

On 25/May/2011, at 23.10, Michal wrote:

> Hello,
> would it be possible to run BioRuby on Reia ( http://en.wikipedia.org/wiki/Reia_(programming_language) )?

I don't know; what would the advantage be?

--
Ra

From pjotr.public14 at thebird.nl Thu May 26 10:01:50 2011
From: pjotr.public14 at thebird.nl (Pjotr Prins)
Date: Thu, 26 May 2011 12:01:50 +0200
Subject: [BioRuby] BioRuby with Reia
In-Reply-To: <10EFE5AC-BBA0-4113-9878-FD92C9ECF326@ingm.org>
References: <4DDD7042.9050708@gmail.com>
 <10EFE5AC-BBA0-4113-9878-FD92C9ECF326@ingm.org>
Message-ID: <20110526100149.GA19867@thebird.nl>

Reia is not Ruby-compatible, even though some of its syntax is similar.
So BioRuby won't run on Reia.

Pj.

On Thu, May 26, 2011 at 10:31:33AM +0200, Raoul Bonnal wrote:
>
> On 25/May/2011, at 23.10, Michal wrote:
>
> > Hello,
> > would it be possible to run BioRuby on Reia ( http://en.wikipedia.org/wiki/Reia_(programming_language) )?
> I don't know; what would the advantage be?
>
> --
> Ra
>

From mictadlo at gmail.com Sat May 28 23:47:49 2011
From: mictadlo at gmail.com (Michal)
Date: Sun, 29 May 2011 09:47:49 +1000
Subject: [BioRuby] samtools-ruby
In-Reply-To: <74EAEDFF-D8D0-42E0-93B8-51C4986CEC65@ingm.it>
References: <4D454E91.1080604@gmail.com> <4D4A6459.5050205@gmail.com>
 <5B31A257-DDCB-4BBB-A201-B4D708E82BE0@kenroku.kanazawa-u.ac.jp>
 <4D4BB255.7030703@gmail.com> <4D4BE242.2030408@gmail.com>
 <4D4BFB7D.9070004@gmail.com>
 <74EAEDFF-D8D0-42E0-93B8-51C4986CEC65@ingm.it>
Message-ID: <4DE189A5.2010201@gmail.com>

Hello,
how can I get the following pileup output with bioruby-samtools?

coverage at base 99 = 1
        base in read EAS56_57:6:190:289:82 = A
coverage at base 100 = 1
        base in read EAS56_57:6:190:289:82 = G
coverage at base 101 = 1
        base in read EAS56_57:6:190:289:82 = G
coverage at base 102 = 2
        base in read EAS56_57:6:190:289:82 = G
        base in read EAS51_64:3:190:727:308 = G

I have only found this Python code:

import pysam

samfile = pysam.Samfile("ex1.bam", "rb" )
for pileupcolumn in samfile.pileup( 'chr1', 100, 120):
    print
    print 'coverage at base %s = %s' % (pileupcolumn.pos , pileupcolumn.n)
    for pileupread in pileupcolumn.pileups:
        print '\tbase in read %s = %s' % (pileupread.alignment.qname, pileupread.alignment.seq[pileupread.qpos])

samfile.close()

But I do not know how to do it with Ruby.
Thank you in advance.

On 02/04/2011 11:57 PM, Raoul Bonnal wrote:
> These days, weekends too, I have no time for samtools. From next
> week I could spend more time on this project and improve the tests,
> usability and platform support.
>
>
>
> On 04/feb/2011, at 14.13, Michal wrote:
>
>> Hi,
>> I would be happy if I could find out how to get the alignment at a
>> particular position, and then I could give feedback.
>>
>> Pysam (http://code.google.com/p/pysam/) contains all the files and tests:
>> ~/Downloads/pysam-0.3.1/tests$ ls
>> 00README.txt  ex4.sam  ex8.sam             Makefile
>> ex1.fa        ex5.sam  example.gtf.gz      pysam_test.py
>> ex1.sam.gz    ex6.sam  example.gtf.gz.tbi  segfault_tests.py
>> ex3.sam       ex7.sam  example.py          tabix_test.py
>>
>> Maybe it would be possible to test bioruby-samtools in the same way.
>> Pysam ships with the samtools source code, and maybe that could be
>> used for bioruby-samtools too.
>>
>> Thank you in advance.
>>
>> Michal
>>
>>
>> On 02/04/2011 10:26 PM, Tomoaki NISHIYAMA wrote:
>>> Hi,
>>>
>>>> What have I forgotten to do?
>>>
>>> Now you are at the point I reached yesterday, and
>>> I don't think you have forgotten anything.
>>>
>>> From yesterday's mail:
>>>> 1) Failure:
>>>> test: BioSamtools should probably rename this file and start
>>>> testing for real. (TestBioSamtools) [test/test_bio-samtools.rb:5]:
>>>> hey buddy, you should probably rename this file and start testing
>>>> for real
>>>>
>>>> Loading seems ok.
>>>> I'm not sure if this is bad or ok.
>>>
>>> You could look at test/test_bio-samtools.rb
>>> $ cat test/test_bio-samtools.rb
>>> require 'helper'
>>>
>>> class TestBioSamtools < Test::Unit::TestCase
>>>   should "probably rename this file and start testing for real" do
>>>     flunk "hey buddy, you should probably rename this file and start testing for real"
>>>   end
>>> end
>>>
>>> and guess what it means.
>>>
>>> My guess is that this test has not been implemented yet.
>>> So this failure does not tell us whether the library works well or
>>> cannot be used at all.
>>> You might just try what you wanted to do and see if it works.
>>>
>>>> I understand how difficult it is to keep track, and it is a good
>>>> idea to ship bioruby-samtools
>>>> with a working samtools version, as Raoul does.
>>>
>>> My view is the opposite.
>>> Since it potentially has many bugs and changes rapidly, bundled
>>> shipping is ineffective.
>>> With the lack of test code, we cannot even tell which is a good
>>> working version.
>>> --
>>> Tomoaki NISHIYAMA
>>>
>>> Advanced Science Research Center,
>>> Kanazawa University,
>>> 13-1 Takara-machi,
>>> Kanazawa, 920-0934, Japan
>>>
>>
>
> --
> R.J.P.B.
>
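[Editor's sketch] For Michal's question about reproducing the pysam pileup output in Ruby, the sketch below shows what a pileup computes, using a few made-up in-memory reads. It does not use bioruby-samtools or read a BAM file, and none of its names are that library's API. Assumptions: `Read` is a hypothetical record holding a read name, a 0-based leftmost reference position, and the read bases; alignments are ungapped (no insertions, deletions or clipping), so the read offset is simply the reference position minus the read start.

```ruby
# Hypothetical toy alignment record (not the bioruby-samtools API):
# read name, 0-based leftmost reference position, read bases.
Read = Struct.new(:name, :pos, :seq)

# Build pileup columns over reference positions [start, stop).
# Assumes ungapped alignments, so offset = ref_pos - read.pos.
# Returns [[ref_pos, [[read_name, base], ...]], ...], skipping empty columns.
def pileup(reads, start, stop)
  columns = []
  (start...stop).each do |ref_pos|
    bases = []
    reads.each do |r|
      offset = ref_pos - r.pos
      bases << [r.name, r.seq[offset]] if offset >= 0 && offset < r.seq.length
    end
    columns << [ref_pos, bases] unless bases.empty?
  end
  columns
end

# Made-up reads, chosen only to give output shaped like the example above.
reads = [
  Read.new("read_a", 99, "AGGG"),
  Read.new("read_b", 102, "GT")
]

pileup(reads, 99, 103).each do |pos, bases|
  puts "coverage at base #{pos} = #{bases.size}"
  bases.each { |name, base| puts "\tbase in read #{name} = #{base}" }
end
# coverage at base 99 = 1
#   base in read read_a = A
# coverage at base 100 = 1
#   base in read read_a = G
# coverage at base 101 = 1
#   base in read read_a = G
# coverage at base 102 = 2
#   base in read read_a = G
#   base in read read_b = G
```

Real alignments need the CIGAR string to map read offsets to reference positions. For BAM files, that work is normally left to samtools itself rather than reimplemented like this.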