From francesco.strozzi at gmail.com Wed Jan 4 04:50:14 2012
From: francesco.strozzi at gmail.com (Francesco Strozzi)
Date: Wed, 4 Jan 2012 10:50:14 +0100
Subject: [BioRuby] Bio::Faster plugin

Hi guys,

I have created a BioRuby plugin called bio-faster, that implements a fast and simple parser for FastA and FastQ files. It's based on the C library Kseq written by Heng Li (author of Samtools and BWA). Compared to Bio::FastQ it is actually 4-5 times faster in parsing large FastQ files. The code will not create a Bio object for each sequence but it will return a simple array with sequence data and quality values for FastQ (it supports Sanger/Phred format only).
Bio::Faster could be a good choice when you just need to parse huge files, for example to extract information or to store sequence data in a database, and you don't need to create an object for each sequence but you only want to parse the dataset easily and quickly.

Here is the code: https://github.com/fstrozzi/bioruby-faster
Here is the wiki for more details: https://github.com/fstrozzi/bioruby-faster/wiki
To get the gem: gem install bio-faster

Tested with Ruby 1.9 only.

Any comment or feedback is much appreciated!

Cheers
-- 
Francesco

From bonnal at ingm.org Wed Jan 4 10:05:00 2012
From: bonnal at ingm.org (Raoul Bonnal)
Date: Wed, 04 Jan 2012 16:05:00 +0100
Subject: [BioRuby] Bio::Faster plugin

Hi Francesco,
It's very cool!

And you can access to the seq object/array also in this way:
Bio::Faster.parse(File.join(TEST_DATA,"sample.fastq")) do |id, comments, sequence, quality|
  puts "#{id} #{comments} #{sequence} #{quality}"
end

Obviously I like it more than using the raw array :-)
I suppose in case of no quality value you get a nil object

+1

On 04/01/12 10.50, "Francesco Strozzi" wrote:

> Hi guys,
>
> I have created a BioRuby plugin called bio-faster, that implements a fast
> and simple parser for FastA and FastQ files.
> It's based on the C library
> Kseq written by Heng Li (author of Samtools and BWA). Compared to
> Bio::FastQ it is actually 4-5 times faster in parsing large FastQ files.
> The code will not create a Bio object for each sequence but it will return
> a simple array with sequence data and quality values for FastQ (it supports
> Sanger/Phred format only).
> Bio::Faster could be a good choice when you just need to parse huge files,
> for example to extract information or to store sequence data in a database,
> and you don't need to create an object for each sequence but you only want
> to parse the dataset easily and quickly.
>
> Here is the code: https://github.com/fstrozzi/bioruby-faster
> Here is the wiki for more details:
> https://github.com/fstrozzi/bioruby-faster/wiki
> To get the gem: gem install bio-faster
>
> Tested with Ruby 1.9 only.
>
> Any comment or feedback is much appreciated!
>
> Cheers

From georgkam at gmail.com Wed Jan 4 10:20:42 2012
From: georgkam at gmail.com (George Githinji)
Date: Wed, 4 Jan 2012 18:20:42 +0300
Subject: [BioRuby] Bio::Faster plugin

++1
Sounds cool!

On Wed, Jan 4, 2012 at 6:05 PM, Raoul Bonnal wrote:
> Hi Francesco,
> It's very cool!
>
> And you can access to the seq object/array also in this way:
> Bio::Faster.parse(File.join(TEST_DATA,"sample.fastq")) do |id, comments,
> sequence, quality|
>   puts "#{id} #{comments} #{sequence} #{quality}"
> end
>
> Obviously I like it more than using the raw array :-)
> I suppose in case of no quality value you get a nil object
>
>
> +1
>
>
> On 04/01/12 10.50, "Francesco Strozzi" wrote:
>
>> Hi guys,
>>
>> I have created a BioRuby plugin called bio-faster, that implements a fast
>> and simple parser for FastA and FastQ files. It's based on the C library
>> Kseq written by Heng Li (author of Samtools and BWA). Compared to
>> Bio::FastQ it is actually 4-5 times faster in parsing large FastQ files.
>> The code will not create a Bio object for each sequence but it will return
>> a simple array with sequence data and quality values for FastQ (it supports
>> Sanger/Phred format only).
>> Bio::Faster could be a good choice when you just need to parse huge files,
>> for example to extract information or to store sequence data in a database,
>> and you don't need to create an object for each sequence but you only want
>> to parse the dataset easily and quickly.
>>
>> Here is the code: https://github.com/fstrozzi/bioruby-faster
>> Here is the wiki for more details:
>> https://github.com/fstrozzi/bioruby-faster/wiki
>> To get the gem: gem install bio-faster
>>
>> Tested with Ruby 1.9 only.
>>
>> Any comment or feedback is much appreciated!
>>
>> Cheers
>
>
> _______________________________________________
> BioRuby Project - http://www.bioruby.org/
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby

-- 
---------------
Sincerely
George
Skype: george_g2
Blog: http://biorelated.wordpress.com/
Twitter: http://twitter.com/#!/george_l

From mictadlo at gmail.com Mon Jan 16 01:24:28 2012
From: mictadlo at gmail.com (Mic)
Date: Mon, 16 Jan 2012 16:24:28 +1000
Subject: [BioRuby] Unique reads

Hello,
I read in many papers that they made unique reads before the reads were aligned and later on the SNPs were called. However, I could not find out how they do it.

Which tool can be used to do it?

Thank you in advance.

From mh6 at sanger.ac.uk Mon Jan 16 04:41:36 2012
From: mh6 at sanger.ac.uk (Michael Paulini)
Date: Mon, 16 Jan 2012 09:41:36 +0000
Subject: [BioRuby] Unique reads
Message-ID: <4F13F0D0.2020609@sanger.ac.uk>

On 16/01/12 06:24, Mic wrote:
> Hello,
> I read in many papers that they made unique reads before the reads
> were aligned and later on the SNPs were called. However, I could not find out
> how they do it.
>
> Which tool can be used to do it?
>
> Thank you in advance.

WU-Blast/AB-Blast had a tool to collapse identical fasta entries called nrdb.
But you can write your own by comparing the sequences.

M

-- 
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

From mictadlo at gmail.com Mon Jan 16 08:01:56 2012
From: mictadlo at gmail.com (Mic)
Date: Mon, 16 Jan 2012 23:01:56 +1000
Subject: [BioRuby] compare sequences

Hello,
Is there a memory-efficient way to compare sequences like those from NGS?

Thank you in advance.

From p.j.a.cock at googlemail.com Mon Jan 16 08:30:01 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 16 Jan 2012 13:30:01 +0000
Subject: [BioRuby] compare sequences

On Mon, Jan 16, 2012 at 1:01 PM, Mic wrote:
> Hello,
> Is there a memory-efficient way to compare sequences like those from NGS?
>
> Thank you in advance.

Hi Mic,

Could you stop posting such broad questions to multiple mailing lists simultaneously please? Perhaps you would find Biostars Q&A more useful? http://biostar.stackexchange.com/

See also: http://dx.doi.org/10.1371/journal.pcbi.1002202

Peter

From donttrustben at gmail.com Thu Jan 19 02:58:01 2012
From: donttrustben at gmail.com (Ben Woodcroft)
Date: Thu, 19 Jan 2012 17:58:01 +1000
Subject: [BioRuby] What is the bar for releasing biogems?

Hi there,

I was hoping for some advice from the list about policy on releasing biogems.
I have a few (2 or 3) biogems on my computer which:
* Solve a discrete problem (in my case, wrapping around an underlying bioinformatic program and parsing the result)
* At least a little bit unit tested
* Are bioinformatics-related

However, they are also:
* Not fantastic leaps forward - they don't solve big problems
* Probably limited to a small audience, since the programs themselves would be of little use outside the (not large) field (bioinformatics of protein sub-cellular localisation in apicomplexan parasites).

I find having a biogem is a convenient mechanism. But I don't want to release code that is of no use to anyone else (or any more of it..), particularly as I cannot know whether I'll continue to use the code once my PhD is done. Should I release the gems?

Thanks,
ben

-- 
Ben J Woodcroft, BE (Hons)
PhD Candidate
Ralph Laboratory
The University of Melbourne
Melbourne, Australia
tel: (+613) 8344 2319
pgrad.unimelb.edu.au after b.woodcroft

From pjotr.public14 at thebird.nl Thu Jan 19 03:23:36 2012
From: pjotr.public14 at thebird.nl (Pjotr Prins)
Date: Thu, 19 Jan 2012 09:23:36 +0100
Subject: [BioRuby] What is the bar for releasing biogems?
Message-ID: <20120119082336.GB17283@thebird.nl>

Hi Ben,

On Thu, Jan 19, 2012 at 05:58:01PM +1000, Ben Woodcroft wrote:
> Hi there,
>
> I was hoping for some advice from the list about policy on releasing
> biogems. I have a few (2 or 3) biogems on my computer which:
> * Solve a discrete problem (in my case, wrapping around an underlying
> bioinformatic program and parsing the result)
> * At least a little bit unit tested
> * Are bioinformatics-related
>
> However, they are also:
> * Not fantastic leaps forward - they don't solve big problems
> * Probably limited to a small audience, since the programs themselves would
> be of little use outside the (not large) field (bioinformatics of protein
> sub-cellular localisation in apicomplexan parasites).
>
> I find having a biogem is a convenient mechanism.
> But I don't want to
> release code that is of no use to anyone else (or any more of it..),
> particularly as I cannot know whether I'll continue to use the code once my
> PhD is done. Should I release the gems?

Yes. Please post them.

Do not worry about interest, take up, or quality. That is up to the people who ultimately take an interest in your gems. If there are issues they may approach you, or become maintainers themselves.

With the growth in the number of gems we will find ways to handle presentation and quality issues. First on our list is an automated testing framework, which will show on the site the gems that pass their tests. Next we will create a subsection for 'development' and/or 'unstable' gems. That way normal users can feel safe in using the tested and stable gems. The meta packages (bio-core etc.) already have an implicit policy in that way. Anyone should be able to install bio-core gems.

In other words, don't worry. Release early and often. That is the OSS adage, and that is what http://biogems.info/ is about.

Pj.

From pjotr.public14 at thebird.nl Thu Jan 19 03:38:41 2012
From: pjotr.public14 at thebird.nl (Pjotr Prins)
Date: Thu, 19 Jan 2012 09:38:41 +0100
Subject: [BioRuby] What is the bar for releasing biogems?
In-Reply-To: <20120119082336.GB17283@thebird.nl>
References: <20120119082336.GB17283@thebird.nl>
Message-ID: <20120119083841.GA19135@thebird.nl>

I added the following text to http://www.biogems.info/howto.html

Biogem aims to encourage open source software development, and provide tools to support bazaar programming. In our interpretation:

* Every good work of software starts by scratching a developer's personal itch
* Release source code early and often
* With enough users, almost every problem is quickly known, and a fix obvious
* Many heads are inevitably better than one
* Good programmers know what to write.
  Great ones know what to rewrite (and reuse)

From bonnal at ingm.org Thu Jan 19 05:40:58 2012
From: bonnal at ingm.org (Raoul Bonnal)
Date: Thu, 19 Jan 2012 11:40:58 +0100
Subject: [BioRuby] What is the bar for releasing biogems?
In-Reply-To: <20120119082336.GB17283@thebird.nl>

Hi Ben,
I agree with Pjotr: post and publish the gems.
From my experience it is better to publish/release on RubyGems when the gem is stable enough to be used by others; otherwise you will spend a lot of time fixing their issues.
The gem doesn't need huge documentation; short and clear is better, plus some comments in the code, and that's it.
Find an effective description for your gem: every time I look for gems I read/search the short description.
Another rule I'm trying to follow is to define/reuse a namespace which is pluggable into BioRuby (but this is my personal point of view).

Don't forget to drop an email here :-)
We need to restart IRC meetings...

On 19/01/12 09.23, "Pjotr Prins" wrote:

> Hi Ben,
>
> On Thu, Jan 19, 2012 at 05:58:01PM +1000, Ben Woodcroft wrote:
>> Hi there,
>>
>> I was hoping for some advice from the list about policy on releasing
>> biogems. I have a few (2 or 3) biogems on my computer which:
>> * Solve a discrete problem (in my case, wrapping around an underlying
>> bioinformatic program and parsing the result)
>> * At least a little bit unit tested
>> * Are bioinformatics-related
>>
>> However, they are also:
>> * Not fantastic leaps forward - they don't solve big problems
>> * Probably limited to a small audience, since the programs themselves would
>> be of little use outside the (not large) field (bioinformatics of protein
>> sub-cellular localisation in apicomplexan parasites).
>>
>> I find having a biogem is a convenient mechanism. But I don't want to
>> release code that is of no use to anyone else (or any more of it..),
>> particularly as I cannot know whether I'll continue to use the code once my
>> PhD is done. Should I release the gems?
>
> Yes.
> Please post them.
>
> Do not worry about interest, take up, or quality. That is up to the
> people who ultimately take an interest in your gems. If there are
> issues they may approach you, or become maintainers themselves.
>
> With the growth in the number of gems we will find ways to handle
> presentation and quality issues. First on our list is an automated
> testing framework, which will show on the site the gems that pass
> their tests. Next we will create a subsection for 'development'
> and/or 'unstable' gems. That way normal users can feel safe in using
> the tested and stable gems. The meta packages (bio-core etc.) already
> have an implicit policy in that way. Anyone should be able to install
> bio-core gems.
>
> In other words, don't worry. Release early and often. That is the OSS
> adage, and that is what http://biogems.info/ is about.
>
> Pj.

From p.j.a.cock at googlemail.com Fri Jan 20 05:46:18 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 20 Jan 2012 10:46:18 +0000
Subject: [BioRuby] NCBI adoption of AGP v2.0 and new qualifiers in GenBank/EMBL

Dear all,

I just spotted this via the @NCBI twitter feed,
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/agp/agp_spec_change.shtml

In addition to the NCBI switch from AGP v1.1 to v2.0, the INSDC have recently added a new feature type called "assembly_gap", and the associated qualifiers "gap_type" and "linkage_evidence" to the INSDC Feature Table Definitions.
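The practical point about parsers can be illustrated with a small sketch. The Ruby below is a hypothetical, minimal feature-table scanner (it is not BioRuby's Bio::GenBank API). It assumes the standard flat-file layout, with feature keys starting in column 6 and locations and qualifiers in column 22, and it accepts any feature key or qualifier name, so an "assembly_gap" feature with /gap_type and /linkage_evidence qualifiers comes through without special-casing. The names Feature, QUALIFIER_INDENT and scan_features, and the sample record, are made up for illustration.

```ruby
# Hypothetical sketch, not BioRuby's API: a whitelist-free scanner for the
# FEATURES section of a GenBank-style flat file. Any feature key is accepted,
# so newer keys such as "assembly_gap" pass through unchanged.
# Assumes feature keys start in column 6 and locations/qualifiers in
# column 22; wrapped lines are joined naively.

Feature = Struct.new(:key, :location, :qualifiers)

QUALIFIER_INDENT = " " * 21

# Remove one pair of surrounding double quotes from a qualifier value.
def unquote(value)
  value && value.sub(/\A"/, "").sub(/"\z/, "")
end

def scan_features(text)
  features = []
  text.each_line do |raw|
    line = raw.chomp
    next if line.strip.empty?

    if line =~ /\A {5}(\S+) +(\S.*)\z/
      # New feature line: the key is NOT checked against a list of known keys.
      features << Feature.new($1, $2, [])
    elsif line.start_with?(QUALIFIER_INDENT) && (feature = features.last)
      body = line.strip
      if body =~ %r{\A/([^=]+)(?:=(.*))?\z}
        # Qualifier line: "/name" or "/name=value" (the name is not checked either).
        feature.qualifiers << [$1, $2]
      elsif feature.qualifiers.empty?
        # Continuation of a wrapped location.
        feature.location += body
      else
        # Continuation of a wrapped qualifier value.
        name, value = feature.qualifiers.last
        feature.qualifiers[-1] = [name, [value, body].compact.join(" ")]
      end
    end
  end
  features.each { |f| f.qualifiers.map! { |name, value| [name, unquote(value)] } }
end

# Usage with a made-up record laid out in the standard columns.
key  = ->(name, location) { " " * 5 + name.ljust(16) + location }
qual = ->(text) { QUALIFIER_INDENT + text }

sample = [
  key.("source", "1..300"),
  qual.('/organism="Example organism"'),
  key.("assembly_gap", "101..200"),
  qual.("/estimated_length=100"),
  qual.('/gap_type="within scaffold"'),
  qual.('/linkage_evidence="paired-ends"'),
].join("\n")

scan_features(sample).each do |f|
  puts "#{f.key} #{f.location} #{f.qualifiers.inspect}"
end
```

Because nothing is checked against a list of known keys or qualifier names, new INSDC additions are treated as ordinary data; a parser that validated keys against a fixed list would instead need updating for each new feature type.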
Quoting from version 10.0, dated Dec 2011
http://www.insdc.org/documents/feature_table.html#7.2

> Feature Key           assembly_gap
>
>
> Definition            gap between two components of a CON record that is
>                       part of a genome assembly;
>
> Mandatory qualifiers  /estimated_length=unknown or <integer>
>                       /gap_type="TYPE"
>                       /linkage_evidence="TYPE" (Note: Mandatory only if the
>                       /gap_type is "within scaffold" or "repeat within
>                       scaffold". If there are multiple types of linkage_evidence
>                       they will appear as multiple /linkage_evidence="TYPE"
>                       qualifiers. For all other types of assembly_gap
>                       features, use of the /linkage_evidence qualifier is
>                       invalid.)
>
> Comment               the location span of the assembly_gap feature for an
>                       unknown gap is 100 bp, with the 100 bp indicated as
>                       100 "n"'s in sequence.
>

i.e. DDBJ, ENA & GenBank flat-files will start to use the "assembly_gap" features to display information derived from version 2.0 AGP files from 10th Feb 2012. Probably this will affect the XML variants as well.

Unless any of the parsers/writers for GenBank or EMBL flat files use a white list approach, the new feature key and qualifiers shouldn't cause a problem.

Peter

From pjotr.public14 at thebird.nl Sun Jan 22 05:53:19 2012
From: pjotr.public14 at thebird.nl (Pjotr Prins)
Date: Sun, 22 Jan 2012 11:53:19 +0100
Subject: [BioRuby] Generating your own bioinformatics code for plugins using templates (DRY)
Message-ID: <20120122105319.GA3001@thebird.nl>

I have been working on a document to discuss ways of modifying Biogem, so you can generate your own code, and avoid repetitious work. The design of the biogem code generator is based on templates, and there are accessible ways to hack it by adding your own templates. Don't repeat yourself (DRY)!

https://github.com/pjotrp/bioruby-gem/blob/master/doc/biogem-hacking.md

Please comment.

Pj.
(note: a few links are broken because they point to Raoul's tree, which needs to merge my changes)

From bonnal at ingm.org Sun Jan 22 10:03:57 2012
From: bonnal at ingm.org (Raoul Bonnal)
Date: Sun, 22 Jan 2012 16:03:57 +0100
Subject: [BioRuby] Generating your own bioinformatics code for plugins using templates (DRY)
In-Reply-To: <20120122105319.GA3001@thebird.nl>
Message-ID: <20120122150357.d70ef3b0@mail.ingm.it>

Hi Pjotr,
well done. Pull merged.
There are new features I'm going to implement for the future release, like updating the project with new options, e.g. adding a binary or a database if you did select the option at the beginning.

_____

From: Pjotr Prins [mailto:pjotr.public14 at thebird.nl]
To: BioRuby Mailing List [mailto:bioruby at lists.open-bio.org]
Sent: Sun, 22 Jan 2012 11:53:19 +0100
Subject: [BioRuby] Generating your own bioinformatics code for plugins using templates (DRY)

I have been working on a document to discuss ways of modifying Biogem, so you can generate your own code, and avoid repetitious work. The design of the biogem code generator is based on templates, and there are accessible ways to hack it by adding your own templates. Don't repeat yourself (DRY)!

https://github.com/pjotrp/bioruby-gem/blob/master/doc/biogem-hacking.md

Please comment.

Pj.

(note: a few links are broken because they point to Raoul's tree, which needs to merge my changes)

From pjotr.public14 at thebird.nl Tue Jan 24 05:50:14 2012
From: pjotr.public14 at thebird.nl (Pjotr Prins)
Date: Tue, 24 Jan 2012 11:50:14 +0100
Subject: [BioRuby] Ruby documentation with syntax highlight (Wiki no more)
Message-ID: <20120124105014.GA22535@thebird.nl>

Github has syntax highlighting for markdown. This has potential for all Ruby documents! We can maintain docs in git, and they get displayed on github.
http://github.github.com/github-flavored-markdown/

Example
https://github.com/pjotrp/bioruby-gem/blob/master/doc/biogem-hacking.md

No more wiki documentation, as far as I am concerned.

Pj.

From pjotr.public14 at thebird.nl Thu Jan 26 05:34:23 2012
From: pjotr.public14 at thebird.nl (Pjotr Prins)
Date: Thu, 26 Jan 2012 11:34:23 +0100
Subject: [BioRuby] EU-codefest in July
Message-ID: <20120126103423.GA10234@thebird.nl>

EU-codefest 2012 will be 19 and 20 July in Lodi Italy

In coordination with the BOSC committee we are organising the first EU-Codefest, a low-key event the week after BOSC 2012.

http://www.open-bio.org/wiki/EU_Codefest_2012
Cheers -- Francesco From bonnal at ingm.org Wed Jan 4 15:05:00 2012 From: bonnal at ingm.org (Raoul Bonnal) Date: Wed, 04 Jan 2012 16:05:00 +0100 Subject: [BioRuby] Bio::Faster plugin In-Reply-To: Message-ID: Hi Francesco, It's very cool! And you can access to the seq object/array also in this way: Bio::Faster.parse(File.join(TEST_DATA,"sample.fastq")) do |id, comments, sequence, quality| puts "#{id} #{comments} #{sequence} #{quality}" end Obviously I like it more than using the raw array :-) I suppose in case of no quality value you get a nil object +1 On 04/01/12 10.50, "Francesco Strozzi" wrote: > Hi guys, > > I have created a BioRuby plugin called bio-faster, that implements a fast > and simple parser for FastA and FastQ files. It's based on the C library > Kseq written by Heng Li (author of Samtools and BWA). Compared to > Bio::FastQ it is actually 4-5 times faster in parsing large FastQ files. > The code will not create a Bio object for each sequence but it will return > a simple array with sequence data and quality values for FastQ (it supports > Sanger/Phred format only). > Bio::Faster could be a good choice when you just need to parse huge files, > for example to extract information or to store sequence data in a database, > and you don't need to create an object for each sequence but you only want > to parse the dataset easily and quickly. > > Here is the code: https://github.com/fstrozzi/bioruby-faster > Here is the wiki for more details: > https://github.com/fstrozzi/bioruby-faster/wiki > To get the gem: gem install bio-faster > > Tested with Ruby 1.9 only. > > Any comment or feedback is much appreciated! > > Cheers From georgkam at gmail.com Wed Jan 4 15:20:42 2012 From: georgkam at gmail.com (George Githinji) Date: Wed, 4 Jan 2012 18:20:42 +0300 Subject: [BioRuby] Bio::Faster plugin In-Reply-To: References: Message-ID: ++1 Sounds cool! On Wed, Jan 4, 2012 at 6:05 PM, Raoul Bonnal wrote: > Hi Francesco, > It's very cool! 
> > And you can access to the seq object/array also in this way: > Bio::Faster.parse(File.join(TEST_DATA,"sample.fastq")) do |id, comments, > sequence, quality| > ?puts "#{id} #{comments} #{sequence} #{quality}" > end > > Obviously I like it more than using the raw array :-) > I suppose in case of no quality value you get a nil object > > > +1 > > > On 04/01/12 10.50, "Francesco Strozzi" wrote: > >> Hi guys, >> >> I have created a BioRuby plugin called bio-faster, that implements a fast >> and simple parser for FastA and FastQ files. It's based on the C library >> Kseq written by Heng Li (author of Samtools and BWA). Compared to >> Bio::FastQ it is actually 4-5 times faster in parsing large FastQ files. >> The code will not create a Bio object for each sequence but it will return >> a simple array with sequence data and quality values for FastQ (it supports >> Sanger/Phred format only). >> Bio::Faster could be a good choice when you just need to parse huge files, >> for example to extract information or to store sequence data in a database, >> and you don't need to create an object for each sequence but you only want >> to parse the dataset easily and quickly. >> >> Here is the code: https://github.com/fstrozzi/bioruby-faster >> Here is the wiki for more details: >> https://github.com/fstrozzi/bioruby-faster/wiki >> To get the gem: gem install bio-faster >> >> Tested with Ruby 1.9 only. >> >> Any comment or feedback is much appreciated! 
>> >> Cheers > > > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby -- --------------- Sincerely George Skype: george_g2 Blog: http://biorelated.wordpress.com/ Twitter: http://twitter.com/#!/george_l From mictadlo at gmail.com Mon Jan 16 06:24:28 2012 From: mictadlo at gmail.com (Mic) Date: Mon, 16 Jan 2012 16:24:28 +1000 Subject: [BioRuby] Unique reads Message-ID: Hello, I read in many papers that they made unique reads before the reads were align and later on the SNPs were called. However, I could not find out how they do it. Which tool can be used to do it? Thank you in advance. From mh6 at sanger.ac.uk Mon Jan 16 09:41:36 2012 From: mh6 at sanger.ac.uk (Michael Paulini) Date: Mon, 16 Jan 2012 09:41:36 +0000 Subject: [BioRuby] Unique reads In-Reply-To: References: Message-ID: <4F13F0D0.2020609@sanger.ac.uk> On 16/01/12 06:24, Mic wrote: > Hello, > I read in many papers that they made unique reads before the reads > were align and later on the SNPs were called. However, I could not find out > how they do it. > > Which tool can be used to do it? > > Thank you in advance. > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby WU-Blast/AB-Blast had a tool to collapse identical fasta entries called nrdb. But you can write your own by comparing the sequences. M -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
From mictadlo at gmail.com Mon Jan 16 13:01:56 2012 From: mictadlo at gmail.com (Mic) Date: Mon, 16 Jan 2012 23:01:56 +1000 Subject: [BioRuby] compare sequences Message-ID: Hello, Is there anyway a memory efficient way to compare sequences like from NGS? Thank you in advance. From p.j.a.cock at googlemail.com Mon Jan 16 13:30:01 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 16 Jan 2012 13:30:01 +0000 Subject: [BioRuby] compare sequences In-Reply-To: References: Message-ID: On Mon, Jan 16, 2012 at 1:01 PM, Mic wrote: > Hello, > Is there anyway a?memory?efficient?way to ?compare ?sequences like from NGS? > > Thank you in advance. Hi Mic, Could you stop posting such broad questions to multiple mailing lists simultaneously please? Perhaps you would find Biostars Q&A more useful? http://biostar.stackexchange.com/ See also: http://dx.doi.org/10.1371/journal.pcbi.1002202 Peter From donttrustben at gmail.com Thu Jan 19 07:58:01 2012 From: donttrustben at gmail.com (Ben Woodcroft) Date: Thu, 19 Jan 2012 17:58:01 +1000 Subject: [BioRuby] What is the bar for releasing biogems? Message-ID: Hi there, I was hoping for some advice from the list about policy on releasing biogems. I have a few (2 or 3) biogems on my computer which: * Solve a discrete problem (in my case, wrapping around an underlying bioiformatic program and parsing the result) * At least a little bit unit tested * Are bioinformatics-related However, they are also: * Not fantastic leaps forward - they don't solve big problems * Probably limited to a small audience, since the programs themselves would be of little use outside the (not large) field (bioinformatics of protein sub-cellular localisation in apicomplexan parasites). I find having a biogem is a convenient mechanism. But I don't want to release code that is of no use to anyone else (or any more of it..), particularly as I cannot know whether I'll continue to use the code once my PhD is done. Should I release the gems? 
Thanks, ben -- Ben J Woodcroft, BE (Hons) PhD Candidate Ralph Laboratory The University of Melbourne Melbourne, Australia tel: (+613) 8344 2319 pgrad.unimelb.edu.au after b.woodcroft From pjotr.public14 at thebird.nl Thu Jan 19 08:23:36 2012 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Thu, 19 Jan 2012 09:23:36 +0100 Subject: [BioRuby] What is the bar for releasing biogems? In-Reply-To: References: Message-ID: <20120119082336.GB17283@thebird.nl> Hi Ben, On Thu, Jan 19, 2012 at 05:58:01PM +1000, Ben Woodcroft wrote: > Hi there, > > I was hoping for some advice from the list about policy on releasing > biogems. I have a few (2 or 3) biogems on my computer which: > * Solve a discrete problem (in my case, wrapping around an underlying > bioiformatic program and parsing the result) > * At least a little bit unit tested > * Are bioinformatics-related > > However, they are also: > * Not fantastic leaps forward - they don't solve big problems > * Probably limited to a small audience, since the programs themselves would > be of little use outside the (not large) field (bioinformatics of protein > sub-cellular localisation in apicomplexan parasites). > > I find having a biogem is a convenient mechanism. But I don't want to > release code that is of no use to anyone else (or any more of it..), > particularly as I cannot know whether I'll continue to use the code once my > PhD is done. Should I release the gems? Yes. Please post them, Do not worry about interest, take up, or quality. That is up to the people who ultimately take an interest in your gems. If there are issues they may approach you, or become maintainers themselves. With the growth of number of gems we will find ways to handle presentation and quality issues. First on our list is an automated testing frame work - which will show on the site the gems that pass their tests. Next we will create a subsection for 'development' and/or 'unstable' gems. 
That way normal users can feel safe in using the tested and stable gems. The meta packages (bio-core etc) already have an implicit policy it that way. Anyone should be able to install bio-core gems. In other words, don't worry. Release early and often. That is the OSS adagio, that is what http://biogems.info/ is about. Pj. From pjotr.public14 at thebird.nl Thu Jan 19 08:38:41 2012 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Thu, 19 Jan 2012 09:38:41 +0100 Subject: [BioRuby] What is the bar for releasing biogems? In-Reply-To: <20120119082336.GB17283@thebird.nl> References: <20120119082336.GB17283@thebird.nl> Message-ID: <20120119083841.GA19135@thebird.nl> I added the following text to http://www.biogems.info/howto.html Biogem aims to encourage open source software development, and provide tools to support bazaar programming. In our interpretation: * Every good work of software starts by scratching a developer's personal itch * Release source code early and often * With enough users, almost every problem is quickly known, and a fix obvious * Many heads are inevitably better than one * Good programmers know what to write. Great ones know what to rewrite (and reuse) From bonnal at ingm.org Thu Jan 19 10:40:58 2012 From: bonnal at ingm.org (Raoul Bonnal) Date: Thu, 19 Jan 2012 11:40:58 +0100 Subject: [BioRuby] What is the bar for releasing biogems? In-Reply-To: <20120119082336.GB17283@thebird.nl> Message-ID: Hi Ben, I agree with Pjotr, post and publish the gems. >From my experience is better to publish/release on rubygems when the gems is enough stable to be used by others otherwise you will spend a lot of time on fixing their issues. The gem doesn't need a huge documentation, short and clear is better, some comment in the code and that's it. Find an effective description for your gem, every time I look for gems I read/search the short description. 
Another rule I'm trying to follow is to define/reuse a namespace which is pluggable into BioRuby -but this is my personal point of view-. Don't forget to drop an email here :-) We need to restart IRC meetings... On 19/01/12 09.23, "Pjotr Prins" wrote: > Hi Ben, > > On Thu, Jan 19, 2012 at 05:58:01PM +1000, Ben Woodcroft wrote: >> Hi there, >> >> I was hoping for some advice from the list about policy on releasing >> biogems. I have a few (2 or 3) biogems on my computer which: >> * Solve a discrete problem (in my case, wrapping around an underlying >> bioiformatic program and parsing the result) >> * At least a little bit unit tested >> * Are bioinformatics-related >> >> However, they are also: >> * Not fantastic leaps forward - they don't solve big problems >> * Probably limited to a small audience, since the programs themselves would >> be of little use outside the (not large) field (bioinformatics of protein >> sub-cellular localisation in apicomplexan parasites). >> >> I find having a biogem is a convenient mechanism. But I don't want to >> release code that is of no use to anyone else (or any more of it..), >> particularly as I cannot know whether I'll continue to use the code once my >> PhD is done. Should I release the gems? > > Yes. Please post them, > > Do not worry about interest, take up, or quality. That is up to the > people who ultimately take an interest in your gems. If there are > issues they may approach you, or become maintainers themselves. > > With the growth of number of gems we will find ways to handle > presentation and quality issues. First on our list is an automated > testing frame work - which will show on the site the gems that pass > their tests. Next we will create a subsection for 'development' > and/or 'unstable' gems. That way normal users can feel safe in using > the tested and stable gems. The meta packages (bio-core etc) already > have an implicit policy it that way. Anyone should be able to install > bio-core gems. 
> > In other words, don't worry. Release early and often. That is the OSS > adage, and that is what http://biogems.info/ is about. > > Pj. > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From p.j.a.cock at googlemail.com Fri Jan 20 10:46:18 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 20 Jan 2012 10:46:18 +0000 Subject: [BioRuby] NCBI adoption of AGP v2.0 and new qualifiers in GenBank/EMBL Message-ID: Dear all, I just spotted this via the @NCBI twitter feed, http://www.ncbi.nlm.nih.gov/projects/genome/assembly/agp/agp_spec_change.shtml In addition to the NCBI switch from AGP v1.1 to v2.0, the INSDC have recently added a new feature type called "assembly_gap", and the associated qualifiers "gap_type" and "linkage_evidence", to the INSDC Feature Table Definitions. Quoting from version 10.0, dated Dec 2011: http://www.insdc.org/documents/feature_table.html#7.2 > Feature Key assembly_gap > > > Definition gap between two components of a CON record that is > part of a genome assembly; > > Mandatory qualifiers /estimated_length=unknown or <integer> > /gap_type="TYPE" > /linkage_evidence="TYPE" (Note: Mandatory only if the > /gap_type is "within scaffold" or "repeat within > scaffold". If there are multiple types of linkage_evidence > they will appear as multiple /linkage_evidence="TYPE" > qualifiers. For all other types of assembly_gap > features, use of the /linkage_evidence qualifier is > invalid.) > > Comment the location span of the assembly_gap feature for an > unknown gap is 100 bp, with the 100 bp indicated as > 100 "n"'s in sequence. > i.e. DDBJ, ENA & GenBank flat files will start to use the "assembly_gap" feature to display information derived from version 2.0 AGP files from 10th Feb 2012. This will probably affect the XML variants as well.
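The linkage_evidence rule in the quoted definition is mechanical enough to check in code. Below is a minimal sketch in plain Ruby, with no BioRuby dependency, assuming a simplified GenBank-style feature table (feature keys indented 5 spaces, each qualifier on a single line). All method and constant names are hypothetical. The parser does not white-list feature keys, so a new key such as assembly_gap is kept like any other feature.

```ruby
# Sketch only: simplified feature-table layout, one-line qualifiers.
Feature = Struct.new(:key, :location, :qualifiers)

# Per the quoted INSDC text, /linkage_evidence is mandatory for these
# gap types and invalid for all other gap types.
LINKED_GAP_TYPES = ["within scaffold", "repeat within scaffold"].freeze

# Collects every feature (no key white-list) with its qualifiers.
def parse_features(text)
  features = []
  text.each_line do |raw|
    line = raw.chomp
    if (m = line.match(/\A {5}(\S+)\s+(\S+)\z/))
      features << Feature.new(m[1], m[2], [])
    elsif (m = line.match(%r{\A\s+/(\w+)(?:=(.*))?\z})) && features.last
      value = m[2] && m[2].delete('"')
      features.last.qualifiers << [m[1], value]
    end
  end
  features
end

# Returns all values stored under one qualifier name.
def qualifier_values(feature, name)
  feature.qualifiers.select { |k, _| k == name }.map { |_, v| v }
end

# Returns a list of problems found in one assembly_gap feature.
def check_assembly_gap(feature)
  errors   = []
  lengths  = qualifier_values(feature, "estimated_length")
  types    = qualifier_values(feature, "gap_type")
  evidence = qualifier_values(feature, "linkage_evidence")
  errors << "missing /estimated_length" if lengths.empty?
  errors << "missing /gap_type" if types.empty?
  if types.any? { |t| LINKED_GAP_TYPES.include?(t) }
    errors << "missing /linkage_evidence" if evidence.empty?
  elsif !evidence.empty?
    errors << "/linkage_evidence not allowed for this /gap_type"
  end
  # An unknown gap is represented by a 100 bp span of n's.
  span = feature.location.match(/\A(\d+)\.\.(\d+)\z/)
  if lengths.include?("unknown") && span && span[2].to_i - span[1].to_i + 1 != 100
    errors << "unknown gap should span 100 bp"
  end
  errors
end

SAMPLE = <<FT
     source          1..600
                     /organism="Example organism"
     assembly_gap    101..200
                     /estimated_length=100
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     assembly_gap    301..400
                     /estimated_length=unknown
                     /gap_type="between scaffolds"
     assembly_gap    501..550
                     /estimated_length=50
                     /gap_type="within scaffold"
FT

if __FILE__ == $PROGRAM_NAME
  parse_features(SAMPLE).select { |f| f.key == "assembly_gap" }.each do |f|
    problems = check_assembly_gap(f)
    puts "#{f.location}: #{problems.empty? ? 'ok' : problems.join('; ')}"
  end
end
```

In the sample, the third gap is flagged because its gap_type is "within scaffold" but it carries no /linkage_evidence. A real parser would also need to handle qualifiers that wrap over several lines.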
Unless any of the parsers/writers for GenBank or EMBL flat files use a whitelist approach, the new feature key and qualifiers shouldn't cause a problem. Peter From pjotr.public14 at thebird.nl Sun Jan 22 10:53:19 2012 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Sun, 22 Jan 2012 11:53:19 +0100 Subject: [BioRuby] Generating your own bioinformatics code for plugins using templates (DRY) Message-ID: <20120122105319.GA3001@thebird.nl> I have been working on a document to discuss ways of modifying Biogem, so you can generate your own code and avoid repetitious work. The design of the biogem code generator is based on templates, and there are accessible ways to hack it by adding your own templates. Don't repeat yourself (DRY)! https://github.com/pjotrp/bioruby-gem/blob/master/doc/biogem-hacking.md Please comment. Pj. (Note: a few links are broken because they point to Raoul's tree, which still needs to merge my changes.) From bonnal at ingm.org Sun Jan 22 15:03:57 2012 From: bonnal at ingm.org (Raoul Bonnal) Date: Sun, 22 Jan 2012 16:03:57 +0100 Subject: [BioRuby] Generating your own bioinformatics code for plugins using templates (DRY) In-Reply-To: <20120122105319.GA3001@thebird.nl> Message-ID: <20120122150357.d70ef3b0@mail.ingm.it> Hi Pjotr, well done. Pull merged. There are new features I'm going to implement for a future release, such as updating an existing project with new options, e.g. adding a binary or a database after the project was created. _____ From: Pjotr Prins [mailto:pjotr.public14 at thebird.nl] To: BioRuby Mailing List [mailto:bioruby at lists.open-bio.org] Sent: Sun, 22 Jan 2012 11:53:19 +0100 Subject: [BioRuby] Generating your own bioinformatics code for plugins using templates (DRY) I have been working on a document to discuss ways of modifying Biogem, so you can generate your own code and avoid repetitious work.
The design of the biogem code generator is based on templates, and there are accessible ways to hack it by adding your own templates. Don't repeat yourself (DRY)! https://github.com/pjotrp/bioruby-gem/blob/master/doc/biogem-hacking.md Please comment. Pj. (Note: a few links are broken because they point to Raoul's tree, which still needs to merge my changes.) From pjotr.public14 at thebird.nl Tue Jan 24 10:50:14 2012 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Tue, 24 Jan 2012 11:50:14 +0100 Subject: [BioRuby] Ruby documentation with syntax highlight (Wiki no more) Message-ID: <20120124105014.GA22535@thebird.nl> GitHub has syntax highlighting for Markdown. This has potential for all Ruby documentation! We can maintain docs in git, and they get displayed on GitHub. http://github.github.com/github-flavored-markdown/ Example: https://github.com/pjotrp/bioruby-gem/blob/master/doc/biogem-hacking.md No more wiki documentation, as far as I am concerned. Pj. From pjotr.public14 at thebird.nl Thu Jan 26 10:34:23 2012 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Thu, 26 Jan 2012 11:34:23 +0100 Subject: [BioRuby] EU-codefest in July Message-ID: <20120126103423.GA10234@thebird.nl> EU-Codefest 2012 will be held on 19 and 20 July in Lodi, Italy. In coordination with the BOSC committee, we are organising the first EU-Codefest, a low-key event the week after BOSC 2012. http://www.open-bio.org/wiki/EU_Codefest_2012