[BioRuby] Fastq.to_s

Tomoaki NISHIYAMA tomoakin at kenroku.kanazawa-u.ac.jp
Mon Aug 29 01:04:54 UTC 2011


Hi Goto-san,

Thank you for incorporating Fastq#to_s and releasing 1.4.2!

With 1.4.2, the following two programs take about the same time:
56m38s vs 56m24s user time for a pair of 22-GB fastq files (single measurement).
So perhaps we can simply choose whichever is easier to read, write, or explain ;-)

flatmerge
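#!/usr/bin/env ruby
# Interleave entries of two flat files, re-serializing each parsed entry.
require 'bio'

# nil lets Bio::FlatFile auto-detect the file format
ff1 = Bio::FlatFile.open(nil, ARGV[0])
ff2 = Bio::FlatFile.open(nil, ARGV[1])

ff1.each_entry do |fe1|
  fe2 = ff2.next_entry  # matching entry from the second file
  puts fe1.to_s         # text rebuilt by the entry's to_s (Fastq#to_s in 1.4.2)
  puts fe2.to_s
end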
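
For reference, a minimal sketch of the four-line record that to_s rebuilds in
flatmerge, assuming the field layout of the Fastq#to_s patch quoted further
down. FastqRecord is a hypothetical stand-in, not a BioRuby class, so this runs
without the bio gem:

```ruby
# Hypothetical stand-in for a parsed FASTQ entry. The field names mirror the
# instance variables used in the quoted Fastq#to_s patch.
FastqRecord = Struct.new(:definition, :sequence_string,
                         :definition2, :quality_string) do
  # Rebuild the record as four lines: @header, sequence, +header, qualities.
  def to_s
    "@#{definition}\n#{sequence_string}\n+#{definition2}\n#{quality_string}\n"
  end
end

rec = FastqRecord.new("read1/1", "ACGT", "", "IIII")
print rec.to_s
# prints:
# @read1/1
# ACGT
# +
# IIII
```

Because the string already ends with a newline, puts and print give the same
output here; puts only appends a newline when one is missing.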

flatmerge2
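#!/usr/bin/env ruby
# Interleave entries of two flat files, copying each entry's original text.
require 'bio'

ff1 = Bio::FlatFile.open(nil, ARGV[0])
ff2 = Bio::FlatFile.open(nil, ARGV[1])

ff1.each_entry do |fe1|
  print ff1.entry_raw   # raw text of the entry just read from ff1
  ff2.next_entry        # advance ff2 so entry_raw refers to its next entry
  print ff2.entry_raw
end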
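
Regarding the note quoted below that these loops misbehave when the two files
hold different numbers of entries, here is a hedged sketch of a stricter loop.
interleave_raw is a hypothetical helper name; it only assumes that both readers
respond to next_entry (returning nil at end of input) and entry_raw, which I
believe Bio::FlatFile does:

```ruby
# Interleave raw entries from two flat-file-like readers and raise if one
# reader runs out of entries before the other.
# Assumption: each reader responds to #next_entry (nil at end of input) and
# #entry_raw (text of the entry just read). The method name is hypothetical.
def interleave_raw(ff1, ff2, out = $stdout)
  count = 0
  loop do
    fe1 = ff1.next_entry
    raw1 = ff1.entry_raw
    fe2 = ff2.next_entry
    raw2 = ff2.entry_raw
    break if fe1.nil? && fe2.nil?   # both inputs ended together
    if fe1.nil? || fe2.nil?
      raise "entry count mismatch after #{count} pairs"
    end
    out.print raw1, raw2
    count += 1
  end
  count                             # number of pairs written
end

# Possible use with BioRuby (assumes the bio gem is installed):
#   require 'bio'
#   interleave_raw(Bio::FlatFile.open(nil, ARGV[0]),
#                  Bio::FlatFile.open(nil, ARGV[1]))
```

This reads one entry from each file per iteration instead of driving the loop
from the first file only, so a short second file is reported rather than
silently producing unpaired output.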

-- 
Tomoaki NISHIYAMA

Advanced Science Research Center,
Kanazawa University,
13-1 Takara-machi, 
Kanazawa, 920-0934, Japan


On 2011/08/22, at 15:09, Naohisa Goto wrote:

> Hi,
> 
> In this case, Bio::FlatFile#entry_raw, which returns the last
> entry's string in the flat-file object,  is recommended, from
> the viewpoint of performance (not to create additional objects).
> 
> modified example:
>  require 'bio'
> 
>  ff1 = Bio::FlatFile.open(nil, ARGV[0])
>  ff2 = Bio::FlatFile.open(nil, ARGV[1])
> 
>  ff1.each_entry do |fe1|
>    fe1_raw = ff1.entry_raw
>    fe2 = ff2.next_entry
>    fe2_raw = ff2.entry_raw
>    print fe1_raw
>    print fe2_raw
>  end
> 
> Note that the example will not work correctly when the
> two files contain different numbers of sequences.
> 
> I also agree that Fastq#to_s is useful as a convenience method,
> regardless of performance.
> 
> -- 
> Naohisa Goto
> ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
> 
>> Hi,
>> 
>> For flatfiles I think it's nice if we can output the original text of each entry as it was split.
>> For example:
>> 
>> #!/bin/env ruby
>> 
>> require 'bio'
>> 
>> ff1 = Bio::FlatFile.open(nil, ARGV[0])
>> ff2 = Bio::FlatFile.open(nil, ARGV[1])
>> 
>> ff1.each_entry do |fe1|
>>  fe2 = ff2.next_entry
>>  puts fe1
>>  puts fe2
>> end
>> 
>> should be able to merge read1 and read2 from different files into a single file.
>> This works with fasta format but not with fastq format right now, because
>> Bio::Fastq does not have a to_s method.  As Fastq does not really hold the original
>> data, reconstructing it as in the following patch is perhaps a good way (it avoids
>> using twice the memory just for the to_s function). Or do we need to fold the sequence
>> to some (original or fixed) length?
>> 
>> diff --git a/lib/bio/db/fastq.rb b/lib/bio/db/fastq.rb
>> index f913e6d..5ff1a15 100644
>> --- a/lib/bio/db/fastq.rb
>> +++ b/lib/bio/db/fastq.rb
>> @@ -407,6 +407,10 @@ class Fastq
>>   # raw sequence data as a String object
>>   attr_reader :sequence_string
>> 
>> +  def to_s
>> +    "@#{@definition}\n#{@sequence_string}\n+#{@definition2}\n#{@quality_string}\n"
>> +  end
>> +
>>   # returns Bio::Sequence::NA
>>   def naseq
>>     unless defined? @naseq then
>> 
>> Best regards,
>> -- 
>> Tomoaki NISHIYAMA
>> 
>> Advanced Science Research Center,
>> Kanazawa University,
>> 13-1 Takara-machi, 
>> Kanazawa, 920-0934, Japan
>> 