[BioRuby] Fastq.to_s
Tomoaki NISHIYAMA
tomoakin at kenroku.kanazawa-u.ac.jp
Mon Aug 29 01:04:54 UTC 2011
Hi Goto-san,
Thank you for incorporating Fastq#to_s and releasing 1.4.2!
With 1.4.2, the following two programs take about the same time:
56m38s vs. 56m24s of user time for a pair of 22-GB FASTQ files (single measurement).
So we can perhaps simply choose whichever is easier to read, write, or explain ;-)
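flatmerge
#!/usr/bin/env ruby
require 'bio'
# Interleave the entries of two files (e.g. read 1 and read 2).
# Caution: assumes both files contain the same number of entries.
ff1 = Bio::FlatFile.open(nil, ARGV[0])
ff2 = Bio::FlatFile.open(nil, ARGV[1])
ff1.each_entry do |fe1|
  fe2 = ff2.next_entry
  # Fastq#to_s rebuilds the entry text from the parsed fields
  puts fe1.to_s
  puts fe2.to_s
end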
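flatmerge2
#!/usr/bin/env ruby
require 'bio'
ff1 = Bio::FlatFile.open(nil, ARGV[0])
ff2 = Bio::FlatFile.open(nil, ARGV[1])
ff1.each_entry do |fe1|
  # entry_raw returns the original text of the entry just read,
  # so no string has to be rebuilt from the parsed object
  print ff1.entry_raw
  fe2 = ff2.next_entry
  print ff2.entry_raw
end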
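Both programs above assume the two files contain the same number of entries; as Naohisa Goto's quoted reply points out, such a loop does not work correctly otherwise. As an illustration only, here is a minimal plain-Ruby sketch, not using BioRuby, that interleaves two FASTQ streams and raises an error when the record counts differ. It assumes every record is exactly four unwrapped lines (header, sequence, '+' line, quality); `each_fastq_record` and `interleave_fastq` are hypothetical helper names, not BioRuby API.

```ruby
require 'stringio' # lets the helpers run on in-memory data as well as files

# Hypothetical helper (not BioRuby API): yields each FASTQ record as one
# String. Assumption: every record is exactly four unwrapped lines
# (header, sequence, '+' line, quality).
def each_fastq_record(io)
  return enum_for(:each_fastq_record, io) unless block_given?

  loop do
    header = io.gets
    break if header.nil? # clean end of input

    rest = Array.new(3) { io.gets }
    raise ArgumentError, "truncated record: #{header.chomp}" if rest.include?(nil)

    # Make sure every line ends with "\n", even the last line of the file.
    yield ([header] + rest).map { |l| l.end_with?("\n") ? l : "#{l}\n" }.join
  end
end

# Hypothetical helper: writes records of io1 and io2 alternately to out and
# raises ArgumentError if the two inputs hold different numbers of records.
def interleave_fastq(io1, io2, out)
  records2 = each_fastq_record(io2) # external enumerator, read in lockstep
  each_fastq_record(io1) do |rec1|
    rec2 = begin
      records2.next
    rescue StopIteration
      raise ArgumentError, 'second input has fewer records than the first'
    end
    out.print rec1, rec2
  end
  begin
    records2.next
  rescue StopIteration
    return out # both inputs ended together
  end
  raise ArgumentError, 'second input has more records than the first'
end

if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  File.open(ARGV[0]) do |f1|
    File.open(ARGV[1]) { |f2| interleave_fastq(f1, f2, $stdout) }
  end
end
```

Run it as, e.g., `ruby interleave.rb read1.fq read2.fq > merged.fq` (the script name is hypothetical). Pulling records from the second file with an external Enumerator (`next`) keeps both files in lockstep, much like `ff2.next_entry` in the programs above, and makes the end of either file explicit.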
--
Tomoaki NISHIYAMA
Advanced Science Research Center,
Kanazawa University,
13-1 Takara-machi,
Kanazawa, 920-0934, Japan
On 2011/08/22, at 15:09, Naohisa Goto wrote:
> Hi,
>
> In this case, Bio::FlatFile#entry_raw, which returns the raw
> text of the last entry read by the flat-file object, is
> recommended for performance, because it does not create
> additional objects.
>
> modified example:
> require 'bio'
>
> ff1 = Bio::FlatFile.open(nil, ARGV[0])
> ff2 = Bio::FlatFile.open(nil, ARGV[1])
>
> ff1.each_entry do |fe1|
> fe1_raw = ff1.entry_raw
> fe2 = ff2.next_entry
> fe2_raw = ff2.entry_raw
> print fe1_raw
> print fe2_raw
> end
>
> Note that the example will not work correctly when the
> two files contain different numbers of sequences.
>
> I also agree with adding Fastq#to_s as a convenience method,
> regardless of performance.
>
> --
> Naohisa Goto
> ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
>
>> Hi,
>>
>> For flat files, I think it's nice if we can output the original text of the entries as split.
>> For example
>>
>> #!/bin/env ruby
>>
>> require 'bio'
>>
>> ff1 = Bio::FlatFile.open(nil, ARGV[0])
>> ff2 = Bio::FlatFile.open(nil, ARGV[1])
>>
>> ff1.each_entry do |fe1|
>> fe2 = ff2.next_entry
>> puts fe1
>> puts fe2
>> end
>>
>> should be able to merge read 1 and read 2 from separate files into a single file.
>> This works with the FASTA format but not with the FASTQ format right now, because
>> Bio::Fastq does not have a to_s method. As Fastq does not keep the original
>> text as-is, reconstructing it as in the following patch is perhaps a good way
>> (it avoids using twice the memory just for the to_s method). Or, do we need to
>> fold the sequence to some (original or fixed) length?
>>
>> diff --git a/lib/bio/db/fastq.rb b/lib/bio/db/fastq.rb
>> index f913e6d..5ff1a15 100644
>> --- a/lib/bio/db/fastq.rb
>> +++ b/lib/bio/db/fastq.rb
>> @@ -407,6 +407,10 @@ class Fastq
>> # raw sequence data as a String object
>> attr_reader :sequence_string
>>
>> + def to_s
>> + "@#{@definition}\n#{@sequence_string}\n+#{@definition2}\n#{@quality_string}\n"
>> + end
>> +
>> # returns Bio::Sequence::NA
>> def naseq
>> unless defined? @naseq then
>>
>> Best regards,
>> --
>> Tomoaki NISHIYAMA
>>
>
>
>
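The `to_s` in the quoted patch rebuilds the four-line FASTQ text from the parsed fields instead of keeping a copy of the original entry, so memory use does not double. A minimal stand-alone sketch of the same idea follows. `FastqRecord` is a hypothetical Struct, not `Bio::Fastq`; its field names only mirror the instance variables used in the patch, and its parser assumes unwrapped four-line records.

```ruby
# Hypothetical stand-in for Bio::Fastq (not the BioRuby class): it keeps only
# the four fields used by the quoted to_s patch, not the original entry text.
FastqRecord = Struct.new(:definition, :sequence_string, :definition2, :quality_string) do
  # Parse one record. Assumption: exactly four unwrapped lines
  # (header, sequence, '+' line, quality).
  def self.parse(text)
    header, seq, plus, qual = text.lines.map(&:chomp)
    raise ArgumentError, 'header must start with @' unless header&.start_with?('@')
    raise ArgumentError, 'third line must start with +' unless plus&.start_with?('+')

    new(header[1..-1], seq, plus[1..-1], qual)
  end

  # Rebuild the entry text from the stored fields, as the quoted patch does.
  def to_s
    "@#{definition}\n#{sequence_string}\n+#{definition2}\n#{quality_string}\n"
  end
end

entry = FastqRecord.parse("@read1 length=4\nACGT\n+\nIIII\n")
puts entry # to_s already ends with "\n", so puts adds no blank line
```

Because the rebuilt string ends with a newline, `puts` does not append another one, which is why `puts fe1` gives back-to-back records. In this sketch `to_s` always writes the sequence and quality on one line each, so whether to fold long sequences (the question raised in the quoted message) would have to be decided separately.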