[Bioperl-l] Fishing redundant sequences in FASTA files [Right formatting]

Wed Feb 16 08:01:31 EST 2011

On Tue, 15 Feb 2011 14:25:07 -0600, Chris wrote:

> SHA should work as well, didn't think of that (though I suppose the
> encoding step for either would be rate-limiting?).

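Disk I/O might be the bottleneck rather than the hashing: on a desktop
that is more than three years old I get ~144 MB/s for sha1 and ~217 MB/s
for md5 in a simple test that hashes 1 GiB of zeros from /dev/zero, so
no disk is involved: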

  $ dd if=/dev/zero bs=1M count=1024 | sha1sum -
  1024+0 records in
  1024+0 records out
  1073741824 bytes (1.1 GB) copied, 7.44032 s, 144 MB/s
  2a492f15396a6768bcbca016993f4b4c8b0b5307  -
  $ dd if=/dev/zero bs=1M count=1024 | md5sum -
  1024+0 records in
  1024+0 records out
  1073741824 bytes (1.1 GB) copied, 4.94205 s, 217 MB/s
  cd573cfaace07e7949bc0c46028904ff  -

On a reasonably new standard Dell desktop the same test gives ~249 MB/s
for sha1 and ~410 MB/s for md5.
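
If you want to repeat the comparison from inside a script instead of
with dd and a pipe, a rough Python sketch along these lines should
measure much the same thing. It hashes zero bytes held in memory, so it
shows hashing speed only, with no disk involved. The function name
hash_throughput and the 64 MiB default are my own choices for
illustration, and the numbers will not match the coreutils figures
exactly.

```python
import hashlib
import time


def hash_throughput(algorithm, total_bytes=64 * 1024 * 1024,
                    chunk_size=1024 * 1024):
    """Hash total_bytes of zeros in memory; return (hex digest, MB/s)."""
    chunk = bytes(chunk_size)          # zero-filled buffer, like /dev/zero
    hasher = hashlib.new(algorithm)    # e.g. "sha1" or "md5"
    remaining = total_bytes
    start = time.perf_counter()
    while remaining > 0:
        n = min(chunk_size, remaining)
        hasher.update(chunk[:n])       # feed at most chunk_size bytes
        remaining -= n
    # Guard against a zero interval on very small inputs.
    elapsed = max(time.perf_counter() - start, 1e-9)
    # dd reports decimal megabytes (10**6 bytes) per second.
    return hasher.hexdigest(), total_bytes / elapsed / 1e6


if __name__ == "__main__":
    for name in ("sha1", "md5"):
        digest, rate = hash_throughput(name)
        print(f"{name}: {rate:.0f} MB/s ({digest})")
```

If the rate you get here is well above what your disk can deliver when
reading the FASTA files, then reading the input, not the digest, is the
likely limit.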

  Best regards,

    Adam

-- 
                                                          Adam Sjøgren
                                                    adsj at novozymes.com