[Bioperl-l] Simulate fragment and read-pair reads from sample sequence

Mon Feb 15 17:25:15 UTC 2010

Hi Albert,

There are some tools out there that already do a good job of this and
take into account platform-specific error models, paired-end
fragment length distributions, etc.  maq has an Illumina read
simulator that also simulates quality scores, and MetaSim has a nice
454 read simulator (it handles Illumina as well, though unfortunately
it doesn't produce quality scores for either platform).

But to answer your question: first select a random fragment of
length $insert_length, then take the two reads from the left and
right ends of that fragment.  The left read is the first $read_length
bases of the fragment, read 5' to 3'; the right read is the reverse
complement of the last $read_length bases.
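
Here is a minimal sketch of that procedure, written in plain Python
rather than BioPerl.  The function names (revcomp, simulate_read_pair)
and the uniform choice of fragment start are my own assumptions for
illustration; it ignores sequencing errors, quality scores, and
variation in fragment length, which the tools above handle.

```python
import random

# Translation table for complementing DNA (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_read_pair(seq, insert_length, read_length, rng=random):
    """Pick one random fragment of insert_length and return
    (fragment_start, left_read, right_read).

    Hypothetical helper: the fragment start is drawn uniformly so that
    the whole fragment lies inside seq.
    """
    if not 1 <= read_length <= insert_length <= len(seq):
        raise ValueError("need 1 <= read_length <= insert_length <= len(seq)")
    start = rng.randrange(len(seq) - insert_length + 1)
    fragment = seq[start:start + insert_length]
    left_read = fragment[:read_length]             # forward strand, 5' end
    right_read = revcomp(fragment[-read_length:])  # opposite strand, 3' end
    return start, left_read, right_read


if __name__ == "__main__":
    rng = random.Random(42)
    reference = "".join(rng.choice("ACGT") for _ in range(10_000))
    start, left, right = simulate_read_pair(reference, 300, 75, rng)
    print(start, left, right, sep="\n")
```

Calling simulate_read_pair in a loop gives as many pairs as you need.
For single (fragment) reads, the same idea reduces to one random
substring of length $read_length.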

Note that in a real-world DNA capture experiment targeting 10 kb,
you'd get some fragments that extend past the boundaries of the
target region.
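
If you want the simulation to reproduce that boundary effect, one
option is to sample fragments from a longer reference that includes
flanking sequence and keep any fragment that overlaps the target by
at least one base.  The sketch below is plain Python under that
assumption; sample_capture_fragment and its parameters are
hypothetical names, and real capture efficiency is not uniform the
way this is.

```python
import random


def sample_capture_fragment(ref_length, target_start, target_end,
                            insert_length, rng=random):
    """Return (start, end) of a fragment that overlaps the target.

    Coordinates are 0-based, half-open, on the reference.  The start is
    drawn uniformly from all positions where the fragment fits in the
    reference and shares at least one base with the target.
    """
    if not 0 <= target_start < target_end <= ref_length:
        raise ValueError("target must lie inside the reference")
    if not 1 <= insert_length <= ref_length:
        raise ValueError("need 1 <= insert_length <= ref_length")
    # [s, s + L) overlaps [ts, te) iff s < te and s + L > ts.
    lowest = max(0, target_start - insert_length + 1)
    highest = min(ref_length - insert_length, target_end - 1)
    start = rng.randint(lowest, highest)  # inclusive on both ends
    return start, start + insert_length


if __name__ == "__main__":
    rng = random.Random(7)
    # A 10 kb target with 1 kb of flanking sequence on each side.
    fragments = [sample_capture_fragment(12_000, 1_000, 11_000, 300, rng)
                 for _ in range(1_000)]
    overhanging = sum(1 for s, e in fragments if s < 1_000 or e > 11_000)
    print(overhanging, "of", len(fragments), "fragments extend past the target")
```

Reads are then taken from the ends of each fragment as before, so
some reads near the edges fall partly or wholly outside the target.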

-Aaron

On Mon, Feb 15, 2010 at 8:45 AM, Albert Vilella <avilella at gmail.com> wrote:
> Hi,
>
> This may have been asked before but I couldn't find any similar
> question by searching the archives.
> I want to simulate substr's from a sequence of a given (read) length.
>
> If I have a starting sequence of 10,000bp, the way I understand it is
> that a standard Solexa (or 454) library
> preparation will give me random $read_length=75 (or 450) reads along
> the 10k sequence.
> So I want to do a random substr on the sequence itself.
>
> But if I am simulating read-pairs of a 10,000bp sequence, given a
> defined $insert_size length, how would I
> do the random substr pairs?
>
> Cheers,
>
> Albert.