[Bioperl-l] Make edits to a large sequence

Tue Jun 28 15:23:29 UTC 2011

An array might work, or just hold the whole thing in a string and use
substr on that rather than the BioPerl objects.

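while (my $seq = $in->next_seq()) {
	my $sequence = $seq->seq;       # copy the sequence into a plain string
	substr($sequence,2,1,'c');      # 3rd base becomes c
	substr($sequence,8,1,'t');      # 9th base becomes t
	$seq->seq($sequence);           # write the edited string back to the object
	...
}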

Just remember that substr uses 0-based offsets rather than 1-based
positions, so the 3rd base is at offset 2 rather than 3.
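
If the edits arrive as a list of 1-based positions, the offset conversion
can be kept in one place. The following is only a sketch: the helper name
apply_point_edits and the [position, base] pair layout are made up for
illustration and are not part of BioPerl. It takes a reference to the string
so that the large sequence is not copied on each call, and it uses the same
4-argument substr as above.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): apply single-base edits,
# each given as [1-based position, replacement base], to a string reference.
sub apply_point_edits {
    my ($sequence_ref, $edits) = @_;
    my $length = length $$sequence_ref;
    for my $edit (@$edits) {
        my ($pos, $base) = @$edit;
        die "position $pos is outside 1..$length\n"
            if $pos < 1 || $pos > $length;
        # 4-argument substr replaces in place; subtract 1 to turn the
        # 1-based position into a 0-based offset.
        substr($$sequence_ref, $pos - 1, 1, $base);
    }
    return;
}

my $sequence = 'aaaaaaaaaa';
apply_point_edits(\$sequence, [ [3, 'c'], [9, 't'] ]);
print "$sequence\n";    # prints aacaaaaata
```

Inside the while loop this would be called on the string taken from
$seq->seq, and the result handed back with $seq->seq($sequence) as before.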

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of wannymahoots
> Sent: Tuesday, June 28, 2011 5:46 AM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Make edits to a large sequence
> 
> Hi,
> 
> I'm looking for the quickest / most efficient way to make many edits
> (mutations) to a long fasta sequence using bioperl.  The sequences are
> of the order of 200Mb long, and I would like to make 1,000s of changes
> to single bases (e.g. A->T at position 1,000, G->C at position 1,201
> etc.).  The only way I've come across to do this is reading in the
> sequence and then making edits using SeqUtils, so something like:
> 
> my $in = Bio::SeqIO->new('-file' => "file.fa", '-format' => "fasta");
> 
> while(my $seq = $in->next_seq()) {
>         my $mut = Bio::LiveSeq::Mutation->new(-seq => 'c',-pos => 3);
>         Bio::SeqUtils->mutate($seq,$mut);
> }
> 
> However, I'm concerned that this might be making multiple copies of
> the large sequence, and that using substr (which is how mutate works),
> is perhaps not the most efficient.  Would it be better to save the
> fasta sequence as an array and change individual array positions
> directly?
> 
> Many thanks for any advice.