[Bioperl-l] Can't figure out restriction analysis

Maxim deeepersound at googlemail.com
Fri Jan 7 21:46:15 UTC 2011


I'm desperately trying to annotate all HindIII restriction sites for
different genomes. The only solution I was able to figure out (after
complicated self-made approaches using awk and grep) is shown below. I'm not
a programmer, so please excuse the bad coding (strict, warnings etc, this is
dedicated to be a "standalone" script). Below code reads the 25 fasta files
for human genome one after another. Then restriction analysis is performed,
the only value I was able to get returned so far is the fragment length. Am
I right, that the fragment length values returned by the @fragments array
can be mapped onto the reference genome in a linear fashion, i.e. the
fragment length values are ordered according to their location on the
reference sequence? If so, I can of course get the positions (coordinates)
by simply adding the fragment length values.

More questions:
I've seen the _make_cuts function, would this be more appropriate in order
to directly retrieve coordinates of cutting positions? Unfortunately I can't
get it work like
my $analysis = _make_cuts($seq->seq,'HindIII',0)
this returns:can't call method "_make_cuts" on an undefined value

Or are both approaches just nonsense and I overlooked another obvious

use Bio::Perl;
use Bio::SeqIO;
use Bio::Restriction::Analysis;
use Bio::Restriction::EnzymeCollection;
use Cwd;

$dir = getcwd;

$fasta_dir = "/data1/Genomes/HG18/";
opendir(DIR, $fasta_dir) or die "$!";
@chroms =  grep {/chr/}  readdir DIR;
print "found the following files:", join ("\n",  map {$_} @chroms), "\n";
#foreach $chrom (@chroms) {print "$chrom\n";}

foreach $file (@chroms)
$outname = $file;
@outname = split (/.fa/, $outname);
$outname = $dir . "/" . @outname[0] . "_fragments";
print "Output filename: $outname\n";
open (OUT, ">$outname");
$filelink = $fasta_dir. "/" . $file;
my $seqio = Bio::SeqIO->new(-file => $filelink, '-format' => 'Fasta');
while(my $seq = $seqio->next_seq)
    my $string = $seq->seq;
    #print $string,"\n";

    my $analysis = Bio::Restriction::Analysis->new(-seq => $seq);
    my  @fragments =  $analysis->fragments('HindIII');

    print OUT join("\n", map {length $_} @fragments), "\n";
close (OUT);
print "chromosome file: $file is done!\n";

More information about the Bioperl-l mailing list