[Bioperl-l] parsing result of CAP3 (ACE file)

Jean-Marc FRIGERIO Frigerio at pierroton.inra.fr
Tue Sep 9 08:45:19 UTC 2008


> > -- Hi,
> >
> > Is somebody have a piece of code to parse result of CAP3 assembly program
> > which
> > format is ACE ?
> > I need to retrieve the alignment from this file.
> >
> > thank you,
> > Laurent --
> >
> >
> >
> >
> > +---------------------------------------------+
> >  Laurent Manchon
> >  Email: lmanchon at univ-montp2.fr
> > +---------------------------------------------+
>
> Laurent -
>
> I have modified modules that will do it as I recently ran into problems
> with the DB_FILE module in Assembly::IO.  In addition, the current version
> of cap3 seems to put a contig length where a pad length is expected (based
> on the Ace format description).  The modules I have will parse the ace file
> contig-by-contig rather than having the entire assembly slurped into memory
> (or a tied hash) all at once.  You are welcome to them if you are
> interested and I'd like to get them in Bioperl at some point.  Bascially,
> there are three files - a modified Contig.pm, ContigIO.pm, and a modified
> ace.pm (in a ContigIO directory).
>
> Josh

Hi,
Here are a 2 pieces of code running on an ace file (output of phrap is that 
the same as cap3 ?)
----------------------------- 1 -----------------------------------------
my $assembly = Bio::Assembly::IO->new('-file'    => $file,
                                 '-format'  => 'ace')->next_assembly;

for my $contig ($assembly->all_contigs)
{
    my $ct_seq  = $contig->get_consensus_sequence;
    (my $ref_seq = uc $ct_seq->seq) =~ s/-//g;

    my $debut  = $pos - 100 > 0 ? $pos - 100 : 1;
    my $fin    = $pos + 100 <= length $ref_seq ?
                 $pos + 100 :  length $ref_seq;

    my $coll = $contig->get_features_collection;
    my @coll = $coll->features_in_range('-start' => $debut, '-end' => $fin);
    for my $tag (@coll)
    {
        next unless  $tag->primary_tag eq 'comment';
#print "TAG: ",$tag->start,"\n";
        my $tag_pos = $contig->change_coord('gapped consensus','ungapped 
consensus',$tag->start);
#print "TAG POS: $tag_pos\n";
        next if $pos == $tag_pos;

        substr($ref_seq,$tag_pos-1,1,'N');

    }
}



------------------------------------ 2 -------------------
 my $assembly = Bio::Assembly::IO->new(
    '-file'    => $file,
    '-format'  => 'ace')->next_assembly;
  for my $contig ($assembly->all_contigs)
  {
    for my $seq ($contig->each_seq)
    {
      my $id = $seq->id;
      my $s = $seq->seq;
      my ($start,$end) =
        ($contig->change_coord("aligned $id","ungapped consensus",
$seq->start),
         $contig->change_coord("aligned $id","ungapped consensus",$seq->end));
      my $dir = $seq->strand < 0 ? 'R' : 'F';
 ......
}
	-- Jean-Marc



More information about the Bioperl-l mailing list