[Bioperl-l] Bioperl and ACE files

Wes Barris wes.barris at csiro.au
Mon Feb 16 17:42:54 EST 2004


Jason Stajich wrote:

> As always, more code and information as to how you got here makes it
> easier for someone to answer.

I have attached the perl script that I am using.
The sample ACE file is available here:

	http://www.livestockgenomics.csiro.au/junk.ace

You might run this script like this:

acetest.pl junk.ace outdir

> 
> Not really sure how you are getting to the point where you have created
> Bio::LocatableSeq objects - presumably you are trying to do an assembly
> so I'll guess you got there from Bio::Assembly::IO.
> 
> You may need to get help from Robson about what the format is supposed to
> support.  A start of 0 is not really proper in Bioperl -
> sequences/features start at 1 in our system, so the assembly code needs to
> adjust for that.  presumably those numbers are offsets not actual start
> positions so the parsing code may need some looking at.
> 
> -jason
> 
> 
> On Mon, 16 Feb 2004, Wes Barris wrote:
> 
>  > Hi,
>  >
>  > I have an ACE file that I am trying to process with bioperl.  A portion
>  > of the ACE file looks like this:
>  >
>  > AF CB429506 U 2
>  > AF CB428704 U 6
>  > AF CB430643 U 1
>  > AF CB431187 U 0
>  > AF CB430639 U -7
>  > AF CB430480 C 24
>  > AF CB430055 U 10
>  >
>  > Notice the line in the middle that shows a starting position of '0'
>  > (zero)?  When bioperl tries to process this sequence, an error is
>  > thrown.  I have found the port of the bioperl code that throws the
>  > error:
>  > Bio/LocatableSeq.pm:
>  > sub get_nse{
>  >     my ($self,$char1,$char2) = @_;
>  >
>  >     $char1 ||= "/";
>  >     $char2 ||= "-";
>  >
>  >     $self->throw("Attribute id not set") unless $self->id();
>  >     $self->throw("Attribute start not set") unless $self->start();<-----
>  >     $self->throw("Attribute end not set") unless $self->end();
>  >
>  >     return $self->id() . $char1 . $self->start . $char2 . $self->end ;
>  >
>  > }
>  >
>  > Notice how "$self->start()" is tested.  When it encounters a sequence
>  > whose start is set to zero, an error is thrown.
>  >
>  > I don't know much about the ACE file format.  Do I have a questionable
>  > ACE file or is this test incomplete?
>  >
> 
> -- 
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
> 


-- 
Wes Barris
E-Mail: Wes.Barris at csiro.au
-------------- next part --------------
#!/usr/local/bin/perl -w
#
#
use strict;
use Bio::Assembly::IO;
use Bio::AlignIO;
use Bio::SeqIO;
#
my $usage = "Usage: $0 <infile.ace> <outdir>\n";
my $infile = shift or die $usage;
my $outdir = shift or die $usage;
my $prefix = 'cn';
my $ext = 'msf';
mkdir $outdir, 0755 if (! -d $outdir);
my $io = new Bio::Assembly::IO(-file=>$infile, -format=>'ace');
my $assembly = $io->next_assembly;	# Bio::Assembly::ScaffoldI

foreach my $contig ($assembly->all_contigs()) {	# Bio::Assembly::Contig
   my $contigName = $prefix.($contig->id);
#
# Write the consensus to a file.
#
   my $consensusSeq = new Bio::Seq(
			-seq=>$contig->get_consensus_sequence->seq,
			-id=>$contigName);
   my $seqout = new Bio::SeqIO(-file=>">$outdir/$contigName.fa", -format=>'fasta');
   $seqout->write_seq($consensusSeq);
#
# Make the consensus the first sequence of the simple align object.
#
   my $aln = new Bio::SimpleAlign();
   $aln->id('alignment.msf');
   $contig->get_consensus_sequence->id($contigName);
   $aln->add_seq($contig->get_consensus_sequence);
#
# Loop through each sequence in the contig adding it to the new alignment.
#
   foreach my $seq ($contig->each_seq) {
      my $id;
      if ($seq->display_id =~ /\|/) {
         my @junk = split(/[\|\.]/, $seq->display_id);
         $id = $junk[3];
         }
      else {
         $id = $seq->display_id;
         }
      my $lseq = new Bio::LocatableSeq(
				-seq=>$seq->seq,
				-id=>$id,
				-start=>$contig->get_seq_coord($seq)->start,
				-end=>$contig->get_seq_coord($seq)->end,
				);
      &alignSeq($lseq,$contig->get_consensus_sequence->length);
      $aln->add_seq($lseq);
      }
   $aln->set_displayname_flat;
   my $outstream = new Bio::AlignIO(-format=>'msf', -file=>">$outdir/$contigName.$ext");
   $outstream->write_aln($aln);
   undef $outstream;
   }


exit;

sub alignSeq {
   my ($lseq, $cnlength) = @_;
#
# Clip any sequence that begins before the consensus.
#
   if ($lseq->start <= 0) {
      my $offset = -$lseq->start + 2;
      $lseq->seq($lseq->subseq($offset,$lseq->length));
      print($lseq->display_id," was clipped at the beginning by $offset\n");
      }
#
# Pad each sequence so it aligns with the consensus.
#
   my $before = $lseq->start - 1;
   my $after = $cnlength - $lseq->end;
   my $alignedSequence = '-' x $before . $lseq->seq . '-' x $after;
#
# Trim any sequence that extends beyond the consensus.
#
   if (length($alignedSequence) > $cnlength) {
      $alignedSequence = substr($alignedSequence, 0 ,$cnlength);
      print($lseq->display_id," was clipped at the end\n");
      }
   $lseq->seq($alignedSequence);

   return;
   }


More information about the Bioperl-l mailing list