[Bioperl-l] gap characters in SimpleAlign objects

Nathan Haigh nathanhaigh at ukonline.co.uk
Wed Feb 18 10:44:43 EST 2004


OK, I think I've figured out where my confusion lies:
I thought that the default output format from clustalw would be clustalw
format, but as it turns out it's GCG (MSF), which uses '.' as its gap
character.
OK, I think I've figured out the problem (well, at least part of it!):
There are a couple of lines in clustalw.pm that retrieve the alignment
generated by clustalw, but I think they may not have been updated since
more alignment formats were added to AlignIO. As a result, the module
defaults to MSF unless you have specified 'phylip' as the output format in
the alignment factory parameters (@params).
I have therefore replaced the following lines:

	my $format = $output =~ /phylip/i ? "phylip" : "MSF";
	my $in = Bio::AlignIO->new(-file => $outfile, '-format' => $format);
with
	$self->output('MSF') unless $self->output();
	my $in = Bio::AlignIO->new(-file => $outfile, '-format' => $self->output());

This leaves the default file format as MSF (although I think clustalw
would be a more obvious choice) but allows the user to specify any of the
other supported formats.
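To make the difference between the two versions concrete, here is a plain-Perl sketch of both selection rules, with the factory's output setting reduced to a plain string argument. The sub names old_format and new_format are hypothetical stand-ins for the lines quoted above, not BioPerl functions.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the original line: anything that is not
# "phylip" (case-insensitive) falls back to MSF, so a requested
# "clustalw" or "fasta" output is silently ignored.
sub old_format {
    my ($output) = @_;
    $output = '' unless defined $output;
    return $output =~ /phylip/i ? 'phylip' : 'MSF';
}

# Hypothetical stand-in for the proposed replacement: use whatever
# the user asked for, defaulting to MSF only when nothing was set.
sub new_format {
    my ($output) = @_;
    return $output ? $output : 'MSF';
}

for my $requested (undef, 'phylip', 'clustalw', 'fasta') {
    printf "%-8s old=%-6s new=%s\n",
        defined $requested ? $requested : '(none)',
        old_format($requested), new_format($requested);
}
```

Only an unset output falls back to MSF under the new rule; every explicit request is passed through to AlignIO unchanged.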

I then use $aln->map_chars('\.','-') to convert the '.' gap characters
to '-'.
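As far as I understand it, map_chars runs a global substitution over every sequence string in the alignment and treats its first argument as a regular expression, which is why the '.' has to be escaped. The sketch below approximates that on a plain hash of id => sequence rather than a real Bio::SimpleAlign object; map_chars_sketch is a hypothetical name and this is not the BioPerl source.

```perl
use strict;
use warnings;

# Approximation of SimpleAlign::map_chars on a hash of id => sequence.
# $from is used as a regular expression, so a bare '.' would match
# every residue; it must be passed as '\.' to mean a literal dot.
sub map_chars_sketch {
    my ($seqs, $from, $to) = @_;
    for my $id (keys %$seqs) {
        $seqs->{$id} =~ s/$from/$to/g;
    }
    return $seqs;
}

my %aln = (seq1 => 'MK..LV', seq2 => 'MKAAL.');
map_chars_sketch(\%aln, '\.', '-');
print "$_ $aln{$_}\n" for sort keys %aln;   # prints "seq1 MK--LV" then "seq2 MKAAL-"
```

Passing '.' unescaped would turn every character into '-', so the quoting in the call above matters.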

The problem with this is that if you do not specify an output format, the
default MSF is used (with '.' as gaps), and when you then create an output
alignment stream in fasta format you get '.' as gaps (I'm pretty sure
fasta format requires '-' as the gap symbol). Would it therefore not be
safer for the fasta AlignIO module to check for the correct gap symbol?
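One way that check could look, sketched in plain Perl: normalise '.' gaps to '-' just before each record is written out as FASTA. write_fasta_with_dash_gaps is a hypothetical helper, not part of Bio::AlignIO::fasta, and the sketch assumes '.' never carries any meaning other than a gap in the input.

```perl
use strict;
use warnings;

# Hypothetical helper: render id => aligned-sequence pairs as FASTA text,
# converting MSF-style '.' gaps to '-' on the way out. The array of ids
# fixes the output order so the alignment order is preserved.
sub write_fasta_with_dash_gaps {
    my ($ids, $seqs, $width) = @_;
    $width ||= 60;                           # residues per output line
    my $out = '';
    for my $id (@$ids) {
        (my $s = $seqs->{$id}) =~ tr/./-/;   # tr treats '.' literally
        $out .= ">$id\n";
        $out .= "$_\n" for $s =~ /(.{1,$width})/g;
    }
    return $out;
}

print write_fasta_with_dash_gaps(['seq1', 'seq2'],
                                 { seq1 => 'MK..LV', seq2 => 'MKAAL.' });
```

Doing the conversion at write time leaves the in-memory alignment untouched, so other output formats still see the original characters.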


Thanks
Nathan


> -----Original Message-----
> From: Jason Stajich [mailto:jason at cgt.duhs.duke.edu]
> Sent: 18 February 2004 14:15
> To: Ewan Birney
> Cc: Nathan Haigh; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] gap characters in SimpleAlign objects
>
> It's easier than this; not sure why gaps are becoming '.', but I had to
> work around this in other places as well (Coordinate::Pair):
>  $aln->map_chars('\.','-')
>
> --jason
>
> On Wed, 18 Feb 2004, Ewan Birney wrote:
>
> > On Wed, 18 Feb 2004, Nathan Haigh wrote:
> >
> > > I've been using the clustalw module for creating alignments, and I've
> > > just realised that when you output the alignment the gap character is
> > > a "." not a "-".
> > > This is most annoying because I am adding support to this module for
> > > generating trees via clustalw, and clustalw removes these "."
> > > characters. Is there a method for changing these gap characters to "-"?
> > > I have seen the gap_char method in the SimpleAlign module, but this
> > > seems only to designate a particular character as a gap character, and
> > > does not actually change the character.
> > >
> > > Any ideas on how to do this substitution, and where in BioPerl does
> > > this assignment get made in the first place, since the default gap char
> > > for clustalw output is "-" not "."?
> >
> > To fix (short term): loop over the sequences, making a new SimpleAlign
> > object with LocatableSeqs and s/\./-/g on the seq strings
> >
> >
> >
> > How are you reading in Clustalw alignments? The Bio::AlignIO::clustalw
> > doesn't touch the gap characters:
> >
> >
> >     foreach my $name ( sort { $order{$a} <=> $order{$b} } keys %alignments ) {
> >         if( $name =~ /(\S+):(\d+)-(\d+)/ ) {
> >             ($sname,$start,$end) = ($1,$2,$3);
> >         } else {
> >             ($sname, $start) = ($name,1);
> >             my $str  = $alignments{$name};
> >             $str =~ s/[^A-Za-z]//g;
> >             $end = length($str);
> >         }
> >         my $seq = new Bio::LocatableSeq('-seq'   => $alignments{$name},
> >                                          '-id'    => $sname,
> >                                          '-start' => $start,
> >                                          '-end'   => $end);
> >
> >
> >
> > ($alignments{$name} has no regex put on it earlier either)
> >
> >
> >
> > >
> > > Thanks
> > > Nathan
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at portal.open-bio.org
> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > >
> >
> > -----------------------------------------------------------------
> > Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> > <birney at ebi.ac.uk>.
> > -----------------------------------------------------------------
> >
> >
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
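
A note on the Bio::AlignIO::clustalw excerpt Ewan quoted: the gap symbol never affects the coordinates. When no name:start-end label is present, the end is the count of letters left after stripping everything else, so '.' and '-' gaps give the same result. A plain-Perl sketch of just that coordinate logic (the sub name parse_coords is hypothetical, not a BioPerl function):

```perl
use strict;
use warnings;

# Mirrors the start/end logic in the quoted excerpt: an explicit
# "name:start-end" label wins; otherwise start at 1 and count the
# letters, ignoring gaps of any kind ('.', '-', ...).
sub parse_coords {
    my ($name, $str) = @_;
    if ($name =~ /(\S+):(\d+)-(\d+)/) {
        return ($1, $2, $3);
    }
    (my $residues = $str) =~ s/[^A-Za-z]//g;
    return ($name, 1, length $residues);
}

print join(' ', parse_coords('seq1', 'MK..LV')), "\n";        # seq1 1 4
print join(' ', parse_coords('seq1', 'MK--LV')), "\n";        # seq1 1 4
print join(' ', parse_coords('seq2:10-15', 'MKAALV')), "\n";  # seq2 10 15
```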



