[Bioperl-l] Bio::AlignIO ignores question marks?

David Messina dmessina at wustl.edu
Fri Apr 14 05:14:25 UTC 2006


Hi Kai,

I'm by no means an expert with this module, but I'll take a shot.

Running your code through a debugger, I'm seeing that  
Bio::AlignIO::fasta is gobbling the question marks:

line 66: $MATCHPATTERN = '^A-Za-z\.\-';

and then where $entry contains a line of sequence from the input file

line 118: $entry =~ s/[$MATCHPATTERN]//g;
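To see the effect in isolation, here is a minimal Perl sketch that applies the same character class and substitution to a made-up sequence line. The sequence string is hypothetical, and the pattern is the one quoted above from Bio::AlignIO::fasta, which may differ in other BioPerl versions. No BioPerl installation is needed:

```perl
use strict;
use warnings;

# Character class as quoted above from Bio::AlignIO::fasta. Inside [...]
# the leading ^ negates it, so it matches anything that is NOT a letter,
# a dot or a dash.
my $MATCHPATTERN = '^A-Za-z\.\-';

# Hypothetical sequence line with missing data coded as '?'
my $entry = 'ACGT????ACGTAC';

# Same substitution as in the module: delete every character in the class
(my $cleaned = $entry) =~ s/[$MATCHPATTERN]//g;

print "before: $entry (", length($entry), " chars)\n";
print "after:  $cleaned (", length($cleaned), " chars)\n";
```

The four '?' disappear and the line becomes four characters shorter, which would explain why the downstream bases shift left and the alignment ends up padded with '-' at the 3' end.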

As far as I can tell, a question mark is not a valid character for  
the FASTA format (see http://en.wikipedia.org/wiki/FASTA_format) --  
perhaps that's the reason Bio::AlignIO::fasta doesn't permit them?

And then by the time missing_char() is applied, the question marks  
are already gone.
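If you want to confirm that this character class is the only thing eating the question marks, one experiment would be a local copy of the pattern with '?' added to the allowed set. The sketch below only exercises the regex; $PATCHED_PATTERN is a hypothetical name, not something in BioPerl, and whether the rest of Bio::AlignIO (or the ClustalW writer) handles '?' correctly once it gets through is untested:

```perl
use strict;
use warnings;

# HYPOTHETICAL variant of the fasta parser's pattern with '?' added to
# the allowed characters. This is not BioPerl code; it only shows what
# the same substitution would do with the extended class.
my $PATCHED_PATTERN = '^A-Za-z\.\-\?';

# Hypothetical sequence line with missing data coded as '?'
my $entry = 'ACGT????ACGTAC';

# Same substitution as in the module, using the patched class
(my $kept = $entry) =~ s/[$PATCHED_PATTERN]//g;

print "$kept\n";
```

With '?' in the class, the line passes through unchanged.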

What happens if you read in your sequence with question marks in a  
format that explicitly permits question marks?

Dave


On Apr 13, 2006, at 7:38 PM, Kai Müller wrote:

> Hi,
>
> I'm very new to BioPerl and have a possibly silly question.
> When using Bio::AlignIO to load a set of sequences, the question marks
> are simply lost (they denote missing characters, as opposed to gap
> characters [-] or ambiguities [N]). I thought that missing_char() might
> help, but it didn't (I probably used it the wrong way).
>
> When $filename contains sequences with ????, the following snippet
> produces an alignment in which the ???? are lost, the downstream
> nucleotides are simply shifted, and the resulting length differences
> are filled with '---' at the 3' end:
>
> my $aln_in = Bio::AlignIO->new(-file => $filename, -format => 'fasta');
> my $aln = $aln_in->next_aln();
> $aln->gap_char('-');
> $aln->missing_char('?');
>
> my $testout = Bio::AlignIO->new(-fh => \*STDOUT, -format => 'clustalw');
> $testout->write_aln($aln);
>
> Can somebody give me a hint here?
>
> thanks and all the best,
>
> Kai Müller
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
