[Bioperl-l] Bio::SeqIO::tab deletes gap characters when reading sequences, which is inconvenient

Fields, Christopher J cjfields at illinois.edu
Thu May 10 16:56:27 EDT 2012


Tim, 

This one got stuck in my drafts folder :P

Easy enough to do.  I've added this to the master branch, commit eece9dd.

chris

On Apr 17, 2012, at 6:59 PM, Tim White wrote:

> Hi,
> 
> Bio::SeqIO::tab (what you get when specifying -format => 'tab' to Bio::SeqIO->new()) is perfect for converting sequences into a one-per-line format, so that standard line-oriented UNIX tools (grep, comm etc.) work as expected.  Except...  I just discovered that it deletes gap ("-") characters when reading sequences, so it can't be used to round-trip any files that contain these.  This is a source of grief as I frequently work with FASTA files that contain aligned sequences, and thus gap characters.
> 
> This is all because the next_seq() function in Bio::SeqIO::tab.pm contains the line:
> 
> $seq =~ s/\W//g;
> 
> which removes all non-alphanumeric characters from the sequence data.  IMHO it would be *much* better if this were changed to:
> 
> $seq =~ s/\s//g;
> 
> which simply removes all whitespace characters (particularly including the \r that often appears at the ends of lines on text files that have visited Windows), enabling gap characters (and, for example, periods and asterisks) to be preserved.  Alternatively, you could simply get rid of this line of code and allow whitespace characters through.
> 
> I'm not sure whether this counts as a "bug", as a cursory search didn't turn up any docs explaining precisely which characters are and aren't preserved by classes implementing Bio::SeqIO.  But it's certainly inconsistent (at least Bio::SeqIO::fasta, and Bio::SeqIO::table with columns and delimiters set up appropriately, allow round-tripping of files containing gap characters), as well as extremely inconvenient for me personally, and I suspect for others.  Assuming no harm would be done by making the above change, what's the best way to get this changed upstream?  I've simply edited my own local copy of tab.pm to make the above change, but obviously if others agree I'd like to get the change done upstream.
> 
> Thanks,
> Tim
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
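For readers comparing the two substitutions discussed in the thread, here is a minimal standalone Perl sketch of what each regex does to a single aligned sequence line. It is not taken from tab.pm itself; the sample string is made up for illustration and includes gap characters, a period, a '*', and a trailing Windows carriage return.

```perl
use strict;
use warnings;

# Hypothetical aligned sequence data: gaps, a period, a '*',
# and a trailing \r left over from a Windows-edited file.
my $raw = "ACGT--AC.GT*\r";

# Old behaviour: \W matches every non-word character, so the
# gaps, the period, the '*' and the \r are all deleted.
(my $old = $raw) =~ s/\W//g;

# Proposed behaviour: \s matches only whitespace (including \r),
# so alignment characters are preserved.
(my $new = $raw) =~ s/\s//g;

print "old: $old\n";    # prints "old: ACGTACGT"
print "new: $new\n";    # prints "new: ACGT--AC.GT*"
```

The whitespace-only version still strips the stray \r (and any other spaces or tabs), which is the cleanup the original line was presumably there for, while leaving the alignment characters intact.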




