[Bioperl-l] "Be forgiving in what you accept" and Bio::Tools::GuessSeqFormat

Brian Osborne brian_osborne at cognia.com
Mon Jul 25 10:11:16 EDT 2005


George,

Done.

Brian O.


On 7/21/05 3:34 PM, "George Hartzell" <hartzell at kestrel.alerce.com> wrote:

> 
> There's a great "old" Internet maxim, "Be forgiving in what you accept
> and strict in what you send".
> 
> The Bio::SeqIO modules seem to be able to cope with "fasta" formatted
> files that have a space separating the ">" from the rest of the line
> (e.g. "> ape") if a) you explicitly specify the format or b) you
> have the sequence in a file whose name ends in "fa" (or generally
> matches the list of patterns that correspond to fasta file names).
> 
> But, if you happen to have the sequence in a file with a funny name
> (e.g. /var/tmp/apreq23ZHis [aka a form upload]) then it fails.  It
> can't guess based on the filename and the file content test is strict
> and wants to see the header line without the whitespace (">ape").
> 
> Is there any reason not to extend the regexp a bit and relax that
> constraint (since everything else seems to cope with it)?
> 
> Something like this:
> 
> *** /usr/local/lib/perl5/site_perl/5.8.6/Bio/Tools/GuessSeqFormat.pm.orig Thu Jul 21 12:30:55 2005
> --- /usr/local/lib/perl5/site_perl/5.8.6/Bio/Tools/GuessSeqFormat.pm Thu Jul 21 12:31:45 2005
> ***************
> *** 591,595 ****
>       my ($line, $lineno) = (shift, shift);
>       return (($lineno != 1 && $line =~ /^[A-IK-NP-Z]+$/i) ||
> !             $line =~ /^>\w/);
>   }
>   
> --- 591,595 ----
>       my ($line, $lineno) = (shift, shift);
>       return (($lineno != 1 && $line =~ /^[A-IK-NP-Z]+$/i) ||
> !             $line =~ /^>\s*\w/);
>   }
>   
> g.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
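
For anyone who wants to see what the relaxed pattern accepts, the two
regexps from the diff can be tried on a few header lines in isolation.
This is only a sketch: strict_header and relaxed_header are hypothetical
names made up for the illustration, not functions in
Bio::Tools::GuessSeqFormat, and they reproduce just the header-line half
of the test (the sequence-line half is unchanged by the patch).

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the header check before and after the patch.
# Neither name exists in Bio::Tools::GuessSeqFormat.
sub strict_header {
    my ($line) = @_;
    return $line =~ /^>\w/ ? 1 : 0;       # ">" must be followed directly by a word character
}

sub relaxed_header {
    my ($line) = @_;
    return $line =~ /^>\s*\w/ ? 1 : 0;    # optional whitespace allowed after ">"
}

# Compare the two checks on a few candidate first lines.
for my $line ('>ape', '> ape', '>   ape', 'ape') {
    printf "%-10s strict=%d relaxed=%d\n",
        "'$line'", strict_header($line), relaxed_header($line);
}
```

Note that \s* also lets tabs through after the ">", which seems in
keeping with the "be forgiving" idea.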