[Bioperl-l] Bio::SeqIO issue
Chris Fields
cjfields at illinois.edu
Wed Aug 5 21:04:14 UTC 2009
On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote:
> Is my impression correct that Bio::SeqIO just assumes that sequences
> are being submitted in FASTA format?
No. See:
http://www.bioperl.org/wiki/HOWTO:SeqIO
SeqIO tries to guess the format from the file extension and, if one
isn't present, falls back on Bio::Tools::GuessSeqFormat. It's
possible that the extension is causing the problem, or that
GuessSeqFormat is guessing wrong (it's apt to do that, as it's forced
to guess). In any case, it's always advisable to explicitly indicate
the format when possible.
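For example, here is a minimal sketch of passing the format explicitly,
assuming BioPerl is installed. The helper name read_seqs is only for
illustration, and the file names come from the command line rather
than from anything in your report:

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: read every record from $file with an explicit
# format, so that no guessing from the extension or content is needed.
sub read_seqs {
    my ($file, $format) = @_;
    my $in = Bio::SeqIO->new(
        -file   => $file,
        -format => $format,    # e.g. 'fasta' or 'raw'
    );
    my @seqs;
    while ( my $seq = $in->next_seq ) {
        push @seqs, $seq;
    }
    return @seqs;
}

# Usage: perl script.pl file1 file2 ...
# 'fasta' is assumed here; use 'raw' for files that contain only sequence.
for my $file (@ARGV) {
    for my $seq ( read_seqs($file, 'fasta') ) {
        printf "%s\t%d\n", $seq->display_id // '(no id)', $seq->length;
    }
}
```

With -format given, the file extension no longer matters.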
Relevant lines:
return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i;
...
return 'raw' if /\.(txt)$/i;
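To see what those two patterns do to a given file name, here is a
small stand-alone sketch. The sub guess_by_extension is a hypothetical
name, and it reproduces only the two lines quoted above, not the full
list in SeqIO:

```perl
use strict;
use warnings;

# Hypothetical helper containing only the two patterns quoted above;
# the other formats (elided as "..." above) are not reproduced here.
sub guess_by_extension {
    my ($name) = @_;
    for ($name) {
        return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i;
        return 'raw'   if /\.(txt)$/i;
    }
    return;    # no match: guessing from the content would be the fallback
}

for my $name (qw(reads.fa reads.TXT reads)) {
    my $format = guess_by_extension($name);
    printf "%-10s => %s\n", $name, defined $format ? $format : '(unknown)';
}
```

Note that under these patterns a FASTA file saved with a .txt
extension is guessed as 'raw', and a sequence-only file saved as .seq
is guessed as 'fasta'.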
> In our experience, implementing
> Bio::SeqIO led to the first line of files being cut off, regardless of
> whether the files were indeed fasta files or files that only contained
> sequence.
Files that only contain sequence are 'raw'. Ones in FASTA are 'fasta'.
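To illustrate the difference, here is a simplified pure-Perl sketch.
It is not the code of the Bio::SeqIO parsers, and parse_fasta_like and
parse_raw_like are hypothetical names. It only shows that a FASTA-style
reader treats the first line of a record as the description line,
while a raw-style reader treats every line as sequence; whether that
is related to what you are seeing is an open question:

```perl
use strict;
use warnings;

# FASTA-style: the first line of the record is taken as the header
# (with any leading '>' stripped); the remaining lines are the sequence.
sub parse_fasta_like {
    my ($text) = @_;
    my ($header, @lines) = split /\n/, $text;
    $header = '' unless defined $header;
    $header =~ s/^>//;
    return { id => $header, seq => join( '', @lines ) };
}

# Raw-style: no header at all; each non-empty line is one sequence.
sub parse_raw_like {
    my ($text) = @_;
    return [ grep { length } split /\n/, $text ];
}

# Made-up, sequence-only input (no '>' line).
my $data = "ACGTACGT\nTTGACCAA\n";

my $as_fasta = parse_fasta_like($data);
print "fasta-like: id=$as_fasta->{id} seq=$as_fasta->{seq}\n";

my $as_raw = parse_raw_like($data);
print "raw-like:   ", scalar(@$as_raw), " sequences\n";
```

In this sketch the FASTA-style reader consumes the first line of
nucleotides as the id, while the raw-style reader keeps both lines.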
> Which, in the latter, led to sequence submissions that had the
> first line of nucleotides removed. Has anyone tried to write a fix for
> this?
This sounds like a bug, but we have very little to go on beyond your
description. Which version of BioPerl are you using, and on which OS?
What does your data look like, and what is the file extension?
chris
> Thanks,
>
> Uwe
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> Uwe Hilgert, Ph.D.
> Dolan DNA Learning Center
> Cold Spring Harbor Laboratory
>
> V: (516) 367-5185
> E: hilgert at cshl.edu
> F: (516) 367-5182
> W: http://www.dnalc.org