[Bioperl-l] Bio::SeqIO issue
Hilmar Lapp
hlapp at gmx.net
Wed Aug 5 22:53:56 UTC 2009
I don't think that can be the problem. If anything, specifying the
format explicitly ought to give better results than leaving it out.
Uwe - I'd like you to go back to Chris' initial questions that you
haven't answered yet: "What version of bioperl are you using, OS,
etc? What does your data look like?" I'd add to that: can you show us
your full script, or a smaller code snippet that reproduces the problem?
I suspect that either something in your script is swallowing the line,
or that the line endings in your data file are from a different OS
than the one you're running the script on. (Or that you are running a
very old version of BioPerl, which is entirely possible if you
installed through CPAN.)
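
One quick way to check the line-ending theory is to count the different
line terminators in the data file. The following is a minimal sketch in
core Perl only; count_line_endings is a hypothetical helper name, not
something BioPerl provides, and it assumes the file is small enough to
read into memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): count each line-ending
# style in a file, reading it as raw bytes so that no I/O layer
# translates the terminators before we see them.
sub count_line_endings {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    local $/;                       # slurp mode: read the whole file
    my $data = <$fh>;
    close $fh;
    $data = '' unless defined $data;

    my %count;
    $count{crlf} = () = $data =~ /\r\n/g;       # Windows style
    $count{cr}   = () = $data =~ /\r(?!\n)/g;   # classic Mac OS style
    $count{lf}   = () = $data =~ /(?<!\r)\n/g;  # Unix style
    return \%count;
}

# Report the counts for every file named on the command line.
for my $file (@ARGV) {
    my $c = count_line_endings($file);
    printf "%s: CRLF=%d CR=%d LF=%d\n",
        $file, $c->{crlf}, $c->{cr}, $c->{lf};
}
```

If the file shows only CR or CRLF terminators and the script runs on a
Unix machine, converting the line endings before parsing would be the
first thing to try.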
-hilmar
On Aug 5, 2009, at 5:37 PM, Chris Fields wrote:
> Uwe,
>
> Please keep replies on the list.
>
> It's very possible that's the issue; IIRC the fasta parser pulls out
> the full sequence in chunks (based on local $/ = "\n>") and splits
> the header off as the first line in that chunk. You could probably
> try leaving the format out and letting SeqIO guess it, or passing
> the file into Bio::Tools::GuessSeqFormat directly, but it's probably
> better to go through the files and add a file extension that
> corresponds to the format.
>
> chris
>
> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote:
>
>> Thanks, Chris. The files have no extension, but we indicate what
>> format to use, like in the manual:
>>
>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta');
>>
>> I wonder now whether exactly this could be causing the problem: since
>> we are telling SeqIO that the input files are in fasta format, they
>> are being treated as such (= first line removed), regardless of
>> whether they really are fasta?
>>
>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>> Uwe Hilgert, Ph.D.
>> Dolan DNA Learning Center
>> Cold Spring Harbor Laboratory
>>
>> C: (516) 857-1693
>> V: (516) 367-5185
>> E: hilgert at cshl.edu
>> F: (516) 367-5182
>> W: http://www.dnalc.org
>>
>> -----Original Message-----
>> From: Chris Fields [mailto:cjfields at illinois.edu]
>> Sent: Wednesday, August 05, 2009 5:04 PM
>> To: Hilgert, Uwe
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Bio::SeqIO issue
>>
>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote:
>>
>>> Is my impression correct that Bio::SeqIO just assumes that sequences
>>> are being submitted in FASTA format?
>>
>> No. See:
>>
>> http://www.bioperl.org/wiki/HOWTO:SeqIO
>>
>> SeqIO tries to guess at the format using the file extension, and if
>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's
>> possible that the extension is causing the problem, or that
>> GuessSeqFormat is guessing wrong (it's apt to do that, as it's forced
>> to guess). In any case, it's always advisable to explicitly indicate
>> the format when possible.
>>
>> Relevant lines:
>>
>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i;
>> ...
>> return 'raw' if /\.(txt)$/i;
>>
>>> In our experience, implementing Bio::SeqIO led to the first line of
>>> files being cut off, regardless of whether the files were indeed
>>> fasta files or files that only contained sequence.
>>
>> Files that only contain sequence are 'raw'. Ones in FASTA are 'fasta'.
>>
>>> Which, in the latter, led to sequence submissions that had the
>>> first line of nucleotides removed. Has anyone tried to write a fix
>>> for this?
>>
>> This sounds like a bug, but we have very little to go on beyond your
>> description. What version of bioperl are you using, OS, etc? What
>> does your data look like? File extension?
>>
>> chris
>>
>>> Thanks,
>>>
>>> Uwe
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================