[Bioperl-l] Bio::SeqIO issue

Hilmar Lapp hlapp at gmx.net
Wed Aug 5 22:53:56 UTC 2009


I don't think that can be the problem. If anything, specifying the
format explicitly should give better results than leaving it out.

Uwe - I'd like you to go back to Chris' initial questions that you  
haven't answered yet: "What version of bioperl are you using, OS,  
etc?  What does your data look like?" I'd add to that: can you show us
your full script, or a smaller code snippet that reproduces the problem?
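
A small core-Perl sketch along these lines would gather most of that
information in one go. The sub names (environment_report, head_lines)
are made up for illustration, and the only BioPerl piece it touches is
Bio::Root::Version, loaded inside an eval so the script still runs
where BioPerl is missing.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Report Perl version, OS name, and BioPerl version (if it can be loaded).
sub environment_report {
    my %info = (
        perl => sprintf('%vd', $^V),
        os   => $^O,
    );
    # The eval keeps this usable on machines without BioPerl installed.
    $info{bioperl} =
        eval { require Bio::Root::Version; Bio::Root::Version->VERSION }
        || 'not found';
    return \%info;
}

# Return the first $n lines of $path with \r and \n made visible,
# so unexpected line endings show up in the output.
sub head_lines {
    my ($path, $n) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my @lines;
    while (defined(my $line = <$fh>)) {
        $line =~ s/\r/\\r/g;
        $line =~ s/\n/\\n/g;
        push @lines, $line;
        last if @lines >= $n;
    }
    close $fh;
    return @lines;
}

# Usage: perl report.pl your_data_file
if (@ARGV) {
    my $info = environment_report();
    print "$_: $info->{$_}\n" for sort keys %$info;
    print "line: $_\n" for head_lines($ARGV[0], 5);
}
```

Pasting that output together with the script that calls Bio::SeqIO
should be enough for the list to reproduce the behaviour.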

I suspect that either something in your script is swallowing the line,  
or that the line endings in your data file are from a different OS  
than the one you're running the script on. (Or that you are running a  
very old version of BioPerl, which is entirely possible if you  
installed through CPAN.)
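
To check the line-ending theory, something like the following sketch
counts the three common styles in a file. It uses only core Perl;
count_line_endings is a made-up name, and the file is read as raw
bytes so that no I/O layer translates the endings first.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Count line-ending styles in a file read as raw bytes.
# Returns a hash ref: crlf (Windows), cr (classic Mac OS), lf (Unix).
sub count_line_endings {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                        # slurp mode: read the whole file
    my $data = <$fh>;
    close $fh;
    $data = '' unless defined $data;

    my %count = (crlf => 0, cr => 0, lf => 0);
    $count{crlf}++ while $data =~ /\r\n/g;       # CR followed by LF
    $count{cr}++   while $data =~ /\r(?!\n)/g;   # CR on its own
    $count{lf}++   while $data =~ /(?<!\r)\n/g;  # LF on its own
    return \%count;
}

# Usage: perl endings.pl your_data_file
if (@ARGV) {
    my $c = count_line_endings($ARGV[0]);
    printf "%-4s %d\n", $_, $c->{$_} for qw(crlf cr lf);
}
```

A file that reports only "cr" endings would look like one single line
to a script reading it on Unix with the default record separator,
which is the kind of mismatch I have in mind.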

	-hilmar

On Aug 5, 2009, at 5:37 PM, Chris Fields wrote:

> Uwe,
>
> Please keep replies on the list.
>
> It's very possible that's the issue; IIRC the fasta parser pulls out  
> the full sequence in chunks (based on local $/ = "\n>") and splits  
> the header off as the first line in that chunk.  You could probably  
> try leaving the format out and letting SeqIO guess it, or passing  
> the file into Bio::Tools::GuessSeqFormat directly, but it's probably  
> better to go through the files and add a file extension that  
> corresponds to the format.
>
> chris
>
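
For illustration, the chunked reading Chris describes above can be
mimicked in a few lines of core Perl. This is a simplified stand-in,
not the Bio::SeqIO::fasta source: read_fasta_like is a made-up name,
and the way the header is split off is an assumption based only on
the description above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Read FASTA-like text using "\n>" as the record separator, then treat
# the first line of each chunk as the header and the rest as sequence.
sub read_fasta_like {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = "\n>";                # one record per read
    my @records;
    while (defined(my $chunk = <$fh>)) {
        chomp $chunk;                # drop the trailing "\n>" separator
        $chunk =~ s/^>//;            # the first record still has its ">"
        my ($header, $body) = split /\n/, $chunk, 2;
        $body = '' unless defined $body;
        $body =~ s/\s+//g;           # join the sequence lines
        push @records, { header => $header, seq => $body };
    }
    close $fh;
    return @records;
}

# Header-less input: the first sequence line ends up as the "header".
for my $rec (read_fasta_like("ACGT\nTTGA\n")) {
    print "header=$rec->{header} seq=$rec->{seq}\n";
}
```

In this sketch the header-less input comes back as a single record
whose header is "ACGT" and whose sequence is "TTGA". Whether the real
parser does the same depends on the BioPerl version, so it would need
checking against the actual script and data.
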
> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote:
>
>> Thanks, Chris. The files have no extension, but we indicate what
>> format to use, like in the manual:
>>
>> $in  = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta');
>>
>> I wonder now whether exactly this could be causing the problem:
>> because we declare the input files to be in fasta format, they are
>> treated as such (= first line removed), regardless of whether they
>> really are fasta?
>>
>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>> Uwe Hilgert, Ph.D.
>> Dolan DNA Learning Center
>> Cold Spring Harbor Laboratory
>>
>> C: (516) 857-1693
>> V: (516) 367-5185
>> E: hilgert at cshl.edu
>> F: (516) 367-5182
>> W: http://www.dnalc.org
>>
>> -----Original Message-----
>> From: Chris Fields [mailto:cjfields at illinois.edu]
>> Sent: Wednesday, August 05, 2009 5:04 PM
>> To: Hilgert, Uwe
>> Cc: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Bio::SeqIO issue
>>
>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote:
>>
>>> Is my impression correct that Bio::SeqIO just assumes that
>>> sequences are being submitted in FASTA format?
>>
>> No. See:
>>
>> http://www.bioperl.org/wiki/HOWTO:SeqIO
>>
>> SeqIO tries to guess at the format using the file extension, and if
>> one isn't present makes use of Bio::Tools::GuessSeqFormat.  It's
>> possible that the extension is causing the problem, or that
>> GuessSeqFormat is guessing wrong (it's apt to do that, as it's
>> forced to guess).  In any case, it's always advisable to explicitly
>> indicate the format when possible.
>>
>> Relevant lines:
>>
>>   return 'fasta'   if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i;
>> ...
>>   return 'raw'     if /\.(txt)$/i;
>>
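
The two extension rules quoted above can be tried out in isolation
with a sketch like this one. guess_format_from_name is a hypothetical
helper that covers only those two rules; it is not the SeqIO method
itself, and it does no content-based guessing.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper mirroring only the two extension rules quoted above.
# Returns 'fasta', 'raw', or undef when no rule matches.
sub guess_format_from_name {
    my ($name) = @_;
    for ($name) {
        return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i;
        return 'raw'   if /\.(txt)$/i;
    }
    return;    # no recognised extension
}

for my $file (qw(reads.fa notes.txt file_path)) {
    my $format = guess_format_from_name($file);
    printf "%-10s %s\n", $file, defined $format ? $format : '(no match)';
}
```

A name without an extension, such as "file_path", matches neither
rule, which is the situation where the format has to come from
somewhere else.
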
>>> In our experience, implementing Bio::SeqIO led to the first line
>>> of files being cut off, regardless of whether the files were
>>> indeed fasta files or files that only contained sequence.
>>
>> Files that only contain sequence are 'raw'.  Ones in FASTA are
>> 'fasta'.
>>
>>> Which, in the latter case, led to sequence submissions that had
>>> the first line of nucleotides removed. Has anyone tried to write a
>>> fix for this?
>>
>> This sounds like a bug, but we have very little to go on beyond your
>> description.  What version of bioperl are you using, OS, etc?  What
>> does your data look like?  File extension?
>>
>> chris
>>
>>> Thanks,
>>>
>>> Uwe
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================





