[Bioperl-l] Bio::SeqIO issue

Mark A. Jensen maj at fortinbras.us
Wed Aug 5 23:12:52 UTC 2009


If these items were included in a Bugzilla report, that would be
most convenient (= most likely to get looked at carefully),
and Bugzilla is the best place for us to keep track of
these kinds of issues: http://bugzilla.bioperl.org/
cheers MAJ
----- Original Message ----- 
From: "Hilmar Lapp" <hlapp at gmx.net>
To: "Chris Fields" <cjfields at illinois.edu>
Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Wednesday, August 05, 2009 6:53 PM
Subject: Re: [Bioperl-l] Bio::SeqIO issue


> I don't think that can be the problem. If anything, providing the
> format ought to give a better result than not providing it.
> 
> Uwe - I'd like you to go back to Chris' initial questions that you  
> haven't answered yet: "What version of bioperl are you using, OS,  
> etc?  What does your data look like?" I'd add to that: can you show us
> your full script, or a smaller code snippet that reproduces the problem?
> 
> I suspect that either something in your script is swallowing the line,  
> or that the line endings in your data file are from a different OS  
> than the one you're running the script on. (Or that you are running a  
> very old version of BioPerl, which is entirely possible if you  
> installed through CPAN.)
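
A quick way to test the line-ending suspicion is to count which styles a file actually contains. The sketch below is plain Perl with no BioPerl dependency; `count_line_endings` is a hypothetical helper and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: tally the line-ending styles present in a string.
sub count_line_endings {
    my ($data) = @_;
    my %count = (crlf => 0, cr => 0, lf => 0);
    $count{crlf} = () = $data =~ /\r\n/g;        # DOS/Windows: CR followed by LF
    $count{cr}   = () = $data =~ /\r(?!\n)/g;    # classic Mac: CR on its own
    $count{lf}   = () = $data =~ /(?<!\r)\n/g;   # Unix: LF on its own
    return %count;
}

# Made-up sample data, not taken from the reported files.
my %mac = count_line_endings(">seq1\rACGT\rTTGA\r");
my %dos = count_line_endings(">seq1\r\nACGT\r\n");

print "classic Mac sample: cr=$mac{cr} lf=$mac{lf} crlf=$mac{crlf}\n";
print "DOS sample:         cr=$dos{cr} lf=$dos{lf} crlf=$dos{crlf}\n";
```

To check a real file, read it in binary mode (for example with `binmode` on the filehandle) and pass its contents to the helper. A nonzero `cr` or `crlf` count on a Unix machine would be worth reporting along with the BioPerl version.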
> 
> -hilmar
> 
> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote:
> 
>> Uwe,
>>
>> Please keep replies on the list.
>>
>> It's very possible that's the issue; IIRC the fasta parser pulls out  
>> the full sequence in chunks (based on local $/ = "\n>") and splits  
>> the header off as the first line in that chunk.  You could probably  
>> try leaving the format out and letting SeqIO guess it, or passing  
>> the file into Bio::Tools::GuessSeqFormat directly, but it's probably  
>> better to go through the files and add a file extension that  
>> corresponds to the format.
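
The chunking behaviour described above can be sketched in plain Perl. This is an illustration under stated assumptions, not the Bio::SeqIO::fasta source: `read_fasta_like` is a hypothetical helper, and it assumes only that records are read with `$/ = "\n>"` and that the first line of each chunk is taken as the header.

```perl
use strict;
use warnings;

# Hypothetical helper: mimics the chunk-then-split-header approach.
# It is NOT the real BioPerl parser, only an illustration of the idea.
sub read_fasta_like {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    local $/ = "\n>";                 # read one '>'-delimited record at a time
    my @records;
    while (defined(my $chunk = <$fh>)) {
        $chunk =~ s/\n>\z//;          # drop the separator from the chunk's end
        $chunk =~ s/\A>//;            # drop the leading '>' of the first record
        my ($header, @lines) = split /\n/, $chunk;
        push @records, { header => $header, seq => join('', @lines) };
    }
    close $fh;
    return @records;
}

my @fasta = read_fasta_like(">seq1 test\nACGT\nTTGA\n");   # real FASTA
my @raw   = read_fasta_like("ACGT\nTTGA\n");               # sequence only

print "fasta: header='$fasta[0]{header}' seq='$fasta[0]{seq}'\n";
print "raw:   header='$raw[0]{header}' seq='$raw[0]{seq}'\n";
```

If the parser works roughly like this, a file with no '>' header line has its first line of nucleotides consumed as the header, which would match the symptom reported in this thread.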
>>
>> chris
>>
>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote:
>>
>>> Thanks, Chris. The files have no extension, but we indicate what format
>>> to use, like in the manual:
>>>
>>> $in  = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta');
>>>
>>> I wonder now whether this could be exactly what causes the problem:
>>> since we are telling SeqIO that the input files are in fasta format,
>>> they are being treated as such (= first line removed), regardless of
>>> whether they really are fasta?
>>>
>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>> Uwe Hilgert, Ph.D.
>>> Dolan DNA Learning Center
>>> Cold Spring Harbor Laboratory
>>>
>>> C: (516) 857-1693
>>> V: (516) 367-5185
>>> E: hilgert at cshl.edu
>>> F: (516) 367-5182
>>> W: http://www.dnalc.org
>>>
>>> -----Original Message-----
>>> From: Chris Fields [mailto:cjfields at illinois.edu]
>>> Sent: Wednesday, August 05, 2009 5:04 PM
>>> To: Hilgert, Uwe
>>> Cc: bioperl-l at lists.open-bio.org
>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue
>>>
>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote:
>>>
>>>> Is my impression correct that Bio::SeqIO just assumes that sequences
>>>> are being submitted in FASTA format?
>>>
>>> No. See:
>>>
>>> http://www.bioperl.org/wiki/HOWTO:SeqIO
>>>
>>> SeqIO tries to guess at the format using the file extension, and if
>>> one isn't present makes use of Bio::Tools::GuessSeqFormat.  It's
>>> possible that the extension is causing the problem, or that
>>> GuessSeqFormat is guessing wrong (it's apt to do that, as it's forced to
>>> guess).  In any case, it's always advisable to explicitly indicate
>>> the format when possible.
>>>
>>> Relevant lines:
>>>
>>>   return 'fasta'   if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i;
>>> ...
>>>   return 'raw'     if /\.(txt)$/i;
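
As a rough illustration of how those two patterns classify filenames, here is a minimal sketch. `guess_by_extension` is a hypothetical stand-in that contains only the two rules quoted above; the filenames are made up, apart from "file_path", which appears elsewhere in this thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the extension check; only the two patterns quoted
# in this thread are included, so other formats are not recognised here.
sub guess_by_extension {
    my ($filename) = @_;
    for ($filename) {                 # alias $_ so the bare regexes apply to it
        return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i;
        return 'raw'   if /\.(txt)$/i;
    }
    return;                           # no match: format must be guessed from content
}

for my $name (qw(reads.fa notes.TXT file_path)) {
    my $format = guess_by_extension($name);
    printf "%-10s => %s\n", $name,
        defined $format ? $format : '(no extension match)';
}
```

A file with no extension falls through both rules, which is the case where content-based guessing or an explicit -format argument has to take over.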
>>>
>>>> In our experience, implementing Bio::SeqIO led to the first line of
>>>> files being cut off, regardless of whether the files were indeed
>>>> fasta files or files that only contained sequence.
>>>
>>> Files that only contain sequence are 'raw'.  Ones in FASTA are 'fasta'.
>>>
>>>> Which, in the latter case, led to sequence submissions that had the
>>>> first line of nucleotides removed. Has anyone tried to write a fix
>>>> for this?
>>>
>>> This sounds like a bug, but we have very little to go on beyond your
>>> description.  What version of bioperl are you using, OS, etc?  What
>>> does your data look like?  File extension?
>>>
>>> chris
>>>
>>>> Thanks,
>>>>
>>>> Uwe
>>>>
>>>>
>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>>> Uwe Hilgert, Ph.D.
>>>> Dolan DNA Learning Center
>>>> Cold Spring Harbor Laboratory
>>>>
>>>> V: (516) 367-5185
>>>> E: hilgert at cshl.edu
>>>> F: (516) 367-5182
>>>> W: http://www.dnalc.org
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
> 
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
> 


