[Bioperl-l] Bio::SeqIO issue
Mark A. Jensen
maj at fortinbras.us
Wed Aug 5 23:12:52 UTC 2009
If these items were included in a Bugzilla report, that would be
most convenient (= most likely to get looked at carefully)
and is the best place for us to keep track of
these kinds of issues: http://bugzilla.bioperl.org/
cheers MAJ
----- Original Message -----
From: "Hilmar Lapp" <hlapp at gmx.net>
To: "Chris Fields" <cjfields at illinois.edu>
Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Wednesday, August 05, 2009 6:53 PM
Subject: Re: [Bioperl-l] Bio::SeqIO issue
>I don't think that can be the problem. If anything, providing the
> format ought to be better in terms of result than not providing it?
>
> Uwe - I'd like you to go back to Chris' initial questions that you
> haven't answered yet: "What version of bioperl are you using, OS,
> etc? What does your data look like?" I'd add to that, can you show us
> your full script, or a smaller code snippet that reproduces the problem.
>
> I suspect that either something in your script is swallowing the line,
> or that the line endings in your data file are from a different OS
> than the one you're running the script on. (Or that you are running a
> very old version of BioPerl, which is entirely possible if you
> installed through CPAN.)
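One way to test the line-ending hypothesis before suspecting the parser is to inspect the file's raw bytes. The sketch below does not use BioPerl. `line_ending_style` is a hypothetical helper that counts CRLF (Windows), lone CR (classic Mac OS) and lone LF (Unix) endings in a string. It assumes the file has been read in binary mode so Perl does not translate the endings first.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): report which line-ending
# convention a chunk of text uses, so files written on another OS can be
# spotted before they are handed to Bio::SeqIO.
sub line_ending_style {
    my ($text) = @_;
    my $crlf = () = $text =~ /\r\n/g;        # Windows endings
    my $cr   = () = $text =~ /\r(?!\n)/g;    # lone CR (classic Mac OS)
    my $lf   = () = $text =~ /(?<!\r)\n/g;   # lone LF (Unix)
    return 'none'  unless $crlf || $cr || $lf;
    return 'mixed' if ($crlf > 0) + ($cr > 0) + ($lf > 0) > 1;
    return $crlf ? 'CRLF' : $cr ? 'CR' : 'LF';
}

# Usage (read in binary mode so the endings are left untouched):
# open my $in, '<:raw', $path or die "Cannot open $path: $!";
# my $text = do { local $/; <$in> };
# print line_ending_style($text), "\n";
```

On a Unix machine, any result other than `LF` would suggest normalising the line endings before parsing.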
>
> -hilmar
>
> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote:
>
>> Uwe,
>>
>> Please keep replies on the list.
>>
>> It's very possible that's the issue; IIRC the fasta parser pulls out
>> the full sequence in chunks (based on local $/ = "\n>") and splits
>> the header off as the first line in that chunk. You could probably
>> try leaving the format out and letting SeqIO guess it, or passing
>> the file into Bio::Tools::GuessSeqFormat directly, but it's probably
>> better to go through the files and add a file extension that
>> corresponds to the format.
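The following sketch shows why a header-less file loses its first line. It is a simplified version of record-based reading as described above: `$/` is set to `"\n>"`, and the header is taken as the first line of each chunk. `naive_fasta_records` is a hypothetical helper, not the actual Bio::SeqIO::fasta code, and it ignores details such as splitting the ID from the description.

```perl
use strict;
use warnings;

# Simplified sketch of the behaviour described above; NOT Bio::SeqIO's code.
sub naive_fasta_records {
    my ($fh) = @_;
    my @records;
    local $/ = "\n>";                    # read one FASTA record per chunk
    while (defined(my $chunk = <$fh>)) {
        chomp $chunk;                    # strips a trailing "\n>" (the $/ value)
        $chunk =~ s/^>//;                # the very first record keeps its '>'
        my ($header, @seq_lines) = split /\n/, $chunk;
        push @records, {
            header   => $header,         # first line of the chunk
            sequence => join('', @seq_lines),
        };
    }
    return @records;
}

# A "raw" file: sequence only, no '>' header line.
my $raw = "ACGTACGT\nGGCCTTAA\n";
open my $fh, '<', \$raw or die "Cannot open in-memory file: $!";
my ($rec) = naive_fasta_records($fh);
close $fh;
print "header:   $rec->{header}\n";     # header:   ACGTACGT
print "sequence: $rec->{sequence}\n";   # sequence: GGCCTTAA
```

A raw file contains no `>`, so the parser returns the whole file as one chunk and treats its first line of nucleotides as the header.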
>>
>> chris
>>
>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote:
>>
>>> Thanks, Chris. The files have no extension, but we indicate what
>>> format to use, as in the manual:
>>>
>>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta');
>>>
>>> I wonder now whether this is exactly what causes the problem: because
>>> we tell SeqIO that the input files are in FASTA format, they are
>>> treated as such (i.e. the first line is removed), regardless of
>>> whether they really are FASTA?
>>>
>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>> Uwe Hilgert, Ph.D.
>>> Dolan DNA Learning Center
>>> Cold Spring Harbor Laboratory
>>>
>>> C: (516) 857-1693
>>> V: (516) 367-5185
>>> E: hilgert at cshl.edu
>>> F: (516) 367-5182
>>> W: http://www.dnalc.org
>>>
>>> -----Original Message-----
>>> From: Chris Fields [mailto:cjfields at illinois.edu]
>>> Sent: Wednesday, August 05, 2009 5:04 PM
>>> To: Hilgert, Uwe
>>> Cc: bioperl-l at lists.open-bio.org
>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue
>>>
>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote:
>>>
>>>> Is my impression correct that Bio::SeqIO just assumes that sequences
>>>> are being submitted in FASTA format?
>>>
>>> No. See:
>>>
>>> http://www.bioperl.org/wiki/HOWTO:SeqIO
>>>
>>> SeqIO tries to guess the format from the file extension and, if
>>> there isn't one, makes use of Bio::Tools::GuessSeqFormat. It's
>>> possible that the extension is causing the problem, or that
>>> GuessSeqFormat is guessing wrong (it's apt to do that, as it's forced
>>> to guess). In any case, it's always advisable to indicate the format
>>> explicitly when possible.
>>>
>>> Relevant lines:
>>>
>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i;
>>> ...
>>> return 'raw' if /\.(txt)$/i;
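The two quoted rules can be tested on their own. The sketch below is a hypothetical, cut-down `guess_format_from_name` that uses only those two regexes. The real SeqIO method recognises many more extensions. Note that `.seq` is on the FASTA list, so a raw sequence file named `sample.seq` would be parsed as FASTA.

```perl
use strict;
use warnings;

# Hypothetical, cut-down extension check built only from the two rules
# quoted above; the real SeqIO method knows many more formats.
sub guess_format_from_name {
    my ($filename) = @_;
    local $_ = $filename;
    return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i;
    return 'raw'   if /\.(txt)$/i;
    return;   # unknown or missing extension: per the thread, SeqIO then
              # turns to Bio::Tools::GuessSeqFormat
}

for my $name (qw(reads.fa sample.seq notes.txt sample)) {
    my $fmt = guess_format_from_name($name);
    printf "%-12s => %s\n", $name, defined $fmt ? $fmt : '(no guess)';
}
```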
>>>
>>>> In our experience, using Bio::SeqIO led to the first line of files
>>>> being cut off, regardless of whether the files were indeed FASTA
>>>> files or files that contained only sequence.
>>>
>>> Files that only contain sequence are 'raw'. Ones in FASTA are 'fasta'.
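The files here have no extension and may be either FASTA or raw. One option is to choose `-format` for each file from its content instead of hard-coding `'Fasta'`. The sketch assumes that a file is FASTA exactly when its first non-blank line starts with `>`. `fasta_or_raw` is a hypothetical helper and is much cruder than Bio::Tools::GuessSeqFormat.

```perl
use strict;
use warnings;

# Hypothetical pre-check (not the GuessSeqFormat algorithm): FASTA if the
# first non-blank line starts with '>', raw sequence otherwise.
sub fasta_or_raw {
    my ($text) = @_;
    for my $line (split /\r?\n|\r/, $text) {   # tolerate any line ending
        next if $line =~ /^\s*$/;              # skip leading blank lines
        return $line =~ /^>/ ? 'fasta' : 'raw';
    }
    return;                                    # empty input: no guess
}
```

The result could then be passed as `-format => fasta_or_raw($text)` to `Bio::SeqIO->new`, so that sequence-only files go to Bio::SeqIO's raw parser instead of the FASTA one.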
>>>
>>>> In the latter case, this led to sequence submissions with the first
>>>> line of nucleotides removed. Has anyone tried to write a fix for
>>>> this?
>>>
>>> This sounds like a bug, but we have very little to go on beyond your
>>> description. What version of bioperl are you using, OS, etc? What
>>> does your data look like? File extension?
>>>
>>> chris
>>>
>>>> Thanks,
>>>>
>>>> Uwe
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================