[Bioperl-l] perl one-liner with Bio::SeqIO

Roy Chaudhuri roy.chaudhuri at gmail.com
Thu Jul 22 09:41:00 EDT 2010


Done (bug 3122).
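
For anyone landing here from the archive, the workaround suggested in the thread quoted below (passing the format explicitly instead of letting BioPerl guess it) can be sketched as follows. This is a minimal sketch, not a confirmed fix for the bug: the function name seqs_to_tab is hypothetical, and it assumes BioPerl is installed and that the input is FASTA.

```shell
# Hypothetical wrapper around the one-liner from this thread. The only
# substantive change is the explicit -format argument, so that Bio::SeqIO
# does not have to guess the format from a piped STDIN.
seqs_to_tab() {
  perl -MBio::SeqIO -e '
    my $in = Bio::SeqIO->new(-fh => \*STDIN, -format => "fasta");
    while (my $seq = $in->next_seq) {
      # print "id<TAB>sequence" for each record
      print $seq->id, "\t", $seq->seq, "\n";
    }
  '
}
```

It would be used at the end of a pipe, e.g. "cat test.fa | seqs_to_tab" or "command1 | command2 | seqs_to_tab".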

On 22/07/2010 14:27, Chris Fields wrote:
> Would someone like to file this as a bug?  My guess is this may be a
> combination of using pipes and the way FASTA is parsed (locally
> resets $/).
>
> http://bugzilla.open-bio.org
>
> chris
>
> On Jul 22, 2010, at 6:49 AM, Roy Chaudhuri wrote:
>
>> Hi Alper,
>>
>> The problem comes about because you don't specify -format=>'fasta'
>> in your Bio::SeqIO object. BioPerl attempts to guess the format if
>> you don't specify it, but seems to be struggling in this case. I
>> can't really think of any good reason for not specifying the
>> format. Just in case anyone wants to investigate further, I noticed
>> that if you try the example with longer fasta sequences, the first
>> line of the sequence is interpreted as the id, with the remainder
>> as the sequence.
>>
>> Cheers. Roy.
>>
>> On 22/07/2010 11:48, Frank Schwach wrote:
>>> Hi Alper,
>>>
>>> You can actually reproduce it also by providing STDIN from
>>> keyboard input like so:
>>>
>>> $ perl -MBio::SeqIO -e 'my $seq=Bio::SeqIO->new(-fh =>\*STDIN); while ($myseq=$seq->next_seq){ print $myseq->id,"\t",$myseq->seq,"\n"}'
>>> >1
>>> aaaaaaaaa
>>> >2
>>> aaaaaaaaa
>>> ggggggggg
>>> >3
>>> 2       ggggggggg
>>> ccccccccc
>>> 3       ccccccccc
>>>
>>> In this case I typed ">1"[ENTER] "aaaaaaaaa"[ENTER] ">2"[ENTER],
>>> then the command returned the sequence of the first entry, again
>>> without the ID.
>>> From the second entry onwards, it is all correct.
>>>
>>> I'm not 100% sure but could it be linked to buffering? SeqIO has
>>> to read ahead to find a complete entry that spans multiple lines.
>>> When you get STDIN from a file, you will get buffering and
>>> receive more than one line at once, which will allow the next_seq
>>> method to work as expected. If you provide line-by-line input
>>> then that method probably can't work correctly. If that is the
>>> case then you can't use the command in a pipe at all.
>>>
>>> Frank
>>>
>>>
>>>
>>> On Thu, 2010-07-22 at 00:09 -0400, Alper Yilmaz wrote:
>>>> Hi,
>>>>
>>>> I was using Bio::SeqIO with perl one-liner and I noticed an
>>>> oddity. Can someone suggest a correction or workaround?
>>>>
>>>> Let test.fa be:
>>>> >1
>>>> AGTC
>>>> >2
>>>> CTGA
>>>>
>>>> Then, the command line below prints the expected output:
>>>>
>>>> $ perl -MBio::SeqIO -e 'my $seq=Bio::SeqIO->new(-fh =>\*STDIN); while ($myseq=$seq->next_seq){ print $myseq->id,"\t",$myseq->seq,"\n"}' < test.fa
>>>>
>>>> output:
>>>> 1	AGTC
>>>> 2	CTGA
>>>>
>>>> However, if I use the command in a pipe, then the output has an
>>>> issue with the primary_id of the initial sequence:
>>>>
>>>> $ cat test.fa | perl -MBio::SeqIO -e 'my $seq=Bio::SeqIO->new(-fh =>\*STDIN); while ($myseq=$seq->next_seq){ print $myseq->id,"\t",$myseq->seq,"\n"}'
>>>>
>>>> output:
>>>> AGTC
>>>> 2	CTGA
>>>>
>>>> What is the workaround to make Bio::SeqIO work correctly in a
>>>> one-liner with pipes?
>>>>
>>>> thanks,
>>>>
>>>> Alper Yilmaz Post-doctoral Researcher Plant Biotechnology
>>>> Center The Ohio State University 1060 Carmack Rd Columbus, OH
>>>> 43210 (614)688-4954
>>>>
>>>>
>>>> PS: Strictly speaking, the example demonstrates a useless use of
>>>> cat. It is only for the sake of giving an example; the real case
>>>> could be "command1 | command2 | command3 | perl -MBio::SeqIO
>>>> -e'...' " instead.
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>>
>>
>


