[Bioperl-l] perl one-liner with Bio::SeqIO

Roy Chaudhuri roy.chaudhuri at gmail.com
Thu Jul 22 07:49:41 EDT 2010


Hi Alper,

The problem comes about because you don't specify -format => 'fasta' when
you create your Bio::SeqIO object. BioPerl attempts to guess the format if
you don't specify it, but seems to be struggling in this case. I can't
really think of any good reason for not specifying the format. Just in
case anyone wants to investigate further, I noticed that if you try the
example with longer fasta sequences, the first line of the sequence is
interpreted as the id, with the remainder as the sequence.
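
As a minimal sketch of that fix (assuming BioPerl is installed and using the
two-record test.fa from the original post; the variable names $in and $s are
only illustrative), the piped one-liner with the format given explicitly
could look like this:

```shell
# Recreate the two-record test.fa from the original post.
printf '>1\nAGTC\n>2\nCTGA\n' > test.fa

# Same one-liner as in the thread, with -format => "fasta" added so that
# Bio::SeqIO does not have to guess the format from a piped STDIN.
cat test.fa | perl -MBio::SeqIO -e '
    my $in = Bio::SeqIO->new(-fh => \*STDIN, -format => "fasta");
    while (my $s = $in->next_seq) { print $s->id, "\t", $s->seq, "\n" }
'
```

With the format stated, the piped case should give the same tab-separated
id and sequence lines as the redirected "< test.fa" case.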

Cheers.
Roy.

On 22/07/2010 11:48, Frank Schwach wrote:
> Hi Alper,
>
> You can actually reproduce it also by providing STDIN from keyboard
> input like so:
> $ perl -MBio::SeqIO -e 'my $seq=Bio::SeqIO->new(-fh =>\*STDIN); while
> ($myseq=$seq->next_seq){ print $myseq->id,"\t",$myseq->seq,"\n"}'
> >1
> aaaaaaaaa
> >2
> aaaaaaaaa
> ggggggggg
> >3
> 2       ggggggggg
> ccccccccc
> 3       ccccccccc
>
> In this case I typed
> ">1"[ENTER]
> "aaaaaaaaa"[ENTER]
> ">2"[ENTER]
> then the command returned the sequence of the first entry without the ID
> again.
> From the second entry onwards, it is all correct.
>
> I'm not 100% sure but could it be linked to buffering? SeqIO has to read
> ahead to find a complete entry that spans multiple lines. When you get
> STDIN from a file, you will get buffering and receive more than one line
> at once, which will allow the next_seq method to work as expected. If
> you provide line-by-line input then that method probably can't work
> correctly.
> If that is the case then you can't use the command in a pipe at all.
>
> Frank
>
>
>
> On Thu, 2010-07-22 at 00:09 -0400, Alper Yilmaz wrote:
>> Hi,
>>
>> I was using Bio::SeqIO with perl one-liner and I noticed an oddity.
>> Can someone suggest a correction or workaround?
>>
>> Let test.fa be;
>> >1
>> AGTC
>> >2
>> CTGA
>>
>> Then, commandline below prints the expected output:
>> $ perl -MBio::SeqIO -e 'my $seq=Bio::SeqIO->new(-fh =>\*STDIN); while
>> ($myseq=$seq->next_seq){ print $myseq->id,"\t",$myseq->seq,"\n"}'<
>> test.fa
>>
>> output:
>> 1	AGTC
>> 2	CTGA
>>
>> However, if I use the command in a pipe, then the output has an issue
>> with primary_id of initial sequence.
>> $ cat test.fa | perl -MBio::SeqIO -e 'my $seq=Bio::SeqIO->new(-fh
>> =>\*STDIN); while ($myseq=$seq->next_seq){ print
>> $myseq->id,"\t",$myseq->seq,"\n"}'
>>
>> output:
>> AGTC	
>> 2	CTGA
>>
>> What is the workaround to make Bio::SeqIO work correctly in a
>> one-liner with pipes?
>>
>> thanks,
>>
>> Alper Yilmaz
>> Post-doctoral Researcher
>> Plant Biotechnology Center
>> The Ohio State University
>> 1060 Carmack Rd
>> Columbus, OH 43210
>> (614)688-4954
>>
>>
>> PS: Normally, the example is demonstrating useless use of cat; for the
>> sake of giving an example, it can be "command1 | command2 | command3 |
>> perl -MBio::SeqIO -e'...' " instead.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l