[Bioperl-l] perl one-liner with Bio::SeqIO

Frank Schwach fs5 at sanger.ac.uk
Thu Jul 22 06:48:25 EDT 2010


Hi Alper,

You can also reproduce it by typing the input at the keyboard, so that
STDIN comes from the terminal, like so:
$ perl -MBio::SeqIO -e 'my $seq=Bio::SeqIO->new(-fh =>\*STDIN); while
($myseq=$seq->next_seq){ print $myseq->id,"\t",$myseq->seq,"\n"}'
>1
aaaaaaaaa
>2
aaaaaaaaa
ggggggggg
>3
2       ggggggggg
ccccccccc
3       ccccccccc

In this case I typed 
">1"[ENTER]
"aaaaaaaaa"[ENTER]
">2"[ENTER]
and at that point the command printed the sequence of the first entry,
again without its ID.
From the second entry onwards, it is all correct.

I'm not 100% sure, but could it be linked to buffering? SeqIO has to read
ahead to find a complete entry, since an entry can span multiple lines.
When STDIN comes from a file, the input is buffered and you receive more
than one line at once, which allows the next_seq method to work as
expected. If you provide the input line by line, then that method
probably can't work correctly.
If that is the case, then you can't use the command in a pipe at all.

Frank



On Thu, 2010-07-22 at 00:09 -0400, Alper Yilmaz wrote:
> Hi,
> 
> I was using Bio::SeqIO with perl one-liner and I noticed an oddity.
> Can someone suggest a correction or workaround?
> 
> Let test.fa be:
> >1
> AGTC
> >2
> CTGA
> 
> Then, commandline below prints the expected output:
> $ perl -MBio::SeqIO -e 'my $seq=Bio::SeqIO->new(-fh =>\*STDIN); while
> ($myseq=$seq->next_seq){ print $myseq->id,"\t",$myseq->seq,"\n"}'  <
> test.fa
> 
> output:
> 1	AGTC
> 2	CTGA
> 
> However, if I use the command in a pipe, then the output has an issue
> with the ID of the first sequence.
> $ cat test.fa | perl -MBio::SeqIO -e 'my $seq=Bio::SeqIO->new(-fh
> =>\*STDIN); while ($myseq=$seq->next_seq){ print
> $myseq->id,"\t",$myseq->seq,"\n"}'
> 
> output:
> AGTC	
> 2	CTGA
> 
> What is the workaround to make Bio::SeqIO work correctly in a
> one-liner with pipes?
> 
> thanks,
> 
> Alper Yilmaz
> Post-doctoral Researcher
> Plant Biotechnology Center
> The Ohio State University
> 1060 Carmack Rd
> Columbus, OH 43210
> (614)688-4954
> 
> 
> PS: Normally, the example is demonstrating useless use of cat. For the
> sake of giving an example, it can be "command1 | command2 | command3 |
> perl -MBio::SeqIO -e'...' " instead.





