[Bioperl-l] perl one-liner with Bio::SeqIO
Frank Schwach
fs5 at sanger.ac.uk
Thu Jul 22 06:48:25 EDT 2010
Hi Alper,
You can also reproduce it by providing STDIN from keyboard input,
like so:
$ perl -MBio::SeqIO -e 'my $seq=Bio::SeqIO->new(-fh => \*STDIN); while ($myseq=$seq->next_seq){ print $myseq->id,"\t",$myseq->seq,"\n"}'
>1
aaaaaaaaa
>2
aaaaaaaaa
ggggggggg
>3
2 ggggggggg
ccccccccc
3 ccccccccc
In this case I typed
">1"[ENTER]
"aaaaaaaaa"[ENTER]
">2"[ENTER]
and the command then returned the sequence of the first entry, again
without the ID.
From the second entry onwards, it is all correct.
I'm not 100% sure, but could it be linked to buffering? SeqIO has to read
ahead to find a complete entry that spans multiple lines. When STDIN
comes from a file, the input is buffered and more than one line arrives
at once, which allows the next_seq method to work as expected. If you
provide line-by-line input, then that method probably can't work
correctly.
If that is the case, then you can't use the command in a pipe at all.
Frank
On Thu, 2010-07-22 at 00:09 -0400, Alper Yilmaz wrote:
> Hi,
>
> I was using Bio::SeqIO with perl one-liner and I noticed an oddity.
> Can someone suggest a correction or workaround?
>
> Let test.fa be:
> >1
> AGTC
> >2
> CTGA
>
> Then the command line below prints the expected output:
> $ perl -MBio::SeqIO -e 'my $seq=Bio::SeqIO->new(-fh => \*STDIN); while ($myseq=$seq->next_seq){ print $myseq->id,"\t",$myseq->seq,"\n"}' < test.fa
>
> output:
> 1 AGTC
> 2 CTGA
>
> However, if I use the command in a pipe, the output has an issue
> with the primary_id of the first sequence.
> $ cat test.fa | perl -MBio::SeqIO -e 'my $seq=Bio::SeqIO->new(-fh => \*STDIN); while ($myseq=$seq->next_seq){ print $myseq->id,"\t",$myseq->seq,"\n"}'
>
> output:
> AGTC
> 2 CTGA
>
> What is the workaround to make Bio::SeqIO work correctly in a
> one-liner with pipes?
>
> thanks,
>
> Alper Yilmaz
> Post-doctoral Researcher
> Plant Biotechnology Center
> The Ohio State University
> 1060 Carmack Rd
> Columbus, OH 43210
> (614)688-4954
>
>
> PS: Admittedly, the example demonstrates a useless use of cat; it is
> only there for the sake of giving an example. In practice it would be
> "command1 | command2 | command3 | perl -MBio::SeqIO -e '...'" instead.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l