[Bioperl-l] use Bio::SeqIO to read Fasta sequence from pipe, or @ARGV, like "while (<>) {....}"

Mark A. Jensen maj at fortinbras.us
Sun Jul 13 02:14:57 UTC 2014


Haiyan - I think you want the following--

##########

#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use Bio::SeqIO ;
use Getopt::Long;
use Statistics::Descriptive ;

no warnings qw/once/;

my %opt = () ;
GetOptions(
     "sum"=>\$opt{sum},
);
my $sta = Statistics::Descriptive::Full->new();

my $in = Bio::SeqIO->new(-format=>"Fasta",-fh=>\*DATA);

while(my $s = $in->next_seq()){
     $sta->add_data($s->length()) ;
}

#`rm $tf` ;
unlink $tf;

if ( $opt{sum} ) {
     print $sta->sum(), "\n"  ;
}

# the DATA goes here:
__END__
>Contig000001
CCACGTAAGAGCACCTGGGTCCCCGCCCGCCAAGCGCCGCGAGCGCCAGCAGCAGCTCGC
>hello
ATATATTTTT

############



On 2014-07-12 21:56, Haiyan Lin wrote:
> Thanks to Mark and Torsten for their advice.
>
> I can got what I want using both "less" and "cat" by writing data 
> form
> upstream pipeline into a temporary file, followed by  processing and
> removing. But, it is not efficient. I'm looking for a better method.
> Whether can I avoiding writting and removinf the file? Thanks for any
> advic
>
> ##### Following is MyData
> [linhy at bioinfo1 Script]$ more try.fa
>>Contig000001
> CCACGTAAGAGCACCTGGGTCCCCGCCCGCCAAGCGCCGCGAGCGCCAGCAGCAGCTCGC
>>hello
> ATATATTTTT
>
> ##### Following is my code
> [linhy at bioinfo1 Script]$ more fastaLen.pl
> #!/usr/bin/env perl
>
> use strict;
> use warnings;
> use utf8;
> use Bio::SeqIO ;
> use Getopt::Long;
> use Statistics::Descriptive ;
>
> my %opt = () ;
> GetOptions(
>     "sum"=>\$opt{sum},
> );
>
> my $tf = "/tmp/subFasta.len.PID$$.fa";
> open WTF,">$tf" or die $!;
> while(<>){
>     print WTF  ;
> }
> close WTF;
>
> my $sta = Statistics::Descriptive::Full->new();
>
> my $in = Bio::SeqIO->new(-format=>"Fasta",-file=>$tf);
>
> #### I'm failed with the following commented line
> #my $in = Bio::SeqIO->new(-format=>"Fasta",-fh=>\*DATA);
>
> while(my $s = $in->next_seq()){
>     $sta->add_data($s->length()) ;
> }
> `rm $tf` ;
>
> if ( $opt{sum} ) {
>     print $sta->sum(), "\n"  ;
> }
>
> exit;
>
>
> ##### The output by above code and data using by both "cat" and 
> "less"
> [linhy at bioinfo1 Script]$ cat try.fa | ./fastaLen.pl -s
> 70
> [linhy at bioinfo1 Script]$ less try.fa | ./fastaLen.pl -s
> 70
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> On Sat, 2014-07-12 at 20:57 -0400, Mark A Jensen wrote:
>> You should do
>> cat try.fa | perl ..
>> rather than less,IMO. less formats things.
>>
>> You can add
>>   no warnings qw/once/;
>> to get rid of that warning, it shouldn't affect
>> the program.
>>
>> MAJ
>>
>>
>> On Sat, Jul 12, 2014 at 8:03 PM, Haiyan Lin <linhy0120 at gmail.com>
>> wrote:
>>
>>
>>         Hi, Paul,
>>
>>         Thanks for your quick replay and advice. Soryy for my late
>>         feedback, I can't send maill in my location yesterday. I 
>> have
>>         tried your method. But
>>         it complains that
>>
>>         ------------------------------------
>>         [linhy at bioinfo1 Script]$ less try.fa | perl ./fastaLen.pl
>>         Name "main::DATA" used only once: possible typo
>>         at ./fastaLen.pl line
>>         42.
>>         Use of uninitialized value in print at ./fastaLen.pl line 
>> 49.
>>         ------------------------------------
>>
>>
>>         I had also tried the "-fh=>\*STDIN", and "-fh=><>",according
>>         to the
>>         documentation returned by "perldoc Bio::SeqIO" at termminal,
>>         but I still
>>         haven't got I want.Thanks again.
>>
>>         ... ...
>>
>>            Bio::SeqIO->new()
>>                   $seqIO = Bio::SeqIO->new(-file => 'filename',
>>         -format=>$format);
>>                   $seqIO = Bio::SeqIO->new(-fh   => \*FILEHANDLE,
>>         -format=>$format);
>>                   $seqIO = Bio::SeqIO->new(-format => $format);
>>
>>                ... ...
>>
>>                -fh  You may provide new() with a previously-opened
>>         filehandle.
>>         For example, to read from STDIN:
>>
>>                        $seqIO = Bio::SeqIO->new(-fh => \*STDIN);
>>
>>                     Note that you must pass filehandles as 
>> references
>>         to globs.
>>
>>                     If neither a filehandle nor a filename is
>>         specified, then
>>         the module will read from the @ARGV array or STDIN, using 
>> the
>>                     familiar <> semantics.
>>
>>         -------------------------------
>>
>>
>>
>>         Regards
>>
>>         Haiyan
>>
>>
>>
>>
>>         On Sat, 2014-07-12 at 10:22 -0400, Paul Cantalupo wrote:
>>         > Hi Haiyan,
>>         >
>>         >
>>         > You need to use the '-fh' option in Bio::SeqIO new and 
>> have
>>         it use
>>         > Perl's DATA filehandle like so:
>>         > my $in = Bio::SeqIO->new(-fh => \*DATA, -format => 
>> 'fasta');
>>         >
>>         > This was taken from
>>         > http://perldoc.perl.org/perldata.html#Special-Literals:
>>         > "Text after __DATA__ may be read via the filehandle
>>         PACKNAME::DATA ,
>>         > where PACKNAME is the package that was current when the
>>         __DATA__ token
>>         > was encountered."
>>         >
>>         >
>>         >
>>         > Good luck,
>>         >
>>         > Paul
>>         >
>>         >
>>         >
>>         >
>>         > Paul Cantalupo
>>         > University of Pittsburgh
>>         >
>>         >
>>         >
>>         > On Sat, Jul 12, 2014 at 8:35 AM, Haiyan Lin
>>         <linhy0120 at gmail.com>
>>         > wrote:
>>         >         Hill, dear perlers,
>>         >
>>         >         I‘m trying to use Bio::SeqIO to read Fasta 
>> sequence
>>         from pipe,
>>         >         or @ARGV,
>>         >         like "while (<>) {....}". After several trier and
>>         error,  I'm
>>         >         failed and
>>         >         need to ask for herp from you. Could you please 
>> help
>>         me to
>>         >         check or try
>>         >         following code?
>>         >
>>         >         Thanks in advance.
>>         >
>>         >         ---------------------------------------
>>         >         use Bio::SeqIO ;
>>         >         use Statistics::Descriptive ;
>>         >
>>         >         my %opt = () ;
>>         >         my $sta = Statistics::Descriptive::Full->new();
>>         >
>>         >         ##### here is the key, I think.
>>         >         my $in = Bio::SeqIO->new(-format=>"Fasta");
>>         >         while(my $s = $in->next_seq()){
>>         >             $sta->add_data($s->length()) ;
>>         >         }
>>         >         print $sta->sum() if $opt{sum} ;
>>         >
>>         >         __DATA__
>>         >         >ct1
>>         >         AGAGAGAGA
>>         >         >ctg2
>>         >         ATATATAT
>>         >         -----------------------------------------------
>>         >
>>         >         Regards
>>         >
>>         >         Haiyan
>>         >
>>         >
>>         >
>>         >
>>         >
>>         >
>>         >         _______________________________________________
>>         >         Bioperl-l mailing list
>>         >         Bioperl-l at mailman.open-bio.org
>>         >
>>         http://mailman.open-bio.org/mailman/listinfo/bioperl-l
>>         >
>>         >
>>
>>
>>
>>         _______________________________________________
>>         Bioperl-l mailing list
>>         Bioperl-l at mailman.open-bio.org
>>         http://mailman.open-bio.org/mailman/listinfo/bioperl-l
>>



More information about the Bioperl-l mailing list