[Bioperl-l] use Bio::SeqIO to read Fasta sequence from pipe, or @ARGV, like "while (<>) {....}"

Haiyan Lin linhy0120 at gmail.com
Sun Jul 13 01:56:52 UTC 2014


Thanks to Mark and Torsten for their advice.

I can got what I want using both "less" and "cat" by writing data form
upstream pipeline into a temporary file, followed by  processing and
removing. But, it is not efficient. I'm looking for a better method.
Whether can I avoiding writting and removinf the file? Thanks for any
advic

##### Following is MyData
[linhy at bioinfo1 Script]$ more try.fa 
>Contig000001
CCACGTAAGAGCACCTGGGTCCCCGCCCGCCAAGCGCCGCGAGCGCCAGCAGCAGCTCGC
>hello
ATATATTTTT

##### Following is my code
[linhy at bioinfo1 Script]$ more fastaLen.pl 
#!/usr/bin/env perl 

use strict;
use warnings;
use utf8;
use Bio::SeqIO ;
use Getopt::Long; 
use Statistics::Descriptive ;

my %opt = () ;
GetOptions(
    "sum"=>\$opt{sum},
);

my $tf = "/tmp/subFasta.len.PID$$.fa";
open WTF,">$tf" or die $!;
while(<>){
    print WTF  ;
}
close WTF; 

my $sta = Statistics::Descriptive::Full->new();

my $in = Bio::SeqIO->new(-format=>"Fasta",-file=>$tf);

#### I'm failed with the following commented line
#my $in = Bio::SeqIO->new(-format=>"Fasta",-fh=>\*DATA);

while(my $s = $in->next_seq()){ 
    $sta->add_data($s->length()) ;
}
`rm $tf` ;

if ( $opt{sum} ) {
    print $sta->sum(), "\n"  ;
}

exit;


##### The output by above code and data using by both "cat" and "less"
[linhy at bioinfo1 Script]$ cat try.fa | ./fastaLen.pl -s
70
[linhy at bioinfo1 Script]$ less try.fa | ./fastaLen.pl -s
70















On Sat, 2014-07-12 at 20:57 -0400, Mark A Jensen wrote:
> You should do
> cat try.fa | perl ..
> rather than less,IMO. less formats things.
> 
> You can add
>   no warnings qw/once/;
> to get rid of that warning, it shouldn't affect
> the program.
> 
> MAJ
> 
> 
> On Sat, Jul 12, 2014 at 8:03 PM, Haiyan Lin <linhy0120 at gmail.com>
> wrote:
> 
>         
>         Hi, Paul,
>         
>         Thanks for your quick replay and advice. Soryy for my late
>         feedback, I can't send maill in my location yesterday. I have
>         tried your method. But
>         it complains that
>         
>         ------------------------------------
>         [linhy at bioinfo1 Script]$ less try.fa | perl ./fastaLen.pl
>         Name "main::DATA" used only once: possible typo
>         at ./fastaLen.pl line
>         42.
>         Use of uninitialized value in print at ./fastaLen.pl line 49.
>         ------------------------------------
>         
>         
>         I had also tried the "-fh=>\*STDIN", and "-fh=><>",according
>         to the
>         documentation returned by "perldoc Bio::SeqIO" at termminal,
>         but I still
>         haven't got I want.Thanks again.
>         
>         ... ...
>         
>            Bio::SeqIO->new()
>                   $seqIO = Bio::SeqIO->new(-file => 'filename',
>         -format=>$format);
>                   $seqIO = Bio::SeqIO->new(-fh   => \*FILEHANDLE,
>         -format=>$format);
>                   $seqIO = Bio::SeqIO->new(-format => $format);
>         
>                ... ... 
>         
>                -fh  You may provide new() with a previously-opened
>         filehandle.
>         For example, to read from STDIN:
>         
>                        $seqIO = Bio::SeqIO->new(-fh => \*STDIN);
>         
>                     Note that you must pass filehandles as references
>         to globs.
>         
>                     If neither a filehandle nor a filename is
>         specified, then
>         the module will read from the @ARGV array or STDIN, using the
>                     familiar <> semantics.
>         
>         -------------------------------
>         
>         
>         
>         Regards
>         
>         Haiyan
>         
>         
>         
>         
>         On Sat, 2014-07-12 at 10:22 -0400, Paul Cantalupo wrote:
>         > Hi Haiyan,
>         > 
>         > 
>         > You need to use the '-fh' option in Bio::SeqIO new and have
>         it use
>         > Perl's DATA filehandle like so:
>         > my $in = Bio::SeqIO->new(-fh => \*DATA, -format => 'fasta');
>         > 
>         > This was taken from
>         > http://perldoc.perl.org/perldata.html#Special-Literals:
>         > "Text after __DATA__ may be read via the filehandle
>         PACKNAME::DATA ,
>         > where PACKNAME is the package that was current when the
>         __DATA__ token
>         > was encountered."
>         > 
>         > 
>         > 
>         > Good luck,
>         > 
>         > Paul
>         > 
>         > 
>         > 
>         > 
>         > Paul Cantalupo
>         > University of Pittsburgh
>         > 
>         > 
>         > 
>         > On Sat, Jul 12, 2014 at 8:35 AM, Haiyan Lin
>         <linhy0120 at gmail.com>
>         > wrote:
>         >         Hill, dear perlers,
>         >         
>         >         I‘m trying to use Bio::SeqIO to read Fasta sequence
>         from pipe,
>         >         or @ARGV,
>         >         like "while (<>) {....}". After several trier and
>         error,  I'm
>         >         failed and
>         >         need to ask for herp from you. Could you please help
>         me to
>         >         check or try
>         >         following code?
>         >         
>         >         Thanks in advance.
>         >         
>         >         ---------------------------------------
>         >         use Bio::SeqIO ;
>         >         use Statistics::Descriptive ;
>         >         
>         >         my %opt = () ;
>         >         my $sta = Statistics::Descriptive::Full->new();
>         >         
>         >         ##### here is the key, I think.
>         >         my $in = Bio::SeqIO->new(-format=>"Fasta");
>         >         while(my $s = $in->next_seq()){
>         >             $sta->add_data($s->length()) ;
>         >         }
>         >         print $sta->sum() if $opt{sum} ;
>         >         
>         >         __DATA__
>         >         >ct1
>         >         AGAGAGAGA
>         >         >ctg2
>         >         ATATATAT
>         >         -----------------------------------------------
>         >         
>         >         Regards
>         >         
>         >         Haiyan
>         >         
>         >         
>         >         
>         >         
>         >         
>         >         
>         >         _______________________________________________
>         >         Bioperl-l mailing list
>         >         Bioperl-l at mailman.open-bio.org
>         >
>         http://mailman.open-bio.org/mailman/listinfo/bioperl-l
>         > 
>         > 
>         
>         
>         
>         _______________________________________________
>         Bioperl-l mailing list
>         Bioperl-l at mailman.open-bio.org
>         http://mailman.open-bio.org/mailman/listinfo/bioperl-l
>         




More information about the Bioperl-l mailing list