Re: [Bioperl-l] Bio::Index::Fasta return object problem

Jason Stajich jason at cgt.mc.duke.edu
Mon Apr 28 11:32:55 EDT 2003


See the id is
'sptrembl|Q9V2U7|Q9V2U7' not 'Q9V2U7' by the default indexer method.

If you want to index on the individual accessions you need to define a new
id_parser function to deal with this (v simple, see below).

# Complete code for making an index for several
# fasta files
use lib '/usr/local/psu/bioperl-live';
use Bio::Index::Fasta;
use strict;

my ($Index_File_Name,$Fasta_File) = @ARGV;
my $inx = Bio::Index::Fasta->new(
'-filename' => $Index_File_Name,
'-write_flag' => 1);

$inx->id_parser(\&my_id_parser);
$inx->make_index($Fasta_File);

sub my_id_parser {
    my @retvals;
    if( $_[0] =~ /^>(\S+)/ ) {
	my ($src, at accs) = split(/\|/,$1);
        push @retvals, @accs;
    }
   return @retvals;
}

-jason

On Mon, 28 Apr 2003, [iso-8859-1] david vilanova wrote:

>
> Here are two of the sequences I try to retrieve from the indexed file.
> >sptrembl|Q9V2U7|Q9V2U7 ABC transporter permease (Fragment).
> MLYVGLPYMVNSGRDGFLMVNEELENVSRTLGASKIKTFFNISLPLIKNNLVSGSILTFA
> RGISEVGAILIVAYFPKTTPVLILDRFNEYGLSSSKPISVIMIVLSIVLFSVFRLVRYNK
> K
> >sptrembl|Q977F6|Q977F6 FtsY.
> MALLQSDVEMQVAEEILETIREKLIGETRKQVESTGQLVSEALHDALYEVISVGQFDFDQ
> RIAEADKPVTLIFTGINGVGKTTTIAKLAKYFEKQGYSTVLANGDTYRAGANEQIREHAE
> ALDKKLIAHEQGGDPAAVIYDGVEYAEAHDIDIVLGDTAGRLHTSNDLMAQLEKIDRVVG
> PDLTLFVDEAVAGQDAVERARQFNDAAAIDGAILTKADADSNGGAAISIAYVTGKPILFL
> GVGQGYDHIEKFDPEQMVERLLGEDE
> I use:
> my @ids = qw(Q9V2U7 Q977F6);
> foreach my $id (@ids) {
> my $seq = $inx->fetch($id); # Returns Bio::Seq object
> $out->write_seq($seq);
> }
>
> It still gives the exception.
>
>
>
> > Bio::Seq ISA Bio::PrimarySeqI
> >
> > You might want some code to verify that $seq is defined - it is
> > possible you aren't actually retrieving anything when you call fetch.
> > Just do a grep "accession" filename
> >
> > If the accession doesn't match exactly you need to write your own
> > id_parser function which will (For example) drop the .1 or some thing.
> >
> > Also you really don't want to be calling
> > for $id ($Index_File_Name) {
> > you want to be calling
> > for $id ( @accession ) {
> > }
> >
> > where @accession is the list of accession numbers or unique Ids you
> > want to retrieve with
> >
> >
> >
> > On Mon, 28 Apr 2003, [iso-8859-1] david vilanova wrote:
> >
> >> Dear all,
> >> I have some problems when I try to retreive sequences from an indexed
> >> file. The Index::Fasta  seems to return a Bio::Seq object however the
> >> method write_seq requires a valid Bio::PrimarySeqI object. How can I
> >> can go through this problem.
> >>
> >> Thanks
> >> ------------- EXCEPTION  -------------
> >> MSG: Did not provide a valid Bio::PrimarySeqI object
> >> STACK Bio::SeqIO::fasta::write_seq
> >> /usr/local/psu/bioperl-live/Bio/SeqIO/fasta.pm:175
> >> STACK toplevel fasta.pl:21
> >>
> >>
> >> use Bio::Index::Fasta;
> >> use strict;
> >>
> >> my ($Index_File_Name,$Fasta_File) = @ARGV;
> >> my $inx = Bio::Index::Fasta->new(
> >> '-filename' => $Index_File_Name,
> >> '-write_flag' => 1);
> >> $inx->make_index($Fasta_File);
> >>
> >> # Print out several sequences present in the index
> >> # in Fasta format
> >>
> >> my $out = Bio::SeqIO->new('-format' => 'Fasta','-fh' => \*STDOUT);
> >>
> >> foreach my $id ($Index_File_Name) {
> >> my $seq = $inx->fetch($id); # Returns Bio::Seq object
> >> $out->write_seq($seq);#requires a valid Bio::PrimarySeqI object
> >> }
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at bioperl.org
> >> http://bioperl.org/mailman/listinfo/bioperl-l
> >>
> >
> > --
> > Jason Stajich
> > Duke University
> > jason at cgt.mc.duke.edu
>
>
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu


More information about the Bioperl-l mailing list