[Bioperl-l] Indexing CDS file

Sviya sviiya at gmail.com
Tue Feb 10 02:28:34 EST 2009


Hello,I am trying to index cds files. The following script is run on
cds_ann_hum_01_r99.dat obtained from EMBL. It is able to read and print the
ids and sequences, but strangely it is unable to find any of the ids! Could
you please explain the mistake I am committing? I have tried variously but
it doesn't work.

--- <prog> ---

#!/usr/bin/perl -w

        use strict;

        use Bio::SeqIO;
        use Bio::Index::Fasta;

        my $file = "cds100.dat";
        my $idx_file = "cds100.idx";

        unlink($idx_file); # delete the file, if exists

        my $in_seq = Bio::SeqIO->new(   -file => "< $file",
                                        -format => "embl");

        while ( my $seq = $in_seq->next_seq) {
                print $seq->id, "\n";
        }

        my $inx = Bio::Index::Fasta->new(       -filename => $idx_file,
                                                -write_flag => 1);

        $inx->make_index($file); # creates the index file

        print "Done indexing!\n";

        my $out = Bio::SeqIO->new( '-format' => 'fasta');

        $inx = Bio::Index::Fasta->new('-filename' => $idx_file);

        my $query_seq=$inx->fetch('EAL24405SV');

        if ( ! defined($query_seq) ) {
                print "Sequence not found!\n";
                exit 1;
        } else {
                print $query_seq;
        }

exit 0;

--- </prog> ---

Output is:
--- <stdout> ---
EAL24405;
EAL24406;
.
.
.
EAL24407;
EAL24408;
Done indexing!
Sequence not found!
--- </stdout> ---

TIA,
Sviya


More information about the Bioperl-l mailing list