[Bioperl-l] Indexing GenBank files

Brad Chapman <chapmanb@arches.uga.edu>
20 Sep 2000 13:06:22 EDT


Hey all;
I have been using Bio::Index::GenBank from the main trunk in CVS to
index GenBank files so that I can use them as SeqDB objects for
Biocorba stuff. Using it led me to a couple of questions about the
module:

1. What is the proper way to create Index objects? For some of the
indexers (e.g. Fasta.pm) the docs say to write a new index like this:

my $index = Bio::Index::Fasta->new( -filename   => $the_new_index_file,
                                    -write_flag => 1 );

While others, like GenBank, appear to want it like this:

my $index = Bio::Index::GenBank->new($the_new_index_file, 'WRITE');

It appears that Abstract.pm, the base class, expects the first form, so
to use GenBank.pm I had to make a little fix so that it accepts the
first form too, but I'm not exactly sure what the right thing to do
here is.
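
To make the question concrete, here is a rough sketch of a constructor
that tolerates both calling styles. My::IndexSketch is a made-up
stand-in class, not the real Bio::Index code, and the way it tells the
two styles apart (first argument starts with a dash) is just an
assumption for illustration:

```perl
use strict;
use warnings;

# Hypothetical stand-in class, NOT the real Bio::Index::GenBank.
package My::IndexSketch;

sub new {
    my ($class, @args) = @_;
    my ($filename, $write_flag);

    if (@args && defined $args[0] && $args[0] =~ /^-/) {
        # Named style: -filename => $file, -write_flag => 1
        # (assumes no real filename starts with a dash)
        my %param = @args;
        $filename   = $param{-filename};
        $write_flag = $param{-write_flag} ? 1 : 0;
    }
    else {
        # Positional style: ($file, 'WRITE')
        my $mode;
        ($filename, $mode) = @args;
        $write_flag = (defined $mode && uc($mode) eq 'WRITE') ? 1 : 0;
    }

    return bless { filename => $filename, write_flag => $write_flag }, $class;
}

package main;

my $named      = My::IndexSketch->new(-filename => 'gb.idx', -write_flag => 1);
my $positional = My::IndexSketch->new('gb.idx', 'WRITE');

# Both objects end up with the same filename and write flag.
for my $idx ($named, $positional) {
    print "$idx->{filename} write=$idx->{write_flag}\n";
}
```

Something along these lines would keep old positional callers working
while matching what the base class seems to expect, but whether that is
desirable is exactly what I'm asking.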

2. Right now GenBank.pm creates the index keys using the identifier after
the LOCUS keyword. Personally, I prefer using the accession number
(i.e. the first entry on the ACCESSION line) and made a little hack to
do this. It seems to work okay for what I've been doing thus far, but
I'm not sure whether this is a good idea or whether there is a reason
why the locus identifier is better. I am definitely not the most
experienced person ever when it comes to dealing with GenBank files, so
I'd really like to hear what people think about this.
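
For anyone who wants to see the two candidate keys side by side, here
is a minimal sketch that pulls both out of a record with plain regular
expressions. It is not the code in GenBank.pm or my hack. The record is
an abbreviated, made-up example and candidate_keys is a hypothetical
helper name:

```perl
use strict;
use warnings;

# Abbreviated, made-up record for illustration only.
my $record = <<'END';
LOCUS       EXAMPLE01    1200 bp    DNA             PLN       01-JAN-2000
DEFINITION  Made-up record for illustration.
ACCESSION   X00001 X00002
ORIGIN
//
END

# Returns (locus_name, first_accession); either may be undef
# if the corresponding line is missing.
sub candidate_keys {
    my ($text) = @_;
    my ($locus)     = $text =~ /^LOCUS\s+(\S+)/m;
    my ($accession) = $text =~ /^ACCESSION\s+(\S+)/m;
    return ($locus, $accession);
}

my ($locus, $accession) = candidate_keys($record);
print "LOCUS key:     $locus\n";
print "ACCESSION key: $accession\n";
```

Only the first token after ACCESSION is taken, so any further
accessions listed on that line are ignored in this sketch.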

Brad