[Bioperl-l] Each_DBLink : only returning 1 entry ?

Jean-Jack M. Riethoven pow@ebi.ac.uk
Thu, 30 Aug 2001 16:08:15 +0100


> > that tells me this sequence (or acc_number) is in fact part of EMBL
> > (Genbank/DDBJ), or SWISS_PROT, or...
>
> $dblink->database()

*chuckle*

I know $dblink->database(), but it returns the database name of a
cross-reference (the DR stuff in EMBL flatfiles).

What I need is the database code of the main seq object:

$seq->database()

In my example, X02158 is the accession_number, HSERPG the display_id.
DBLink only lists some GDB and SWISS-PROT stuff in this example.

Normally I would obviously know whether I got this from EMBL, Genbank, DDBJ,
or something else. In my script I'd like to avoid guesswork though -
I know the main database code is not in the EMBL flatfile itself. Would it
be possible to have a stub function that returns the input format (from
Bio::SeqIO) as a text string?

my $in  = Bio::SeqIO->new('-file'   => $file_in,
                          '-format' => 'EMBL');
my $seq = $in->next_seq();
# $seq->database() is the method I am asking for; it does not exist yet
print "Accession number " . $seq->accession_number . " is from database " .
      $seq->database();

I know I can pass a var with the EMBL string used in SeqIO->new to my
package - but I try to keep things as 'clean' as possible. It seems
sensible to have a method like this.
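
For illustration, the "pass the format string along" workaround could be sketched roughly as below. This is only a sketch under assumptions: FormatTaggedStream and its source_format() method are hypothetical names, not part of BioPerl, and StubStream merely stands in for a Bio::SeqIO object so the snippet runs without BioPerl installed. Note also that the input format is only a hint at the source database, since EMBL, Genbank, and DDBJ share records.

```perl
use strict;
use warnings;

{
    # Hypothetical helper (not a BioPerl class): remembers the format
    # string a stream was opened with and delegates next_seq() to it.
    package FormatTaggedStream;

    sub new {
        my ($class, %args) = @_;
        my $self = {
            format => uc $args{format},   # normalised, e.g. 'EMBL'
            stream => $args{stream},      # any object with next_seq()
        };
        return bless $self, $class;
    }

    # Return the format string given at construction time.
    sub source_format { return $_[0]->{format} }

    # Hand back whatever the wrapped stream returns next.
    sub next_seq { return $_[0]->{stream}->next_seq }
}

{
    # Stand-in for Bio::SeqIO in this sketch: yields plain ID strings.
    package StubStream;

    sub new {
        my ($class, @ids) = @_;
        return bless { ids => [@ids] }, $class;
    }

    sub next_seq { return shift @{ $_[0]->{ids} } }
}

my $in = FormatTaggedStream->new(
    format => 'embl',
    stream => StubStream->new('X02158'),
);
my $acc = $in->next_seq;
print "Accession number $acc is from database ", $in->source_format, "\n";
```

With a real Bio::SeqIO object passed as the stream, next_seq() would return sequence objects instead of plain strings, and the format string would travel with the stream rather than as a separate variable.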

No hurry though - I can wing it with some hard-coding until then.

JJ