[Bioperl-l] question about Bio::DB::GenBank

Alan Robinson alan@ebi.ac.uk
Tue, 12 Jun 2001 22:10:27 +0100 (GMT Daylight Time)


> Now we should be able to use the EMBL server which provides a very easy to
> query interface based on accession number.  For your purposes this
> requires just using a different DB module (isn't OO programming nice...)
> 
> #!/usr/local/bin/perl
> use Bio::DB::EMBL;
> $gb = new Bio::DB::EMBL();
> my($id) = "AE004439";
> $seq = $gb->get_Seq_by_id($id);
> print $seq->seq ."\n";
> 
> This unfortunately only returns a sequence 12376 bases long, while the
> genbank record appears to be 2257487 bases long.  Hmm, I'm stuck and can't
> dive in more deeply at this point.

OK - I'm doing a first dive into this. The sequence being returned is
AE006034, which has AE004439 as a secondary identifier. It is the 1st of
the 204 sections of your complete genome, which run up to and including
AE006237, and each of which has AE004439 as a secondary identifier.

The accession AE004439 is in neither the EMBL nor the DDBJ database.

The EMBL servers should probably only be sent primary accession numbers
and identifiers. Here the database lookup is mapping this secondary
identifier to its best guess.

cf. http://www.ebi.ac.uk/cgi-bin/emblfetch?AE004439

So the server is behaving as expected, even if it's not what everybody
else is expecting :)
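
In the meantime, here is a rough sketch of asking for the sections by
their primary accessions instead. It assumes the 204 sections are numbered
consecutively from AE006034 to AE006237 (the count fits: 6237 - 6034 + 1 =
204), that BioPerl is installed, and that get_Seq_by_acc is available in
your BioPerl version. The names accession_range and fetch_sections are
made up for illustration, and nothing is fetched unless you pass --fetch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Expand a first/last accession pair into every accession in between.
# ASSUMPTION: the sections are numbered consecutively with a shared prefix.
sub accession_range {
    my ($first, $last) = @_;
    my ($prefix, $from) = $first =~ /^([A-Z]+)(\d+)$/
        or die "Unexpected accession format: $first\n";
    my ($prefix2, $to) = $last =~ /^([A-Z]+)(\d+)$/
        or die "Unexpected accession format: $last\n";
    die "Prefixes differ: $prefix vs $prefix2\n" unless $prefix eq $prefix2;
    my $width = length $from;    # keep the leading zeros, e.g. 006034
    return map { sprintf '%s%0*d', $prefix, $width, $_ } ($from + 0) .. ($to + 0);
}

# Fetch each section by primary accession and report its length.
# Hypothetical helper; Bio::DB::EMBL is loaded only when this is called.
sub fetch_sections {
    my @accessions = @_;
    require Bio::DB::EMBL;
    my $db    = Bio::DB::EMBL->new();
    my $total = 0;
    for my $acc (@accessions) {
        my $seq = $db->get_Seq_by_acc($acc);
        unless ($seq) {
            warn "No entry returned for $acc\n";
            next;
        }
        printf "%s\t%d\n", $acc, $seq->length;
        $total += $seq->length;
    }
    return $total;
}

my @accessions = accession_range('AE006034', 'AE006237');
printf "%d sections, %s to %s\n",
    scalar @accessions, $accessions[0], $accessions[-1];

# Only contact the EMBL server when explicitly asked to.
if (@ARGV && $ARGV[0] eq '--fetch') {
    my $total = fetch_sections(@accessions);
    print "Total bases across sections: $total\n";
}
```

Run without arguments it only prints the accession list summary; run with
--fetch it makes one request per section, so expect it to take a while.
Whether neighbouring sections overlap is not something I have checked, so
treat the summed length as a rough figure rather than the genome length.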

I'll check with the EMBL group tomorrow if they've got a point of view on
this situation.


--
============================================================
Alan J. Robinson, D.Phil.             Tel:+44-(0)1223 494444
European Bioinformatics Institute     Fax:+44-(0)1223 494468
EMBL Outstation - Hinxton             Email:  alan@ebi.ac.uk
Wellcome Trust Genome Campus
Hinxton, Cambridge
CB10 1SD, UK                http://industry.ebi.ac.uk/~alan/
============================================================