[Bioperl-l] Re: AF165282 and Bio::DB::GenBank

Jason Stajich jason@chg.mc.duke.edu
Tue, 24 Apr 2001 17:33:14 -0400 (EDT)


Anton, thank you for your request to look into this.  In the future, please
submit this kind of thing as a bug on the bioperl bug submission form.
I have taken care of that for this case.

It appears that the problem is not in the db download (Bio::DB::GenBank)
but in the GenBank parsing.  You may need to clarify what you mean by
'it does not work': I can certainly download the sequence and get at
least the sequence information; it is just that not all of the features
are being parsed correctly.

The offending features look something like this; I'm guessing the regexps
aren't handling them.

    gene            join(<1..226,AF165283.1:1..197,AF165284.1:1..243,
                     AF165285.1:1..242,AF165286.1:1..225,AF165287.1:1..152,
                     AF165288.1:1..163,AF165289.1:1..158,AF165290.1:1..241,
                     AF165291.1:1..93,AF165292.1:1..223,AF165293.1:1..69,
                     AF165294.1:1..134,AF165295.1:1..169,AF165296.1:1..145,
                     AF165297.1:1..119,AF165298.1:1..209,AF165299.1:1..115,
                     AF165300.1:1..53,AF165301.1:1..126,AF165302.1:1..95,
                     AF165303.1:1..190,AF165304.1:1..198,AF165305.1:1..136,
                     AF165306.1:1..165,AF165307.1:1..150,AF165308.1:1..141,
                     AF165309.1:1..83,AF165310.1:1..>264)
                     /gene="ABC1"

556 magrathea tests $ cat genbank_dbtests.t
#!/usr/local/bin/perl -w
use strict;
use Bio::DB::GenBank;
use Bio::SeqIO;
my $db = new Bio::DB::GenBank;
my $seq = $db->get_Seq_by_acc('AF165282');
my $seqout = new Bio::SeqIO(-format => 'genbank',
                            -fh     => \*STDOUT);
$seqout->write_seq($seq);

557 magrathea tests $ perl genbank_dbtests.t
-------------------- WARNING ---------------------
MSG: unable to parse location successfully out of AF165283.1:1..197,
ignoring feature (seqid=HSATPCB01)
---------------------------------------------------
-------------------- WARNING ---------------------
MSG: unable to parse feature gene in EMBL/GenBank/SwissProt sequence entry
(id=HSATPCB01), ignoring
---------------------------------------------------
-------------------- WARNING ---------------------
MSG: unable to parse location successfully out of AF165283.1:16..192,
ignoring feature (seqid=HSATPCB01)
---------------------------------------------------
-------------------- WARNING ---------------------
MSG: unable to parse feature mRNA in EMBL/GenBank/SwissProt sequence entry
(id=HSATPCB01), ignoring
---------------------------------------------------
-------------------- WARNING ---------------------
MSG: unable to parse location successfully out of AF165283.1:16..192,
ignoring feature (seqid=HSATPCB01)
---------------------------------------------------
-------------------- WARNING ---------------------
MSG: unable to parse feature CDS in EMBL/GenBank/SwissProt sequence entry
(id=HSATPCB01), ignoring
---------------------------------------------------
LOCUS       HSATPCB01        226 bp     DNA  PRI 17-AUG-1999
DEFINITION  Homo sapiens ATP cassette binding transporter 1 (ABC1) gene, exon
            12.
ACCESSION   AF165282
VERSION     AF165282.1 GI:5734104
KEYWORDS    .
SOURCE      Homo sapiens.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 226)
  AUTHORS   Rust S., Rosier M., Funke H., Real J., Amoura Z., Piette J.C.
            Deleuze J.F., Brewer H.B., Duverger N., Denefle P. and Assmann G.
  TITLE     Tangier disease is caused by mutations in the gene encoding
            ATP-binding cassette transporter 1
  JOURNAL   Nat. Genet. 22 (4), 352-355 (1999)
REFERENCE   2  (bases 1 to 226)
  AUTHORS   Rust S., Rosier M., Funke H., Real J., Amoura Z., Piette J.C.
            Deleuze J.F., Brewer H.B., Duverger N., Denefle P. and Assmann G.
  TITLE     Direct Submission
  JOURNAL   Submitted (06-JUL-1999) Genomics, Rhone-Poulenc Rorer, 2 rue
            GastonCremieux, Evry 91006, France
FEATURES             Location/Qualifiers
     source          1..226
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /chromosome=9
                     /map="9q31"
     exon            16..221
                     /number=12
                     /gene="ABC1"
BASE COUNT       69 a     46 c     58 g     53 t
ORIGIN
        1 ctgttcttct atcagtgtgt caacctgaac aagctagaac ccatagcaac agaagtctgg
       61 ctcatcaaca agtccatgga gctgctggat gagaggaagt tctgggctgg tattgtgttc
      121 actggaatta ctccaggcag cattgagctg ccccatcatg tcaagtacaa gatccgaatg
      181 gacattgaca atgtggagag gacaaataaa atcaaggatg ggtaag
//

-Jason
On Tue, 24 Apr 2001, Anton Nekrutenko wrote:

> Dear Aaron and Jason,
>
> It seems like there is a bug in Bio::DB::GenBank.
>
> The following string does not work:
>
> $seq = $gb->get_Seq_by_acc('AF165282');
>
> There seems to be something magical about this particular accession
> number.
>
> Thanks for your kind help and time.
>
> Anton
>
> --
> -----------------------------------
> Anton Nekrutenko, Ph. D.
> Department of Ecology and Evolution
> The University of Chicago
> anton@nekrut.uchicago.edu
> http://nekrut.uchicago.edu
> (773) 834-3965
> (773) 702-9740 (fax)
> -----------------------------------
>
>
>
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/