[Bioperl-l] load_seqdatabase error with a specific locus from genbank
Hilmar Lapp
hlapp at gmx.net
Mon Apr 6 15:39:50 UTC 2009
(Removing biosql-l from the cc list as this seems to be a problem with
BioPerl.)
Hi Johann,
I don't know whether anyone has responded to you yet. If not, I'm
sorry; I've been inundated lately.
On Apr 1, 2009, at 6:14 AM, Johann PELLET wrote:
> With the latest version of BioPerl and BioSQL, I have tried to
> insert entry from a GenBank file, which I have downloaded from the
> NCBI website (648 937 records)
Could you be more specific? When you say the latest version of
BioPerl, do you mean 1.6.1 or the current svn snapshot of the main
trunk?
And which GenBank file is it? Is it one with only viruses, i.e., are
you specifically interested in the virus sequences that the parser is
giving you trouble with?
> After successfully loading ncbi_taxonomy i am getting following
> error message while loading sequences into database.
>
> perl load_seqdatabase.pl gb_03-2009 -format genbank -driver Pg -dbname biosql
>
>
> --------------------- WARNING ---------------------
> MSG: The supplied lineage does not start near 'Human papillomavirus
> type 2c' (I was supplied 'Human papillomavirus - 2 |
> Alphapapillomavirus | Papillomaviridae')
This is a problem in the BioPerl GenBank parser, or more specifically,
in the species parser.
I thought this was fixed in 1.6.1, though; are you sure you don't have
an older version of BioPerl lying around that could accidentally have
been used?
That said, it only seems to be a warning; did you check how the record
ended up in the database, and did you find it to be incomplete or messed up?
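One way to check whether an older BioPerl is being picked up is to ask
Perl which version it loads and from where. The following is a minimal
sketch, assuming that Bio::Root::Version is installed and exposes
$VERSION (it does in the releases I am aware of); the function name
bioperl_info is made up for illustration.

```shell
# Report the BioPerl version Perl actually loads, and the file it came from.
# bioperl_info is a hypothetical helper name, not part of BioPerl.
bioperl_info() {
    perl -MBio::Root::Version -le '
        print "version: ", $Bio::Root::Version::VERSION;
        print "loaded from: ", $INC{"Bio/Root/Version.pm"};
    ' 2>/dev/null || echo "BioPerl not found on @INC"
}

bioperl_info
```

If the reported path is not the installation you expected, PERL5LIB or
an older site-wide install is probably taking precedence.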
> the script is not stopped until this entry: S67864
This is a later entry, not the same entry that causes the problem above,
right?
> --------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::LocationAdaptor (driver) failed,
> values were ("1","19)","1","3") FKs (41914,<NULL>)
> ERROR: invalid input syntax for integer: "19)"
Oops - that's a problem that must originate from the BioPerl feature
location parser.
The full record is here: http://www.ncbi.nlm.nih.gov/nuccore/544772
Does anyone see why the location parser should have a problem with the
first gene feature? It's nested, and has remote location components,
but at first sight nothing jumps out at me as extraordinary. Has
someone recently changed the location parsing code? If no-one has an
immediate idea what could be at work here, this needs investigating.
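To narrow this down without BioSQL in the picture, the record can be
parsed on its own and the start/end of each feature location printed.
The sketch below assumes the record has been saved locally in GenBank
format; the file name S67864.gb and the function name check_locations
are made up for illustration, and nothing here is taken from
load_seqdatabase.pl itself.

```shell
# Print start/end for every feature (sub)location in a GenBank file and
# flag coordinates that are not plain integers (such as "19)").
# check_locations is a hypothetical helper name.
check_locations() {
    [ -r "$1" ] || { echo "usage: check_locations FILE.gb" >&2; return 2; }
    perl -MBio::SeqIO -e '
        my $file = shift @ARGV;
        my $in = Bio::SeqIO->new(-file => $file, -format => "genbank");
        while (my $seq = $in->next_seq) {
            for my $feat ($seq->get_SeqFeatures) {
                # each_Location returns the components of a split
                # location, or the location itself if it is simple
                for my $loc ($feat->location->each_Location) {
                    my @bad = grep { !defined($_) || $_ !~ /^\d+$/ }
                              ($loc->start, $loc->end);
                    printf "%s\t%s\t%s..%s%s\n",
                        $seq->accession_number, $feat->primary_tag,
                        $loc->start, $loc->end,
                        (@bad ? "\t<-- not an integer" : "");
                }
            }
        }
    ' "$1"
}

# Example (hypothetical file name):
#   check_locations S67864.gb
```

If the parser itself throws an exception on the feature, the error
message and stack trace from this run should point at the code path
involved.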
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================