[Bioperl-l] loading yeast data failing...

Hilmar Lapp hlapp at gmx.net
Tue Jan 3 20:47:51 EST 2006


On 1/3/06, Angshu Kar <angshu96 at gmail.com> wrote:
> Hi Hilmar,
>
> On what basis should I parse? I found the following 3 entries (arbitrary) in
> the bioentry table. Each entry was stored unchanged in all three of the name,
> identifier and accession fields! And the version field contains all 0s!
>
>
> gi|51013395|gb|AAT92991.1|
> gi|732941|emb|CAA54130.1|
> gi|6321883|ref|NP_011959.1|
>
> So, here for record 1: gi|51013395 is the identifier, AAT92991 is the
> accession number, 1 is the version. Am I right? And then what is the name?

I'd only use 51013395 as the identifier (without the gi| prefix). Other
than that: correct. There is no name in the above examples, either
because the entry doesn't have one designated, or because the tool that
wrote the FASTA file didn't put it into the identifier part. FASTA
format doesn't define these things. Have you checked whether the
description contains a name somewhere? If there isn't one, I'd default
the name to the accession number.
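
To make the split concrete, here is a minimal sketch of the parsing
described above. It is written in Python purely for illustration; the
same few lines of logic would go into your custom SeqProcessor in Perl.
The names parse_display_id and ParsedId are hypothetical, not part of
BioPerl or BioSQL. Treating a non-empty fifth field as the name is an
assumption based on the usual gi|<gi>|<db>|<accession>|<locus> layout,
and IDs that don't follow that layout are deliberately left unparsed.

```python
from typing import NamedTuple, Optional


class ParsedId(NamedTuple):
    """Hypothetical container for the four bioentry values."""
    identifier: str
    accession: str
    version: int
    name: str


def parse_display_id(display_id: str) -> Optional[ParsedId]:
    """Split a 'gi|<gi>|<db>|<acc>.<ver>|<name>' ID into its parts.

    Returns None for IDs that do not follow this layout, so the caller
    can decide what to do with them instead of guessing here.
    """
    tokens = display_id.strip().split("|")
    if len(tokens) < 4 or tokens[0] != "gi":
        return None

    identifier = tokens[1]                      # e.g. '51013395'
    accession, _, ver = tokens[3].partition(".")  # 'AAT92991', '1'
    version = int(ver) if ver.isdigit() else 0
    # Assumption: a non-empty fifth field is the name; otherwise
    # default the name to the accession number.
    name = tokens[4] if len(tokens) > 4 and tokens[4] else accession
    return ParsedId(identifier, accession, version, name)


if __name__ == "__main__":
    for raw in ("gi|51013395|gb|AAT92991.1|",
                "gi|4261605|gb|AAD13905.1|S58126_11111111111111"):
        print(raw, "->", parse_display_id(raw))
```

With values split this way, each field of the ID that failed to load
would be well under the 40-character limit, so the schema would not
need to change.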

>
> I also found the following entry in the same 3 fields of the same
> table:
>
> AT1G08520.1
>
> I don't understand this! I used the TAIR6 dataset. How should I parse this data?
> Could you please advise on how to resolve this?

I have no idea about the TAIR6 datasets - why don't you ask the people
who create those files?

  -hilmar

>
> Thanks,
> Angshu
>
>
>
> On 1/3/06, Hilmar Lapp < hlapp at gmx.net> wrote:
> > You could do that, but first, it puts you out of sync with the
> > official schema, and second, if you look at the value that's causing
> > the problem, it isn't really an accession number anyway but rather a
> > concatenation of identifiers, accession numbers, and namespace
> > acronyms. Since you're already using a custom SeqProcessor, why
> > don't you just add a line or two of code that parses the display_id
> > value into the accession and identifier? (For instance, the accession
> > is the token between the two '|' characters following the token 'gb'.)
> >
> >    -hilmar
> >
> > On 1/3/06, Angshu Kar < angshu96 at gmail.com> wrote:
> > > Hi,
> > >
> > > Could you please help me resolve the following error?
> > >
> > > I run:
> > >
> > > ./load_seqdatabase.pl --dbname=USBA --dbuser=postgres --format=fasta
> > > --driver=Pg --pipeline="SeqProcessor::Accession" yeast_nrpep.fasta
> > >
> > > The error:
> > >
> > > Loading yeast_nrpep.fasta ...
> > >
> > > -------------------- WARNING ---------------------
> > > MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were
> > > ("gi|4261605|gb|AAD13905.1|S58126_11111111111111","gi|4261605|gb|AAD13905.1|S58126_11111111111111","gi|4261605|gb|AAD13905.1|S58126_11111111111111","Unknown [Saccharomyces cerevisiae]","0","") FKs (19,<NULL>)
> > > ERROR:  value too long for type character varying(40)
> > > ---------------------------------------------------
> > > Could not store gi|4261605|gb|AAD13905.1|S58126_11111111111111:
> > > ------------- EXCEPTION  -------------
> > > MSG: error while executing statement in
> > > Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR:  current transaction
> > > is aborted, commands ignored until end of transaction block
> > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951
> > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
> > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205
> > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254
> > > STACK Bio::DB::Persistent::PersistentObject::store /home/akar/local/perl//Bio/DB/Persistent/PersistentObject.pm:272
> > > STACK (eval) ./load_seqdatabase.pl:621
> > > STACK toplevel ./load_seqdatabase.pl:604
> > >
> > > --------------------------------------
> > >
> > >  at ./load_seqdatabase.pl line 634
> > >
> > > Should I change the field lengths for accession, name and identifier to
> > > some value >40 in the bioentry table? What should I change them to?
> > >
> > > Thanks,
> > > Angshu
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at portal.open-bio.org
> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > >
> >


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------


