[Bioperl-l] loading yeast data failing...

Hilmar Lapp hlapp at gmx.net
Tue Jan 3 21:35:54 EST 2006


There is no better way to learn something than trying it out and
relentlessly making mistakes. That's what my son is doing right now:
he's pulling himself up and falling over all the time, but he never gets
discouraged, is brazenly overconfident, and for sure some day he'll
walk - without me having instructed him for a second.

You did the same thing some time ago - why do you now worry about confidence?
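
For the parsing suggested in the quoted thread below (the token after
'gi' as the identifier, the token after 'gb' as accession and version),
a minimal Perl sketch might look like the following. It assumes
NCBI-style pipe-delimited IDs like the ones in this thread.
parse_ncbi_id is a made-up helper name, not a BioPerl function, and the
list of database tags (gb, emb, dbj, ref) is an assumption based on the
examples quoted below. Anything that doesn't match is left unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split an NCBI-style FASTA ID
# such as "gi|51013395|gb|AAT92991.1|" into identifier, accession, version.
sub parse_ncbi_id {
    my ($display_id) = @_;

    # The numeric token following 'gi' is taken as the identifier.
    my ($gi) = $display_id =~ /(?:^|\|)gi\|(\d+)/;

    # The token following a database tag holds the accession and an
    # optional ".N" version. The tag list is an assumption.
    my ($acc, $ver) =
        $display_id =~ /(?:^|\|)(?:gb|emb|dbj|ref)\|([^|.]+)(?:\.(\d+))?/;

    # Not an NCBI-style ID: keep the whole value as the accession.
    $acc = $display_id unless defined $acc;

    return {
        identifier => $gi,                       # undef if there is no gi
        accession  => $acc,
        version    => defined $ver ? $ver : 0,   # 0 when no version is given
        name       => $acc,                      # default name to accession
    };
}

# Try it on two of the IDs from this thread.
for my $id ('gi|51013395|gb|AAT92991.1|', 'gi|6321883|ref|NP_011959.1|') {
    my $p = parse_ncbi_id($id);
    print join("\t", @{$p}{qw(identifier accession version)}), "\n";
}
```

Within process_seq, the accession returned here could replace the raw
display_id that is currently passed to $seq->accession_number(). Where
the identifier and version belong on the sequence object is left open
in this sketch.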

On 1/3/06, Angshu Kar <angshu96 at gmail.com> wrote:
> I'll try that out Hilmar. And thanks for the clue. :)
> Scent a good mentor in you. :)
>
> Thanks again,
> Angshu
>
> PS: No one forbade me, but being a tyro I'm not feeling very confident about
> fiddling with the real data!
>
>
>
> On 1/3/06, Hilmar Lapp <hlapp at gmx.net> wrote:
> > I suggest you read the SeqIO HOWTO and have a look at the FASTA format
> > definition (try Google - it's your friend).
> >
> > Hint: you're answering your own question. Did someone forbid you to
> > play around and use the debugger (or simple print statements for that
> > matter)?
> >
> > On 1/3/06, Angshu Kar <angshu96 at gmail.com> wrote:
> > > Thanks Hilmar.
> > > Now I've another query:
> > >
> > > Here is the accessor.pm I'm using (one written by Marc):
> > >
> > > > use strict;
> > > > use vars qw(@ISA);
> > > > use lib '/home/akar/local/perl/';
> > > > use Bio::Seq::BaseSeqProcessor;
> > > > use Bio::SeqFeature::Generic;
> > > >
> > > > @ISA = qw(Bio::Seq::BaseSeqProcessor);
> > > >
> > > > sub process_seq
> > > > {
> > > >     my ($self, $seq) = @_;
> > > >     $seq->accession_number($seq->display_id);
> > > >     return ($seq);
> > > > }
> > >
> > > > Could you please let me know what display_id is here? Also, which variable
> > > > contains the "gi|51013395|gb|AAT92991.1|" string?
> > >
> > >
> > > Thanks,
> > > Angshu
> > >
> > >
> > > On 1/3/06, Hilmar Lapp <hlapp at gmx.net> wrote:
> > > > On 1/3/06, Angshu Kar <angshu96 at gmail.com> wrote:
> > > > > Hi Hilmar,
> > > > >
> > > > > > On what basis should I parse? I found the following 3 (arbitrary) entries in
> > > > > > the bioentry table. The same 3 entries all went into each of the name,
> > > > > > identifier and accession fields! And the version field contains all 0s!
> > > > >
> > > > >
> > > > > gi|51013395|gb|AAT92991.1|
> > > > > gi|732941|emb|CAA54130.1|
> > > > >  gi|6321883|ref|NP_011959.1|
> > > > >
> > > > > > So, here for record 1: gi|51013395 is the identifier, AAT92991 is the
> > > > > > accession number, 1 is the version. Am I right? And then what is the name?
> > > >
> > > > > I'd only use 51013395 as the identifier. Other than that: correct.
> > > > > There is no name in the above examples, either because the entry
> > > > > doesn't have one designated, or because the tool that wrote the FASTA
> > > > > file didn't put it into the identifier part. FASTA format doesn't
> > > > > define these things. Have you checked whether there is a name somewhere
> > > > > in the description? If there isn't one, I'd default the name to the
> > > > > accession number.
> > > >
> > > > >
> > > > > > Also, I found just the following entry in the same 3 fields in the same
> > > > > > table:
> > > > > >
> > > > > >  AT1G08520.1
> > > > > >
> > > > > > I'm not getting this! I used the TAIR6 dataset. How do I parse this data?
> > > > > > Could you please advise on how to resolve this?
> > > >
> > > > I have no idea about the TAIR6 datasets - why don't you ask the people
> > > > who create those files?
> > > >
> > > >   -hilmar
> > > >
> > > > >
> > > > > Thanks,
> > > > > Angshu
> > > > >
> > > > >
> > > > >
> > > > > On 1/3/06, Hilmar Lapp < hlapp at gmx.net> wrote:
> > > > > > > You could do that, but first, that puts you out of sync with the
> > > > > > > official schema, and second, if you look at the value, it isn't really
> > > > > > > an accession number that's causing the problem anyway but rather a
> > > > > > > concatenation of identifiers, accession numbers, and namespace
> > > > > > > acronyms. Since you're already using a custom SeqProcessor anyway, why
> > > > > > > don't you just add a line or two of code that parses the display_id
> > > > > > > value into the accession and identifier? (For instance, the accession is
> > > > > > > the token between the two '|' characters following the token 'gb'.)
> > > > > >
> > > > > >    -hilmar
> > > > > >
> > > > > > On 1/3/06, Angshu Kar < angshu96 at gmail.com> wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > > Could you please help me resolve the following error?
> > > > > > >
> > > > > > > I run:
> > > > > > >
> > > > > > > > ./load_seqdatabase.pl --dbname=USBA --dbuser=postgres --format=fasta \
> > > > > > > >   --driver=Pg --pipeline="SeqProcessor::Accession" yeast_nrpep.fasta
> > > > > > >
> > > > > > > The error:
> > > > > > >
> > > > > > > Loading yeast_nrpep.fasta ...
> > > > > > >
> > > > > > > -------------------- WARNING ---------------------
> > > > > > > MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed,
> values
> > > were
> > > > > > >
> > > > >
> > >
> ("gi|4261605|gb|AAD13905.1|S58126_11111111111111","gi|4261605|gb|AAD13905.1|S58126_11111111111111","gi|4261605|gb|AAD13905.1|S58126_11111111111111","Unknown
> > > > > > > [Saccharomyces cerevisiae]","0","") FKs (19,<NULL>)
> > > > > > > ERROR:  value too long for type character varying(40)
> > > > > > >
> ---------------------------------------------------
> > > > > > > Could not store
> > > > > gi|4261605|gb|AAD13905.1|S58126_11111111111111:
> > > > > > > ------------- EXCEPTION  -------------
> > > > > > > MSG: error while executing statement in
> > > > > > >
> Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key:
> > > ERROR:
> > > > >  current transaction
> > > > > > > is aborted, commands ignored until end of transaction block
> > > > > > > STACK
> > > > >
> > >
> Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key
> > > > > > >
> > > > >
> > >
> /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951
> > > > > > > STACK
> > > > >
> > >
> Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key
> > > > > > >
> > > > >
> > >
> /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
> > > > > > > STACK
> > > Bio::DB::BioSQL::BasePersistenceAdaptor::create
> > > > > > >
> > > > >
> > >
> /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205
> > > > > > > STACK
> > > Bio::DB::BioSQL::BasePersistenceAdaptor::store
> > > > > > >
> > > > >
> > >
> /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254
> > > > > > > STACK
> Bio::DB::Persistent::PersistentObject::store
> > > > > > >
> > > > >
> > >
> /home/akar/local/perl//Bio/DB/Persistent/PersistentObject.pm:272
> > > > > > > STACK (eval) ./load_seqdatabase.pl:621
> > > > > > > STACK toplevel ./load_seqdatabase.pl:604
> > > > > > >
> > > > > > > --------------------------------------
> > > > > > >
> > > > > > >  at ./load_seqdatabase.pl line 634
> > > > > > >
> > > > > > > > Should I change the field lengths for accession, name and identifier to some
> > > > > > > > value >40 in the bioentry table? What should I change it to?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Angshu
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > >
> > >
> ----------------------------------------------------------
> > > > > > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> > > > > >
> > > > >
> > >
> ----------------------------------------------------------
> > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > >
> > >
> ----------------------------------------------------------
> > > > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> > > >
> > >
> ----------------------------------------------------------
> > > >
> > >
> > >
> >
> >
> > --
> >
> ----------------------------------------------------------
> > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> >
> ----------------------------------------------------------
> >
>
>


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------


