From angshu96 at gmail.com Tue Jan 3 15:37:03 2006
From: angshu96 at gmail.com (Angshu Kar)
Date: Tue Jan 3 15:39:47 2006
Subject: [Bioperl-l] loading yeast data failing...

Hi,

Could you please help me resolve the following error?

I run:

./load_seqdatabase.pl --dbname=USBA --dbuser=postgres --format=fasta --driver=Pg --pipeline="SeqProcessor::Accession" yeast_nrpep.fasta

The error:

Loading yeast_nrpep.fasta ...

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were ("gi|4261605|gb|AAD13905.1|S58126_11111111111111","gi|4261605|gb|AAD13905.1|S58126_11111111111111","gi|4261605|gb|AAD13905.1|S58126_11111111111111","Unknown [Saccharomyces cerevisiae]","0","") FKs (19,)
ERROR: value too long for type character varying(40)
---------------------------------------------------
Could not store gi|4261605|gb|AAD13905.1|S58126_11111111111111:
------------- EXCEPTION -------------
MSG: error while executing statement in Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key:
ERROR: current transaction is aborted, commands ignored until end of transaction block
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254
STACK Bio::DB::Persistent::PersistentObject::store /home/akar/local/perl//Bio/DB/Persistent/PersistentObject.pm:272
STACK (eval) ./load_seqdatabase.pl:621
STACK toplevel ./load_seqdatabase.pl:604
--------------------------------------

at ./load_seqdatabase.pl line 634

Should I change the field lengths for accession, name and identifier to some value >40 in the bioentry table? What should I change it to?

Thanks,
Angshu

From hlapp at gmx.net Tue Jan 3 16:17:32 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue Jan 3 16:20:45 2006
Subject: [Bioperl-l] loading yeast data failing...

You could do that, but first, that puts you out of sync with the official schema, and second, if you look at the value, it isn't really an accession number that's causing the problem but rather a concatenation of identifiers, accession numbers, and namespace acronyms. Since you're already using a custom SeqProcessor, why don't you just add a line or two of code that parses the display_id value into the accession and identifier? (For instance, the accession is the token between the two '|' characters following the token 'gb'.)

-hilmar
--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------

From angshu96 at gmail.com Tue Jan 3 20:41:21 2006
From: angshu96 at gmail.com (Angshu Kar)
Date: Tue Jan 3 20:38:02 2006
Subject: [Bioperl-l] loading yeast data failing...

Hi Hilmar,

On what basis should I parse? I found the following 3 (arbitrary) entries in the bioentry table. The same 3 entries all went into each of the name, identifier and accession fields! And the version field contains all 0s!

gi|51013395|gb|AAT92991.1|
gi|732941|emb|CAA54130.1|
gi|6321883|ref|NP_011959.1|

So, here for record 1: gi|51013395 is the identifier, AAT92991 is the accession number, 1 is the version. Am I right? And then what is the name?

I also found just the following entry in the same 3 fields in the same table:

AT1G08520.1

I don't understand this! I used the TAIR6 dataset. How do I parse this data? Could you please advise on how to resolve this?

Thanks,
Angshu
From hlapp at gmx.net Tue Jan 3 20:47:51 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue Jan 3 20:52:20 2006
Subject: [Bioperl-l] loading yeast data failing...

On 1/3/06, Angshu Kar wrote:
> Hi Hilmar,
>
> On what basis should I parse? I found the following 3 entries (arbitrary) in
> the bioentry table. The same 3 entries all went to each of the name,
> identifier and accession fields! And the version field contains all 0s!
>
> gi|51013395|gb|AAT92991.1|
> gi|732941|emb|CAA54130.1|
> gi|6321883|ref|NP_011959.1|
>
> So, here for record 1: gi|51013395 is the identifier, AAT92991 is the
> accession number, 1 is the version. Am I right? And then what is the name?

I'd only use 51013395 as the identifier. Other than that: correct. There is no name in the above examples, either because the entry doesn't have one designated, or because the tool that wrote the FASTA file didn't put it into the identifier part. The FASTA format doesn't define these things. Have you checked whether there is a name somewhere in the description? If there isn't one, I'd default the name to the accession number.

> Also I found out just the following entry in the 3 same fields in the same
> table:
>
> AT1G08520.1
>
> I'm not getting this! I used the TAIR6 dataset. How to parse this data?
> Could you please advise on how to resolve this?

I have no idea about the TAIR6 datasets - why don't you ask the people who create those files?

-hilmar
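The mapping discussed above (GI number as identifier, accession without its version suffix, version as a separate field, name defaulting to the accession) could be sketched in plain Perl roughly as follows. This is only an illustration under stated assumptions: parse_ncbi_id is a hypothetical helper, not part of BioPerl, and the accessor names in the closing comment are an assumption about how a SeqProcessor might apply the result.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): split an NCBI-style FASTA ID
# such as "gi|51013395|gb|AAT92991.1|" into its parts. Returns an empty list
# when the ID does not follow that pattern (e.g. "AT1G08520.1").
sub parse_ncbi_id {
    my ($display_id) = @_;
    my @tok = split /\|/, $display_id;
    return unless @tok >= 4 && $tok[0] eq 'gi' && $tok[1] =~ /^\d+$/;

    # The 4th token is "ACCESSION" or "ACCESSION.VERSION".
    my ($acc, $version) = $tok[3] =~ /^([^.]+)(?:\.(\d+))?$/
        or return;

    return (
        identifier => $tok[1],                          # GI number only
        namespace  => $tok[2],                          # gb, emb, ref, ...
        accession  => $acc,
        version    => defined $version ? $version : 0,
        name       => $acc,                             # default name to accession
    );
}

my %f = parse_ncbi_id('gi|51013395|gb|AAT92991.1|');
print "$f{identifier} $f{accession} $f{version}\n";    # prints "51013395 AAT92991 1"

# Inside process_seq one might then (assumption, untested against bioperl-db)
# call $seq->primary_id, $seq->accession_number, $seq->version and
# $seq->display_id with these values, and leave the sequence untouched
# whenever parse_ncbi_id returns an empty list.
```

IDs that do not match the pattern, such as the TAIR-style AT1G08520.1, yield an empty list, so they can be handled by a separate rule or stored unchanged.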
From angshu96 at gmail.com Tue Jan 3 20:56:21 2006
From: angshu96 at gmail.com (Angshu Kar)
Date: Tue Jan 3 20:59:07 2006
Subject: [Bioperl-l] loading yeast data failing...

Thanks Hilmar.
Now I have another query:

Here is the accessor.pm I'm using (one written by Marc):

use strict;
use vars qw(@ISA);
use lib '/home/akar/local/perl/';
use Bio::Seq::BaseSeqProcessor;
use Bio::SeqFeature::Generic;

@ISA = qw(Bio::Seq::BaseSeqProcessor);

sub process_seq
{
    my ($self, $seq) = @_;
    $seq->accession_number($seq->display_id);
    return ($seq);
}

Could you please let me know what display_id is here? Also, which variable contains the "gi|51013395|gb|AAT92991.1|" string?

Thanks,
Angshu
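Judging by the values in the error log at the start of the thread, the display_id of a FASTA record is the first whitespace-delimited token of the header line, and the remainder is the description. A minimal pure-Perl sketch of that split is shown below; split_fasta_header is a hypothetical helper, and the header line is reconstructed from the logged values rather than copied from the actual file.

```perl
use strict;
use warnings;

# Hypothetical helper: split a FASTA header line into the ID token and the
# free-text description that follows the first run of whitespace.
sub split_fasta_header {
    my ($line) = @_;
    $line =~ s/^>\s*//;    # drop the leading '>' marker
    $line =~ s/\s+$//;     # drop trailing whitespace, including the newline
    my ($id, $desc) = split /\s+/, $line, 2;
    return ($id, defined $desc ? $desc : '');
}

# Header reconstructed from the error log (assumption about the file content).
my ($id, $desc) = split_fasta_header(
    '>gi|4261605|gb|AAD13905.1|S58126_11111111111111 Unknown [Saccharomyces cerevisiae]');

print "display_id:  $id\n";
print "description: $desc\n";
print "id length:   ", length($id), "\n";
```

The printed length makes the original failure visible: the whole ID token, not just the accession inside it, was being written to columns limited to 40 characters.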
From hlapp at gmx.net Tue Jan 3 21:07:54 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue Jan 3 21:04:34 2006
Subject: [Bioperl-l] loading yeast data failing...

I suggest you read the SeqIO HOWTO and have a look at the FASTA format definition (try Google - it's your friend).

Hint: you're answering your own question. Did someone forbid you to play around and use the debugger (or simple print statements, for that matter)?
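The "simple print statements" route might look like the sketch below. MockSeq is a hypothetical stand-in so that the example runs without BioPerl installed; in a real loader run, the same warn lines would go into the process_seq of the processor module, where $seq is the object handed over by the pipeline.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a sequence object (assumption: this is not
# BioPerl code, it only mimics three accessors used in the thread).
package MockSeq;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}
sub display_id {
    my $self = shift;
    $self->{display_id} = shift if @_;
    return $self->{display_id};
}
sub accession_number {
    my $self = shift;
    $self->{accession_number} = shift if @_;
    return $self->{accession_number};
}
sub desc {
    my $self = shift;
    $self->{desc} = shift if @_;
    return $self->{desc};
}

package main;

# Render a value for printing, making undef visible.
sub show { my ($v) = @_; return defined $v ? "'$v'" : 'undef'; }

# process_seq as posted in the thread, with print statements added around
# the one line that does the work.
sub process_seq {
    my ($self, $seq) = @_;
    warn 'display_id:       ', show($seq->display_id), "\n";
    warn 'desc:             ', show($seq->desc), "\n";
    warn 'accession before: ', show($seq->accession_number), "\n";
    $seq->accession_number($seq->display_id);
    warn 'accession after:  ', show($seq->accession_number), "\n";
    return ($seq);
}

my $seq = MockSeq->new(
    display_id => 'gi|51013395|gb|AAT92991.1|',
    desc       => 'example description',    # made-up value for illustration
);
process_seq(undef, $seq);
```

Run this way, the output shows that display_id already holds the full "gi|...|gb|...|" string and that the processor copies it unchanged into the accession.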
From angshu96 at gmail.com Tue Jan 3 21:15:11 2006
From: angshu96 at gmail.com (Angshu Kar)
Date: Tue Jan 3 21:11:52 2006
Subject: [Bioperl-l] loading yeast data failing...

I'll try that out, Hilmar. And thanks for the clue. :) I scent a good mentor in you. :)

Thanks again,
Angshu

PS: No one forbade me, but being a tyro I don't feel confident enough to fiddle with the real data!
field lengths for accession, name and > identifier > > to > > > > some > > > > > > value >40 in the bioentry table? What should I change it to? > > > > > > > > > > > > Thanks, > > > > > > Angshu > > > > > > > > > > > > _______________________________________________ > > > > > > Bioperl-l mailing list > > > > > > Bioperl-l@portal.open-bio.org > > > > > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > ---------------------------------------------------------- > > > > > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : > > > > > > > > > > > ---------------------------------------------------------- > > > > > > > > > > > > > > > > > > > > > > -- > > > > > ---------------------------------------------------------- > > > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : > > > > > ---------------------------------------------------------- > > > > > > > > > > -- > ---------------------------------------------------------- > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : > ---------------------------------------------------------- > From kiranbina at gmail.com Mon Jan 2 23:40:08 2006 From: kiranbina at gmail.com (Dr. Dhundy R. Bastola) Date: Tue Jan 3 21:22:53 2006 Subject: [Bioperl-l] Please help with bioperl install WIN Message-ID: <43ba0035.00276b0d.6958.ffffceae@mx.gmail.com> Skipped content of type multipart/alternative-------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 862 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20060102/6f0e43fb/attachment.gif From bmoore at genetics.utah.edu Tue Jan 3 21:33:45 2006 From: bmoore at genetics.utah.edu (Barry Moore) Date: Tue Jan 3 21:31:27 2006 Subject: [Bioperl-l] loading yeast data failing... 
Message-ID:

Angshu-

You should read the following documents carefully before asking more questions like this one; this is yet another example that demonstrates that you ask questions before you try to solve the problem yourself. Do you have a copy of Programming Perl sitting next to you on the desk? If not, you should, and it should be tattered and worn before you hit the list with basic questions like that. Now try these documents and the suggestions below, repent of your ways, and good luck.

http://www.catb.org/~esr/faqs/smart-questions.html
http://chicago.pm.org/meetings/20031202/perl-debug.txt
http://debugger.perl.org/580/perldebug.html

Now, to get you headed on your way for this problem specifically, what you want to know about the perl debugger for this issue is:

You can run it like this:

perl -d your_script.pl

You can burrow into your code to the module in question like this:

c Path::To::Your::accessor::process_seq

Once there you can step through code with n or s. Finally, you can look at variables (and objects and methods called on objects) that are in scope with x like this:

x $my_variable
x $seq->accession_number
x $seq->display_id

Didn't find what you want yet? Look at the whole seq object (and have it paged) like this:

| x $seq

Or to see what other methods you can call on this object that might do what you want:

m $seq

Now try each of the methods that look interesting to see what they do:

x $seq->interesting_method

Barry

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Angshu Kar
> Sent: Tuesday, January 03, 2006 6:56 PM
> To: Hilmar Lapp
> Cc: bioperl-l
> Subject: Re: [Bioperl-l] loading yeast data failing...
>
> Thanks Hilmar.
> Now I've another query: > > Here is the accessor.pm I'm using (one written by > Marc): > > use strict; > use vars qw(@ISA); > use lib '/home/akar/local/perl/'; > use Bio::Seq::BaseSeqProcessor; > use Bio::SeqFeature::Generic; > > @ISA = qw(Bio::Seq::BaseSeqProcessor); > > sub process_seq > { > my ($self, $seq) = @_; > $seq->accession_number($seq->display_id); > return ($seq); > } > > Could you please let me know what is display_id here? Also which variable > contains the "gi|51013395|gb|AAT92991.1|" string? > > Thanks, > Angshu > > > On 1/3/06, Hilmar Lapp wrote: > > > > On 1/3/06, Angshu Kar wrote: > > > Hi Hilmar, > > > > > > On what basis should I parse? I found the following 3 entries > > (arbitrary) in > > > the bioentry table. The same 3 entries all went to each of the name, > > > identifier and accession fields!And the version field contains all 0s! > > > > > > > > > gi|51013395|gb|AAT92991.1| > > > gi|732941|emb|CAA54130.1| > > > gi|6321883|ref|NP_011959.1| > > > > > > So, here for record 1: gi|51013395 is the identifier, AAT92991 is the > > > accession number, 1 is the version. Am I right? And then what is the > > name? > > > > I'd only used 51013395 as the identifier. Other than that: correct. > > There is no name in the above examples, either because the entry > > doesn't have one designated, or because the tool that wrote the FASTA > > file didn't put it into the identifier part. FASTA format doesn't > > define these things. Have you checked the description whether there is > > a name somewhere? If there isn't one, I'd default name to accession > > number. > > > > > > > > Also I found out just the following entry in the 3 same fields in the > > same > > > table: > > > > > > AT1G08520.1 > > > > > > I'm not getting this!I used the TAIR6 dataset.How to parse this data? > > > Could you please advise on how to resolve this? > > > > I have no idea about the TAIR6 datasets - why don't you ask the people > > who create those files? 
> > > > -hilmar > > > > > > > > Thanks, > > > Angshu > > > > > > > > > > > > On 1/3/06, Hilmar Lapp < hlapp@gmx.net> wrote: > > > > You could do that but first that puts you out of sync with the > > > > official schema, and second if you look at the value it isn't really > > > > an accession number anyway that's causing the problem but rather a > > > > concatenation of identifiers, accession numbers, and namespace > > > > acronyms. Since you're using a custom SeqProcessor anyway already > why > > > > don't you just add a line or two of code that parses the display_id > > > > value into the accession and identifier? (for instance, the token > > > > between two '|' characters following the token 'gb') > > > > > > > > -hilmar > > > > > > > > On 1/3/06, Angshu Kar < angshu96@gmail.com> wrote: > > > > > Hi, > > > > > > > > > > Could you please help me resolve the follwoing error? > > > > > > > > > > I run: > > > > > > > > > > ./load_seqdatabase.pl --dbname=USBA --dbuser=postgres -- > format=fasta > > > > > --driver=Pg --pipeline="SeqProcessor::Accession" > > > yeast_nrpep.fasta > > > > > > > > > > The error: > > > > > > > > > > Loading yeast_nrpep.fasta ... > > > > > > > > > > -------------------- WARNING --------------------- > > > > > MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values > > were > > > > > > > > > > > ("gi|4261605|gb|AAD13905.1|S58126_11111111111111","gi|4261605|gb|AAD1390 5. 
> 1|S58126_11111111111111","gi|4261605|gb|AAD13905.1|S58126_11111111111111 ", > "Unknown > > > > > [Saccharomyces cerevisiae]","0","") FKs (19,) > > > > > ERROR: value too long for type character varying(40) > > > > > --------------------------------------------------- > > > > > Could not store > > > gi|4261605|gb|AAD13905.1|S58126_11111111111111: > > > > > ------------- EXCEPTION ------------- > > > > > MSG: error while executing statement in > > > > > Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR: > > > current transaction > > > > > is aborted, commands ignored until end of transaction block > > > > > STACK > > > Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key > > > > > > > > /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951 > > > > > STACK > > > Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key > > > > > > > > /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855 > > > > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create > > > > > > > > /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205 > > > > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > > > > > > > > /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254 > > > > > STACK Bio::DB::Persistent::PersistentObject::store > > > > > > > > /home/akar/local/perl//Bio/DB/Persistent/PersistentObject.pm:272 > > > > > STACK (eval) ./load_seqdatabase.pl:621 > > > > > STACK toplevel ./load_seqdatabase.pl:604 > > > > > > > > > > -------------------------------------- > > > > > > > > > > at ./load_seqdatabase.pl line 634 > > > > > > > > > > Should I change the field lengths for accession, name and > identifier > > to > > > some > > > > > value >40 in the bioentry table? What should I change it to? 
> > > > >
> > > > > Thanks,
> > > > > Angshu
> > > > >
> > > > > _______________________________________________
> > > > > Bioperl-l mailing list
> > > > > Bioperl-l@portal.open-bio.org
> > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > ----------------------------------------------------------
> > > > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> > > > ----------------------------------------------------------
> > > >
> > > >
> > >
> > >
> > --
> > ----------------------------------------------------------
> > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> > ----------------------------------------------------------
> >
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

From bmoore at genetics.utah.edu Tue Jan 3 21:37:46 2006
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Tue Jan 3 21:34:06 2006
Subject: [Bioperl-l] Please help with bioperl install WIN
Message-ID:

Kiran,

That looks correct to me. The full command is repository; you could try that. What version of ppm are you using? Try ppm> version. If you're using an older version, it may not accept the rep abbreviation.

Barry

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Dr. Dhundy R. Bastola
> Sent: Monday, January 02, 2006 9:40 PM
> To: bioperl-l@bioperl.org
> Subject: [Bioperl-l] Please help with bioperl install WIN
>
> Hi all,
>
> I would really appreciate if some one could help . I followed the
> instruction for installing bioperl in my laptop. I know the ppm is
> installed. I do get the ppm> prompt. However, when I type 'rep add Bioperl
> http://bioperl.org/DIST
>
> I get the message 'Unknown or ambiguous command 'rep'; type 'help' for
> commands. Help does not show any 'rep' commands.
>
> Thanks
>
> Kiran
>
> kiranbina@gmail.com

From hlapp at gmx.net Tue Jan 3 21:35:54 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue Jan 3 21:59:03 2006
Subject: [Bioperl-l] loading yeast data failing...
In-Reply-To:
References:
Message-ID:

There is no better way to learn something than trying it out and relentlessly making mistakes. That's what my son's doing right now: he's pulling himself up and falling around all the time but never gets discouraged, is brazenly overconfident, and for sure some day he'll walk - without me having instructed him a second. You did the same thing some time ago - why do you now worry about confidence?

On 1/3/06, Angshu Kar wrote:
> I'll try that out Hilmar. And thanks for the clue. :)
> Scent a good mentor in you. :)
>
> Thanks again,
> Angshu
>
> PS: And no one forbid me but being a tyro I'm not feeling much confident to
> fiddle with the real data!
>
> On 1/3/06, Hilmar Lapp wrote:
> > I suggest you read the SeqIO HOWTO and have a look at the FASTA format
> > definition (try Google - it's your friend).
> >
> > Hint: you're answering your own question. Did someone forbid you to
> > play around and use the debugger (or simple print statements for that
> > matter)?
> >
> > On 1/3/06, Angshu Kar wrote:
> > > Thanks Hilmar.
> > > Now I've another query:
> > >
> > > Here is the accessor.pm I'm using (one written by Marc):
> > >
> > > use strict;
> > > use vars qw(@ISA);
> > > use lib '/home/akar/local/perl/';
> > > use Bio::Seq::BaseSeqProcessor;
> > > use Bio::SeqFeature::Generic;
> > >
> > > @ISA = qw(Bio::Seq::BaseSeqProcessor);
> > >
> > > sub process_seq
> > > {
> > > my ($self, $seq) = @_;
> > > $seq->accession_number($seq->display_id);
> > > return ($seq);
> > > }
> > >
> > > Could you please let me know what is display_id here? Also which variable
> > > contains the "gi|51013395|gb|AAT92991.1|" string?
> > > > > > > > > Thanks, > > > Angshu > > > > > > > > > On 1/3/06, Hilmar Lapp wrote: > > > > On 1/3/06, Angshu Kar wrote: > > > > > Hi Hilmar, > > > > > > > > > > On what basis should I parse? I found the following 3 entries > > > (arbitrary) in > > > > > the bioentry table. The same 3 entries all went to each of the name, > > > > > identifier and accession fields!And the version field contains all > 0s! > > > > > > > > > > > > > > > gi|51013395|gb|AAT92991.1| > > > > > gi|732941|emb|CAA54130.1| > > > > > gi|6321883|ref|NP_011959.1| > > > > > > > > > > So, here for record 1: gi|51013395 is the identifier, AAT92991 is > the > > > > > accession number, 1 is the version. Am I right? And then what is the > > > name? > > > > > > > > I'd only used 51013395 as the identifier. Other than that: correct. > > > > There is no name in the above examples, either because the entry > > > > doesn't have one designated, or because the tool that wrote the FASTA > > > > file didn't put it into the identifier part. FASTA format doesn't > > > > define these things. Have you checked the description whether there is > > > > a name somewhere? If there isn't one, I'd default name to accession > > > > number. > > > > > > > > > > > > > > Also I found out just the following entry in the 3 same fields in > the > > > same > > > > > table: > > > > > > > > > > AT1G08520.1 > > > > > > > > > > I'm not getting this!I used the TAIR6 dataset.How to parse this > data? > > > > > Could you please advise on how to resolve this? > > > > > > > > I have no idea about the TAIR6 datasets - why don't you ask the people > > > > who create those files? 
> > > > > > > > -hilmar > > > > > > > > > > > > > > Thanks, > > > > > Angshu > > > > > > > > > > > > > > > > > > > > On 1/3/06, Hilmar Lapp < hlapp@gmx.net> wrote: > > > > > > You could do that but first that puts you out of sync with the > > > > > > official schema, and second if you look at the value it isn't > really > > > > > > an accession number anyway that's causing the problem but rather a > > > > > > concatenation of identifiers, accession numbers, and namespace > > > > > > acronyms. Since you're using a custom SeqProcessor anyway already > why > > > > > > don't you just add a line or two of code that parses the > display_id > > > > > > value into the accession and identifier? (for instance, the token > > > > > > between two '|' characters following the token 'gb') > > > > > > > > > > > > -hilmar > > > > > > > > > > > > On 1/3/06, Angshu Kar < angshu96@gmail.com> wrote: > > > > > > > Hi, > > > > > > > > > > > > > > Could you please help me resolve the follwoing error? > > > > > > > > > > > > > > I run: > > > > > > > > > > > > > > ./load_seqdatabase.pl --dbname=USBA --dbuser=postgres > --format=fasta > > > > > > > --driver=Pg > --pipeline="SeqProcessor::Accession" > > > > > yeast_nrpep.fasta > > > > > > > > > > > > > > The error: > > > > > > > > > > > > > > Loading yeast_nrpep.fasta ... 
> > > > > > > > > > > > > > -------------------- WARNING --------------------- > > > > > > > MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, > values > > > were > > > > > > > > > > > > > > > > ("gi|4261605|gb|AAD13905.1|S58126_11111111111111","gi|4261605|gb|AAD13905.1|S58126_11111111111111","gi|4261605|gb|AAD13905.1|S58126_11111111111111","Unknown > > > > > > > [Saccharomyces cerevisiae]","0","") FKs (19,) > > > > > > > ERROR: value too long for type character varying(40) > > > > > > > > --------------------------------------------------- > > > > > > > Could not store > > > > > gi|4261605|gb|AAD13905.1|S58126_11111111111111: > > > > > > > ------------- EXCEPTION ------------- > > > > > > > MSG: error while executing statement in > > > > > > > > Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: > > > ERROR: > > > > > current transaction > > > > > > > is aborted, commands ignored until end of transaction block > > > > > > > STACK > > > > > > > > > Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key > > > > > > > > > > > > > > > > /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951 > > > > > > > STACK > > > > > > > > > Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key > > > > > > > > > > > > > > > > /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855 > > > > > > > STACK > > > Bio::DB::BioSQL::BasePersistenceAdaptor::create > > > > > > > > > > > > > > > > /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205 > > > > > > > STACK > > > Bio::DB::BioSQL::BasePersistenceAdaptor::store > > > > > > > > > > > > > > > > /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254 > > > > > > > STACK > Bio::DB::Persistent::PersistentObject::store > > > > > > > > > > > > > > > > /home/akar/local/perl//Bio/DB/Persistent/PersistentObject.pm:272 > > > > > > > STACK (eval) ./load_seqdatabase.pl:621 > > > > > > > STACK toplevel ./load_seqdatabase.pl:604 > > > > > > > > > > > > > > 
-------------------------------------- > > > > > > > > > > > > > > at ./load_seqdatabase.pl line 634 > > > > > > > > > > > > > > Should I change the field lengths for accession, name and > identifier > > > to > > > > > some > > > > > > > value >40 in the bioentry table? What should I change it to? > > > > > > > > > > > > > > Thanks, > > > > > > > Angshu > > > > > > > > > > > > > > _______________________________________________ > > > > > > > Bioperl-l mailing list > > > > > > > Bioperl-l@portal.open-bio.org > > > > > > > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > ---------------------------------------------------------- > > > > > > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : > > > > > > > > > > > > > > > ---------------------------------------------------------- > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > ---------------------------------------------------------- > > > > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : > > > > > > > > ---------------------------------------------------------- > > > > > > > > > > > > > > > > -- > > > ---------------------------------------------------------- > > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : > > > ---------------------------------------------------------- > > > > -- ---------------------------------------------------------- : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : ---------------------------------------------------------- From hlapp at gmx.net Tue Jan 3 16:36:24 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue Jan 3 22:08:05 2006 Subject: Fwd: [Bioperl-l] loading yeast data failing... In-Reply-To: References: Message-ID: ---------- Forwarded message ---------- From: Angshu Kar Date: Jan 3, 2006 1:31 PM Subject: Re: [Bioperl-l] loading yeast data failing... 
To: Hilmar Lapp Hi Hilmar, If you have some time, could you please write those 2 lines for me? I just have no clue about it. After that I think I've to rebuild the package. Am I right? Here is the accessor.pm: use strict; use vars qw(@ISA); use lib '/home/akar/local/perl/'; use Bio::Seq::BaseSeqProcessor; use Bio::SeqFeature::Generic; @ISA = qw(Bio::Seq::BaseSeqProcessor); sub process_seq { my ($self, $seq) = @_; $seq->accession_number($seq->display_id); return ($seq); } Thanks, Angshu On 1/3/06, Hilmar Lapp wrote: > You could do that but first that puts you out of sync with the > official schema, and second if you look at the value it isn't really > an accession number anyway that's causing the problem but rather a > concatenation of identifiers, accession numbers, and namespace > acronyms. Since you're using a custom SeqProcessor anyway already why > don't you just add a line or two of code that parses the display_id > value into the accession and identifier? (for instance, the token > between two '|' characters following the token 'gb') > > -hilmar > > On 1/3/06, Angshu Kar wrote: > > Hi, > > > > Could you please help me resolve the follwoing error? > > > > I run: > > > > ./load_seqdatabase.pl --dbname=USBA --dbuser=postgres --format=fasta > > --driver=Pg --pipeline="SeqProcessor::Accession" yeast_nrpep.fasta > > > > The error: > > > > Loading yeast_nrpep.fasta ... 
> > > > -------------------- WARNING --------------------- > > MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were > > ("gi|4261605|gb|AAD13905.1|S58126_11111111111111","gi|4261605|gb|AAD13905.1|S58126_11111111111111","gi|4261605|gb|AAD13905.1|S58126_11111111111111","Unknown > > [Saccharomyces cerevisiae]","0","") FKs (19,) > > ERROR: value too long for type character varying(40) > > --------------------------------------------------- > > Could not store gi|4261605|gb|AAD13905.1|S58126_11111111111111: > > ------------- EXCEPTION ------------- > > MSG: error while executing statement in > > Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR: current transaction > > is aborted, commands ignored until end of transaction block > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key > > /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951 > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key > > /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855 > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create > > /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205 > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > > /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254 > > STACK Bio::DB::Persistent::PersistentObject::store > > /home/akar/local/perl//Bio/DB/Persistent/PersistentObject.pm:272 > > STACK (eval) ./load_seqdatabase.pl:621 > > STACK toplevel ./load_seqdatabase.pl:604 > > > > -------------------------------------- > > > > at ./load_seqdatabase.pl line 634 > > > > Should I change the field lengths for accession, name and identifier to some > > value >40 in the bioentry table? What should I change it to? 
> >
> >
> > Thanks,
> > Angshu
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> >
> --
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------
>

--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------

From bmoore at genetics.utah.edu Tue Jan 3 22:46:37 2006
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Tue Jan 3 22:42:46 2006
Subject: [Bioperl-l] loading yeast data failing...
Message-ID:

Angshu-

Have you not been listening to anything that has been written to you on this list? The Bioperl community has been amazingly patient with your questions over the last several months, and many have told you time and time again that you should think before you post. Less than an hour ago Hilmar and I both suggested ways that you could try to solve your own problem, and now you are back asking Hilmar to write code for you? Unbelievable. Asking Hilmar to write your code for you is VERY UNACCEPTABLE!!! If you need to have someone write your code for you or tutor you in the basics of Perl and Linux (and I think you do) then you need to hire them. I can recommend some skilled contract programmers if you need to hire one. This list welcomes beginners, but you are expected to put forth some effort at trying to solve problems yourself first. Your brazen disregard for the etiquette of open source mailing lists suggests to me that you should not trouble the Bioperl community with further questions until you have at least a basic command of Perl and Linux and a willingness to try things yourself first. Get a copy of Programming Perl and read it! Get a copy of Object Oriented Perl and read it!
Get a copy of any Linux manual and read it! And most important of all, just write code and try it - you won't break the computer! Barry > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l- > bounces@portal.open-bio.org] On Behalf Of Hilmar Lapp > Sent: Tuesday, January 03, 2006 2:36 PM > To: bioperl-l > Subject: Fwd: [Bioperl-l] loading yeast data failing... > > ---------- Forwarded message ---------- > From: Angshu Kar > Date: Jan 3, 2006 1:31 PM > Subject: Re: [Bioperl-l] loading yeast data failing... > To: Hilmar Lapp > > > Hi Hilmar, > > If you have some time, could you please write those 2 lines for me? I > just have no clue about it. After that I think I've to rebuild the > package. Am I right? > > Here is the accessor.pm: > > use strict; > use vars qw(@ISA); > use lib '/home/akar/local/perl/'; > use Bio::Seq::BaseSeqProcessor; > use Bio::SeqFeature::Generic; > > @ISA = qw(Bio::Seq::BaseSeqProcessor); > > sub process_seq > { > my ($self, $seq) = @_; > $seq->accession_number($seq->display_id); > return ($seq); > } > > > Thanks, > Angshu > > > On 1/3/06, Hilmar Lapp wrote: > > You could do that but first that puts you out of sync with the > > official schema, and second if you look at the value it isn't really > > an accession number anyway that's causing the problem but rather a > > concatenation of identifiers, accession numbers, and namespace > > acronyms. Since you're using a custom SeqProcessor anyway already why > > don't you just add a line or two of code that parses the display_id > > value into the accession and identifier? (for instance, the token > > between two '|' characters following the token 'gb') > > > > -hilmar > > > > On 1/3/06, Angshu Kar wrote: > > > Hi, > > > > > > Could you please help me resolve the follwoing error? 
> > > > > > I run: > > > > > > ./load_seqdatabase.pl --dbname=USBA --dbuser=postgres --format=fasta > > > --driver=Pg --pipeline="SeqProcessor::Accession" yeast_nrpep.fasta > > > > > > The error: > > > > > > Loading yeast_nrpep.fasta ... > > > > > > -------------------- WARNING --------------------- > > > MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values > were > > > > ("gi|4261605|gb|AAD13905.1|S58126_11111111111111","gi|4261605|gb|AAD1390 5. > 1|S58126_11111111111111","gi|4261605|gb|AAD13905.1|S58126_11111111111111 ", > "Unknown > > > [Saccharomyces cerevisiae]","0","") FKs (19,) > > > ERROR: value too long for type character varying(40) > > > --------------------------------------------------- > > > Could not store gi|4261605|gb|AAD13905.1|S58126_11111111111111: > > > ------------- EXCEPTION ------------- > > > MSG: error while executing statement in > > > Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR: current > transaction > > > is aborted, commands ignored until end of transaction block > > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key > > > /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951 > > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key > > > /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855 > > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create > > > /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205 > > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > > > /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254 > > > STACK Bio::DB::Persistent::PersistentObject::store > > > /home/akar/local/perl//Bio/DB/Persistent/PersistentObject.pm:272 > > > STACK (eval) ./load_seqdatabase.pl:621 > > > STACK toplevel ./load_seqdatabase.pl:604 > > > > > > -------------------------------------- > > > > > > at ./load_seqdatabase.pl line 634 > > > > > > Should I change the field lengths for accession, name and identifier > to some > 
> > value >40 in the bioentry table? What should I change it to?
> > >
> > > Thanks,
> > > Angshu
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l@portal.open-bio.org
> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > >
> >
> >
> > --
> > ----------------------------------------------------------
> > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> > ----------------------------------------------------------
> >
> >
>
> --
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

From angshu96 at gmail.com Tue Jan 3 23:11:28 2006
From: angshu96 at gmail.com (Angshu Kar)
Date: Tue Jan 3 23:36:37 2006
Subject: [Bioperl-l] loading yeast data failing...
In-Reply-To:
References:
Message-ID:

I apologize again, Barry. But since Hilmar suggested it was about 2 lines of code, I thought he had that piece at hand and could provide it to me. I'm sorry, Hilmar. I also missed the cc to bioperl in that mail, and I wrote that mail much earlier. But I apologize for that too. Maybe my situation (esp. the time crunch) is driving me to behave in this ill-mannered way. I'm sorry again. I am going to post no more to this community till I'm well versed in Perl and Linux. And I'm truly obliged for the support you people provided me with.

Thanks,
Angshu

On 1/3/06, Barry Moore wrote:
>
> Angshu-
>
> Have you not been listening to anything that has been written to you on
> this list. The Bioperl community has been amazingly patient with your
> questions over the last several months, and many have told you time and
> time again that you should think before you post.
Less than an hour ago > Hilmar and I both suggested ways that you could try to solve your own > problem, and now you are back asking Hilmar to write code for you? > Unbelievable. Asking Hilmar to write your code for you is VERY > UNACCEPTABLE!!! If you need to have someone write your code for you or > tutor you in the basics of Perl and Linux (and I think you do) then you > need to hire them. I can recommend some skilled contract programmers if > you need to hire one. This list welcomes beginners, but you are > expected to put forth some effort at trying to solve problems yourself > first. Your brazen disregard for the etiquette of open source mailing > lists suggests to me that you should not trouble the Bioperl community > with further questions until you have at least a basic command of Perl > and Linux and a willingness to try things yourself first. Get a copy of > Programming Perl and read it! Get a copy of Object Oriented Perl and > read it! Get a copy of any Linux manual and read it! And most > important of all, just write code and try it - you won't break the > computer! > > Barry > > > -----Original Message----- > > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l- > > bounces@portal.open-bio.org] On Behalf Of Hilmar Lapp > > Sent: Tuesday, January 03, 2006 2:36 PM > > To: bioperl-l > > Subject: Fwd: [Bioperl-l] loading yeast data failing... > > > > ---------- Forwarded message ---------- > > From: Angshu Kar > > Date: Jan 3, 2006 1:31 PM > > Subject: Re: [Bioperl-l] loading yeast data failing... > > To: Hilmar Lapp > > > > > > Hi Hilmar, > > > > If you have some time, could you please write those 2 lines for me? I > > just have no clue about it. After that I think I've to rebuild the > > package. Am I right? 
> > > > Here is the accessor.pm: > > > > use strict; > > use vars qw(@ISA); > > use lib '/home/akar/local/perl/'; > > use Bio::Seq::BaseSeqProcessor; > > use Bio::SeqFeature::Generic; > > > > @ISA = qw(Bio::Seq::BaseSeqProcessor); > > > > sub process_seq > > { > > my ($self, $seq) = @_; > > $seq->accession_number($seq->display_id); > > return ($seq); > > } > > > > > > Thanks, > > Angshu > > > > > > On 1/3/06, Hilmar Lapp wrote: > > > You could do that but first that puts you out of sync with the > > > official schema, and second if you look at the value it isn't really > > > an accession number anyway that's causing the problem but rather a > > > concatenation of identifiers, accession numbers, and namespace > > > acronyms. Since you're using a custom SeqProcessor anyway already > why > > > don't you just add a line or two of code that parses the display_id > > > value into the accession and identifier? (for instance, the token > > > between two '|' characters following the token 'gb') > > > > > > -hilmar > > > > > > On 1/3/06, Angshu Kar wrote: > > > > Hi, > > > > > > > > Could you please help me resolve the follwoing error? > > > > > > > > I run: > > > > > > > > ./load_seqdatabase.pl --dbname=USBA --dbuser=postgres > --format=fasta > > > > --driver=Pg --pipeline="SeqProcessor::Accession" yeast_nrpep.fasta > > > > > > > > The error: > > > > > > > > Loading yeast_nrpep.fasta ... > > > > > > > > -------------------- WARNING --------------------- > > > > MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values > > were > > > > > > > ("gi|4261605|gb|AAD13905.1|S58126_11111111111111","gi|4261605|gb|AAD1390 > 5. 
> > > 1|S58126_11111111111111","gi|4261605|gb|AAD13905.1|S58126_11111111111111 > ", > > "Unknown > > > > [Saccharomyces cerevisiae]","0","") FKs (19,) > > > > ERROR: value too long for type character varying(40) > > > > --------------------------------------------------- > > > > Could not store gi|4261605|gb|AAD13905.1|S58126_11111111111111: > > > > ------------- EXCEPTION ------------- > > > > MSG: error while executing statement in > > > > Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR: current > > transaction > > > > is aborted, commands ignored until end of transaction block > > > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key > > > > /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951 > > > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key > > > > /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855 > > > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create > > > > /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205 > > > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store > > > > /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254 > > > > STACK Bio::DB::Persistent::PersistentObject::store > > > > /home/akar/local/perl//Bio/DB/Persistent/PersistentObject.pm:272 > > > > STACK (eval) ./load_seqdatabase.pl:621 > > > > STACK toplevel ./load_seqdatabase.pl:604 > > > > > > > > -------------------------------------- > > > > > > > > at ./load_seqdatabase.pl line 634 > > > > > > > > Should I change the field lengths for accession, name and > identifier > > to some > > > > value >40 in the bioentry table? What should I change it to? 
> > > > > > > > Thanks, > > > > Angshu > > > > > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l@portal.open-bio.org > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > > > > > > > > -- > > > ---------------------------------------------------------- > > > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : > > > ---------------------------------------------------------- > > > > > > > > > > > -- > > ---------------------------------------------------------- > > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : > > ---------------------------------------------------------- > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From jason.stajich at duke.edu Tue Jan 3 22:45:10 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Jan 3 23:50:28 2006 Subject: [Bioperl-l] blast output -> blast -m8 output In-Reply-To: <43B48DB8.3060201@infotech.monash.edu.au> References: <339D68B133EAD311971E009027DC479703DB3B5A@montecarlo.cgr.harvard.edu> <43B48DB8.3060201@infotech.monash.edu.au> Message-ID: The existing search2table script in scripts/searchio does this for you - I don't think there is a writer plugin but there could be. Note that if you just using BLAST you will find that the blast2table script that is included in the BLAST book (see the O'Reilly website for the book and download the code examples) will also generate this sort of thing for you and will be many times faster than SearchIO code. There is also an equivalent hmmer_to_table and fastam9_to_table which are very fast re-formatters that don't actually use SearchIO since one is just trying to get the very simple data out. 
$ more scripts/searchio/search2table.PLS =head1 NAME search2table - turn SearchIO parseable reports into tab delimited format like NCBI's -m 9 =head1 SYNOPSIS search2table -f fasta -i file.FASTA -o output.table On Dec 29, 2005, at 8:30 PM, Torsten Seemann wrote: > Amir Karger wrote: >> I'm writing a script that will take regular blast output and >> translate it to >> blast -m8 tabular form. (The reverse transform won't work without >> re-doing >> the alignments.) >> I've attached the blast output for running 3 sequences against >> month.aa. >> Below are the script, the script output, and the blast -m8 output. >> (Output >> is the same for bioperl-1.4 and 1.5-RC1.) > > I can't verify that your code is correct, but I have two comments > anyway which other BioPerl developers may be able to help us with: > > 1. Can this be done already using any of the Writer modules? > > http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/ > SearchIO/Writer/toc.html > > 2. If not should probably turn the code into a blasttableWriter.pm > class? > > -- > Torsten Seemann > Victorian Bioinformatics Consortium, Monash University, Australia > http://www.vicbioinformatics.com/ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12/ From jason.stajich at duke.edu Tue Jan 3 22:48:46 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Jan 3 23:50:37 2006 Subject: [Bioperl-l] Re: Test writing In-Reply-To: <43B284A1.1000708@lsi.upc.edu> References: <200512211211.jBLCBE8U012524@portal.open-bio.org> <43B284A1.1000708@lsi.upc.edu> Message-ID: <4410EED7-DFB5-46E7-B320-795C8066ADCC@duke.edu> On Dec 28, 2005, at 7:27 AM, Gabriel Valiente wrote: > Sorry about the delay, I've promised to write test files for > Bio::Tree::Draw::Cladogram and Bio::Tree::Compatible much earlier. 
> > I've just submitted t/Compatible.t, a test file for > Bio::Tree::Compatible. There are five simple tests, two of which > are commented out because they need an equality test for > Bio::Tree::Tree objects, which is missing. Any volunteer for > writing a tree equality test? Jason? > As soon as proper equality function is in the toolkit I can fix the bootstrap computing code in Bio::Tree::Statistics too. Not sure when I'll have time and have to find a description of a good algorithm for doing it. > Regarding the test file for Bio::Tree::Draw::Cladogram, I just > don't know what to include in the test. The only thing this module > does is to produce a EPS file, and the only thing that comes to my > mind is to test equality of a precomputed EPS file and one produced > on the fly for the same input tree. Any suggestions are welcome. > > By the way, can anybody briefly explain why is it necessary to > include prefixes like > > my $common = $t1->Bio::Tree::Compatible::common_labels($t2); > > in t/Compatible.t, which already uses Bio::Tree::Compatible, > instead of just > > my $common = $t1->common_labels($t2); > > ??? > > Thanks, > > Gabriel Valiente > >> This message is for module authors and for anyone looking for >> something to do for BioPerl. >> >> I've modified the 'maintenance/modules.pl --untested' code so that >> it reads in all test files, extracts 'use'd or 'required' Bio >> classes, now *recursively* marks all super classes and 'use'd >> classes as tested. In addition, I manually ignore all Bio::Search >> and Bio::SearchIO classes. That is because although there are >> extensive tests for modules in these name spaces, new classes are >> instantiated based on attributes to the constructor. This might be >> true to other classes, too. If this is the case, I can see two >> possible actions: >> >> 1. Add them to the list of ignored class names at the end of >> function 'untested' in the script. >> >> 2. 
If the classes are never called directly, rename them to start >> with lower case letter which by convention means that they are >> "component classes" like Bio::SeqIO::genbank. >> >> >> All other classes should have tests written for them or - >> ultimately - removed from the repository. >> >> The aim of this exercise is to come up with a first pass list of >> BioPerl classes that do not have any tests written for them and >> get them written. The next pass will be to find untested methods >> within classes. >> >> >> When writing tests, please follow conventions in existing files >> and remember to test all public methods. If you do not have a cvs >> login, post the new tests to bugzilla.bioperl.org, not to the >> list. We are more than happy to give cvs access to anyone >> committing more than a couple new tests. >> >> -Heikki >> >> >> List of BioPerl classes needing tests: >> >> Bio::Align::Utilities >> Bio::Annotation::AnnotationFactory >> Bio::Annotation::Target >> Bio::DB::Ace >> Bio::DB::Expression >> Bio::DB::Fasta::Stream >> Bio::DB::Flat::BDB >> Bio::DB::Flat::BinarySearch >> Bio::DB::GFF::ID_Iterator >> Bio::DB::Universal >> Bio::DB::XEMBLService >> Bio::Expression::Contact >> Bio::Expression::DataSet >> Bio::Expression::FeatureGroup >> Bio::Expression::FeatureGroup::FeatureGroupMas50 >> Bio::Expression::FeatureSet::FeatureSetMas50 >> Bio::Expression::Platform >> Bio::Expression::Sample >> Bio::FeatureIO >> Bio::Graphics::FeatureFile::Iterator >> Bio::Graphics::Glyph >> Bio::Graphics::Util >> Bio::Index::Fastq >> Bio::Index::Hmmer >> Bio::LiveSeq::IO::SRS >> Bio::Location::AvWithinCoordPolicy >> Bio::Location::NarrowestCoordPolicy >> Bio::Map::Clone >> Bio::Map::Contig >> Bio::Map::FPCMarker >> Bio::Map::OrderedPositionWithDistance >> Bio::Map::Physical >> Bio::Matrix::PSM::Psm >> Bio::Matrix::PSM::PsmHeader >> Bio::Matrix::Scoring >> Bio::Ontology::InterProTerm >> Bio::Ontology::Path >> Bio::Ontology::SimpleGOEngine >> 
Bio::OntologyIO::Handlers::InterProHandler >> Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler >> Bio::OntologyIO::InterProParser >> Bio::PrimarySeq::Fasta >> Bio::Root::Err >> Bio::Root::Global >> Bio::Root::IOManager >> Bio::Root::Utilities >> Bio::Root::Vector >> Bio::Root::Xref >> Bio::SeqFeature::Gene::Promoter >> Bio::SeqFeature::PositionProxy >> Bio::SeqFeature::Tools::FeatureNamer >> Bio::SeqFeature::Tools::IDHandler >> Bio::SeqFeature::Tools::TypeMapper >> Bio::SeqIO::FTHelper >> Bio::Structure::SecStr::DSSP::Res >> Bio::Structure::SecStr::STRIDE::Res >> Bio::Taxonomy >> Bio::Taxonomy::Taxon >> Bio::Taxonomy::Tree >> Bio::Tools::AlignFactory >> Bio::Tools::Blast::HSP >> Bio::Tools::Blast::HTML >> Bio::Tools::Blast::Sbjct >> Bio::Tools::Blat >> Bio::Tools::Coil >> Bio::Tools::ESTScan >> Bio::Tools::Eponine >> Bio::Tools::Fgenesh >> Bio::Tools::Gel >> Bio::Tools::Grail >> Bio::Tools::HMM >> Bio::Tools::Hmmpfam >> Bio::Tools::Primer::Feature >> Bio::Tools::Primer::Pair >> Bio::Tools::Prints >> Bio::Tools::Profile >> Bio::Tools::PrositeScan >> Bio::Tools::Run::GenericParameters >> Bio::Tools::Seg >> Bio::Tools::Signalp >> Bio::Tools::Tmhmm >> Bio::Tools::WWW >> Bio::Tools::WebBlat >> Bio::Tree::Compatible >> Bio::Tree::Draw::Cladogram >> Bio::Tree::NodeNHX >> FeatureStore # in Bio::DB::GFF::Adaptor::berkeleydb.pm >> interpro # class defined in Bio::SeqIO::interpro, mistake? >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12/ From chen_li3 at yahoo.com Wed Jan 4 00:08:13 2006 From: chen_li3 at yahoo.com (chen li) Date: Wed Jan 4 00:11:37 2006 Subject: [Bioperl-l] loading yeast data failing... In-Reply-To: Message-ID: <20060104050813.24144.qmail@web36813.mail.mud.yahoo.com> Hi Angshu, As I told you before Dr.Barry Moore,Dr. Hilmar Lapp, and Dr. 
Jason Stajich (and others) are professors in the Dept of Computer Science (or something like that). They are pretty busy and spend their free time to host this mailing list without any reimbursements. You ask them to write the code for you just like asking your professor to do the job for you. This is the reason why you get your ass kicked. You might get around your problem by doing some research on google, or search the mailing list to see if the same question is asked before, or search similar mailing lists. But anyway you should do some reading and write you code first and run it. Just try and it is very painful for a beginer but it is worth. Good luck, Li --- Barry Moore wrote: > Angshu- > > Have you not been listening to anything that has > been written to you on > this list. The Bioperl community has been amazingly > patient with your > questions over the last several months, and many > have told you time and > time again that you should think before you post. > Less than an hour ago > Hilmar and I both suggested ways that you could try > to solve your own > problem, and now you are back asking Hilmar to write > code for you? > Unbelievable. Asking Hilmar to write your code for > you is VERY > UNACCEPTABLE!!! If you need to have someone write > your code for you or > tutor you in the basics of Perl and Linux (and I > think you do) then you > need to hire them. I can recommend some skilled > contract programmers if > you need to hire one. This list welcomes beginners, > but you are > expected to put forth some effort at trying to solve > problems yourself > first. Your brazen disregard for the etiquette of > open source mailing > lists suggests to me that you should not trouble the > Bioperl community > with further questions until you have at least a > basic command of Perl > and Linux and a willingness to try things yourself > first. Get a copy of > Programming Perl and read it! Get a copy of Object > Oriented Perl and > read it! 
Get a copy of any Linux manual and read > it! And most > important of all, just write code and try it - you > won't break the > computer! > > Barry > > > -----Original Message----- > > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l- > > bounces@portal.open-bio.org] On Behalf Of Hilmar > Lapp > > Sent: Tuesday, January 03, 2006 2:36 PM > > To: bioperl-l > > Subject: Fwd: [Bioperl-l] loading yeast data > failing... > > > > ---------- Forwarded message ---------- > > From: Angshu Kar > > Date: Jan 3, 2006 1:31 PM > > Subject: Re: [Bioperl-l] loading yeast data > failing... > > To: Hilmar Lapp > > > > > > Hi Hilmar, > > > > If you have some time, could you please write > those 2 lines for me? I > > just have no clue about it. After that I think > I've to rebuild the > > package. Am I right? > > > > Here is the accessor.pm: > > > > use strict; > > use vars qw(@ISA); > > use lib '/home/akar/local/perl/'; > > use Bio::Seq::BaseSeqProcessor; > > use Bio::SeqFeature::Generic; > > > > @ISA = qw(Bio::Seq::BaseSeqProcessor); > > > > sub process_seq > > { > > my ($self, $seq) = @_; > > $seq->accession_number($seq->display_id); > > return ($seq); > > } > > > > > > Thanks, > > Angshu > > > > > > On 1/3/06, Hilmar Lapp wrote: > > > You could do that but first that puts you out of > sync with the > > > official schema, and second if you look at the > value it isn't really > > > an accession number anyway that's causing the > problem but rather a > > > concatenation of identifiers, accession numbers, > and namespace > > > acronyms. Since you're using a custom > SeqProcessor anyway already > why > > > don't you just add a line or two of code that > parses the display_id > > > value into the accession and identifier? (for > instance, the token > > > between two '|' characters following the token > 'gb') > > > > > > -hilmar > > > > > > On 1/3/06, Angshu Kar > wrote: > > > > Hi, > > > > > > > > Could you please help me resolve the follwoing > error? 
> > > > > > > > I run: > > > > > > > > ./load_seqdatabase.pl --dbname=USBA > --dbuser=postgres > --format=fasta > > > > --driver=Pg > --pipeline="SeqProcessor::Accession" > yeast_nrpep.fasta > > > > > > > > The error: > > > > > > > > Loading yeast_nrpep.fasta ... > > > > > > > > -------------------- WARNING > --------------------- > > > > MSG: insert in Bio::DB::BioSQL::SeqAdaptor > (driver) failed, values > > were > > > > > > > ("gi|4261605|gb|AAD13905.1|S58126_11111111111111","gi|4261605|gb|AAD1390 > 5. > > > 1|S58126_11111111111111","gi|4261605|gb|AAD13905.1|S58126_11111111111111 > ", > > "Unknown > > > > [Saccharomyces cerevisiae]","0","") FKs > (19,) > > > > ERROR: value too long for type character > varying(40) > > > > > --------------------------------------------------- > > > > Could not store > gi|4261605|gb|AAD13905.1|S58126_11111111111111: > > > > ------------- EXCEPTION ------------- > > > > MSG: error while executing statement in > > > > > Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: > ERROR: current > > transaction > > > > is aborted, commands ignored until end of > transaction block > > > > STACK > Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key > > > > > /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951 > > > > STACK > Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key > > > > > /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855 > > > > STACK > Bio::DB::BioSQL::BasePersistenceAdaptor::create > > > > > /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205 > > > > STACK > Bio::DB::BioSQL::BasePersistenceAdaptor::store > > > > > /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254 > > > > STACK > Bio::DB::Persistent::PersistentObject::store > > > > > /home/akar/local/perl//Bio/DB/Persistent/PersistentObject.pm:272 > > > > STACK (eval) ./load_seqdatabase.pl:621 > > > > STACK toplevel ./load_seqdatabase.pl:604 > > > > > > > > -------------------------------------- > > > 
> > > > > at ./load_seqdatabase.pl line 634
> > > > >
> > > > > Should I change the field lengths for accession, name and identifier to some
> > > > > value >40 in the bioentry table? What should I change it to?
> > > > >
> > > > > Thanks,
> > > > > Angshu
> > > > >
> > > > > _______________________________________________

=== message truncated ===

From Marc.Logghe at DEVGEN.com Wed Jan 4 03:03:12 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Wed Jan 4 03:11:54 2006
Subject: [Bioperl-l] loading yeast data failing...
Message-ID: <0C528E3670D8CE4B8E013F6749231AA67469EA@ANTARESIA.be.devgen.com>

Hi Angshu,

Allow me to make a remark before you run into more (coding and other) troubles.

> > > I just have no clue about it. After that I think I've to rebuild the
> > > package. Am I right?
> > >
> > > Here is the accessor.pm:
> > >
> > > use strict;
> > > use vars qw(@ISA);
> > > use lib '/home/akar/local/perl/';
> > > use Bio::Seq::BaseSeqProcessor;
> > > use Bio::SeqFeature::Generic;
> > >
> > > @ISA = qw(Bio::Seq::BaseSeqProcessor);
> > >
> > > sub process_seq
> > > {
> > >     my ($self, $seq) = @_;
> > >     $seq->accession_number($seq->display_id);
> > >     return ($seq);
> > > }
> > >
> > > > > I run:
> > > > >
> > > > > ./load_seqdatabase.pl --dbname=USBA --dbuser=postgres --format=fasta
> > > > > --driver=Pg --pipeline="SeqProcessor::Accession" yeast_nrpep.fasta

You pass SeqProcessor::Accession as the pipeline argument. This means that your package file should be named 'Accession.pm' and not 'accessor.pm'. Also, the file should reside in the directory '/home/akar/local/perl/SeqProcessor'.
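
For reference, the parsing Hilmar described earlier in the thread (take the token that follows 'gb' between the '|' characters) can be sketched in plain Perl. This is only a minimal sketch under stated assumptions: parse_ncbi_id is a hypothetical helper name, not a BioPerl function, and it assumes IDs of the form gi|<number>|gb|<accession.version>|<name> as in the failing record.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split an NCBI-style FASTA ID
# such as "gi|4261605|gb|AAD13905.1|S58126_..." into its parts.
sub parse_ncbi_id {
    my ($display_id) = @_;
    my @tokens = split /\|/, $display_id;
    my %parts;

    # Tokens come as namespace/value pairs: gi|<number>|gb|<accession>|...
    for my $i (0 .. $#tokens - 1) {
        $parts{identifier} = $tokens[$i + 1] if $tokens[$i] eq 'gi';
        $parts{accession}  = $tokens[$i + 1] if $tokens[$i] eq 'gb';
    }

    # Separate a trailing version suffix such as ".1" from the accession.
    if (defined $parts{accession} && $parts{accession} =~ /^(.+)\.(\d+)$/) {
        @parts{qw(accession version)} = ($1, $2);
    }
    return \%parts;
}

my $p = parse_ncbi_id('gi|4261605|gb|AAD13905.1|S58126_11111111111111');
print "$p->{identifier} $p->{accession} $p->{version}\n";
```

Inside process_seq one would then presumably call $seq->accession_number($p->{accession}) instead of passing the whole display_id; mapping the gi number to $seq->primary_id is likewise an assumption about how the identifier column gets filled. Note that the display_id in the error message is itself longer than 40 characters, so it would probably need shortening as well. The file would also need a 'package SeqProcessor::Accession;' line at the top if the posted listing is complete.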

HTH and good luck,
Marc

From khoueiry at ibdm.univ-mrs.fr Wed Jan 4 04:42:16 2006
From: khoueiry at ibdm.univ-mrs.fr (khoueiry)
Date: Wed Jan 4 05:07:08 2006
Subject: [Bioperl-l] counting and searching patterns
Message-ID: <1136367736.12344.11.camel@DavidLinux>

Hello,

I have been working on an issue for a while, and the question of the best way to do it is troubling me. I want to count/search for a pattern in a nucleotide sequence (e.g. search/count for MGGAAR). What I'm doing now is to generate the unique Seq objects using the IUPAC module, then for each unique sequence generate the reverse one using SeqPattern, and then count/search in my sequence.

i.e.: MGGAAR -> C/AGGAAG/A (IUPAC): 4 possibilities + 4 reverse (SeqPattern) = 8.

I was wondering if there is a BioPerl way to do the count/search directly using the initial pattern (MGGAAR), taking the reverse case into account (that is, YTTCCK in my example).

Any help or suggestions are appreciated.

Thanks to all and happy new year,
pierre

From daniel.lang at biologie.uni-freiburg.de Wed Jan 4 05:00:54 2006
From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Wed Jan 4 05:23:39 2006
Subject: [Bioperl-l] Bio::DB::Registry get_all_primary_ids
Message-ID: <43BB9CD6.6000907@biologie.uni-freiburg.de>

Hi,

I stumbled over a problem in the Registry modules:
I wanted to use the method get_all_primary_ids as laid out in Bio::DB::SeqI.
The Bio::DB::Failover object returned by the get_database method doesn't support this method, while the (in my case) attached Bio::DB::Flat::BinarySearch object does!
How can I comfortably call this method on the object?

For now I'm using something "ugly" like:
$db->{_database}->[0]->get_all_primary_ids

Here is my object dump (it's describing a local sprot.dat):

Bio::DB::Failover=HASH(0x87ca9a0)
   '_database' => ARRAY(0x8b721a4)
      0  Bio::DB::Flat::BinarySearch=HASH(0x8bbacdc)
         '_dbfile' => HASH(0x8d02f54)
            '/home/lang/projects/core_ortho/uniprot_sprot.dat' => 0
         '_file' => HASH(0x8bbaf70)
            0 => '/home/lang/projects/core_ortho/uniprot_sprot.dat'
         '_fileid' => HASH(0x8d13ab4)
            0 => GLOB(0x8d42adc)
               -> *Bio::DB::Flat::BinarySearch::$fh
               FileHandle({*Bio::DB::Flat::BinarySearch::$fh}) => fileno(9)
         '_index_directory' => '/home/lang/projects/core_ortho/'
         '_index_type' => 'flat'
         '_index_version' => 1
         '_primary_index_handle' => GLOB(0x8d13db4)
            -> *Bio::DB::Flat::BinarySearch::$__ANONIO__
            FileHandle({*Bio::DB::Flat::BinarySearch::$__ANONIO__}) => fileno(8)
         '_primary_namespace' => undef
         '_record_size' => 29
         '_root_verbose' => 0
         '_size' => HASH(0x8d42e48)
            0 => 796304807
         '_start_pos' => 4
         'flat_dbname' => 'test'
         'format' => 'swiss'
         'primary_pattern' => undef
         'secondary_namespaces' => ARRAY(0x8b8f774)
            0  'ACC'
         'start_pattern' => undef

As far as I understand it, the corresponding method needs to be implemented in Failover?

In addition I'm somewhat confused by the term "Failover" - but since the retrieval is working like it's supposed to...

Thanks in advance,
Daniel

--

Daniel Lang
University of Freiburg, Plant Biotechnology
Schaenzlestr. 1, D-79104 Freiburg
fax: +49 761 203 6945
phone: +49 761 203 6974
homepage: http://www.plant-biotech.net/
e-mail: daniel.lang@biologie.uni-freiburg.de

#################################################
My software never has bugs.
It just develops random features.
#################################################

From Marc.Logghe at DEVGEN.com Wed Jan 4 07:21:21 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Wed Jan 4 07:18:32 2006
Subject: [Bioperl-l] counting and searching patterns
Message-ID: <0C528E3670D8CE4B8E013F6749231AA67469EE@ANTARESIA.be.devgen.com>

Hi Pierre,

Never used it myself, but can you do something with Bio::Tools::SeqPattern ?
Have a look at the FAQ:
http://bioperl.open-bio.org/wiki/FAQ#How_do_I_do_motif_searches_with_BioPerl.3F_Can_I_do_.22find_all_sequences_that_are_75.25_identical.22_to_a_given_motif.3F

You could also do it with EMBOSS' fuzznuc. On the command line you do:

fuzznuc -pattern MGGAAR -complement

If you need to automate this, you can launch fuzznuc with bioperl. To do that you will need Bio::Factory::EMBOSS, which is part of bioperl-run.

HTH
Cheers,
Marc

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of khoueiry
> Sent: Wednesday, January 04, 2006 10:42 AM
> To: bioperl-l@bioperl.org
> Subject: [Bioperl-l] counting and searching patterns
>
>
> Hello,
>
> It's been a while that I'm working on an issue and the
> matter of finding the best way to do it is triggering me.
> Actually, I want to count/search a pattern in a nucleotide
> sequence. (i.e : search/count for MGGAAR). What I'm doing now
> is to Generates unique Seq objects using :IUPAC:module, then
> for each unique seq generates the reverse one using
> :SeqPattern: and going to count/search in my seq.
>
>
> i.e : MGGAAR -> C/AGGAAG/A (IUPAC) 4 possibilities + 4
> reverse ( :SeqPattern:) = 8 .
>
>
> I was wondering, if there is a bioperl way to do the
> count/search directly using the initial pattern (MGGAAR)
> taking the reverse case into account (that is YTTCCK in my example).
> > > Any help or suggestions are appreciated > > > Thanks to all and happy new year > > pierre > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From Marc.Logghe at DEVGEN.com Wed Jan 4 08:01:36 2006 From: Marc.Logghe at DEVGEN.com (Marc Logghe) Date: Wed Jan 4 08:10:21 2006 Subject: [Bioperl-l] Bio::DB::Registry get_all_primary_ids Message-ID: <0C528E3670D8CE4B8E013F6749231AA67469EF@ANTARESIA.be.devgen.com> Hi Daniel, As far as I can see, you are using actually only 1 database (no need to use Bio::DB::Failover), of type Bio::DB::Flat::BinarySearch. Using a Bio::DB::Failover object you can attach multiple databases (e.g. Bio::DB::SeqI compliant objects). In case it fails to fetch a seq from the first database in the list, it will try the second and so on. Bio::DB::Failover ISA Bio::DB::RandomAccessI while Bio::DB::Flat::BinarySearch ISA Bio::DB::SeqI. In the first case an implementation of get_all_primary_ids is not necessary, in contrast to the latter case. So, you might think of using Bio::DB::Flat::BinarySearch directly if you depend on that method and you need only 1 database. Hope it all makes sense cos I don't have much experience with these modules. Cheers, Marc > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org > [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of > Daniel Lang > Sent: Wednesday, January 04, 2006 11:01 AM > To: Bioperl list > Subject: [Bioperl-l] Bio::DB::Registry get_all_primary_ids > > Hi, > > I stumbled over a problem in the Registry modules: > I wanted to use the method get_all_primary_ids as laid out in > Bio::DB::SeqI. > The Bio::DB::Failover object as returned by the get_database > method doesn't support this method while the (in my case) > attached Bio::DB::Flat::BinarySearch object does! > How can I comfortably call this method on the object? 
> > For now I'm using something "ugly" like: > $db->{_database}->[0]->get_all_primary_ids > > Here is my object dump (Its describing a local sprot.dat): > Bio::DB::Failover=HASH(0x87ca9a0) > '_database' => ARRAY(0x8b721a4) > 0 Bio::DB::Flat::BinarySearch=HASH(0x8bbacdc) > '_dbfile' => HASH(0x8d02f54) > '/home/lang/projects/core_ortho/uniprot_sprot.dat' => 0 > '_file' => HASH(0x8bbaf70) > 0 => '/home/lang/projects/core_ortho/uniprot_sprot.dat' > '_fileid' => HASH(0x8d13ab4) > 0 => GLOB(0x8d42adc) > -> *Bio::DB::Flat::BinarySearch::$fh > > FileHandle({*Bio::DB::Flat::BinarySearch::$fh}) => > fileno(9) > '_index_directory' => '/home/lang/projects/core_ortho/' > '_index_type' => 'flat' > '_index_version' => 1 > '_primary_index_handle' => GLOB(0x8d13db4) > -> *Bio::DB::Flat::BinarySearch::$__ANONIO__ > > FileHandle({*Bio::DB::Flat::BinarySearch::$__ANONIO__}) => fileno(8) > '_primary_namespace' => undef > '_record_size' => 29 > '_root_verbose' => 0 > '_size' => HASH(0x8d42e48) > 0 => 796304807 > '_start_pos' => 4 > 'flat_dbname' => 'test' > 'format' => 'swiss' > 'primary_pattern' => undef > 'secondary_namespaces' => ARRAY(0x8b8f774) > 0 'ACC' > 'start_pattern' => undef > > As far as I understand it, the corresponding method needs to > be implemented in Failover? > > In addition I'm somewhat confused by the term "Failover" - > but since the retrieval is working like its supposed to... > > Thanks in advance, > Daniel > > -- > > Daniel Lang > University of Freiburg, Plant Biotechnology Schaenzlestr. 1, > D-79104 Freiburg > fax: +49 761 203 6945 > phone: +49 761 203 6974 > homepage: http://www.plant-biotech.net/ > e-mail: daniel.lang@biologie.uni-freiburg.de > > ################################################# > My software never has bugs. > It just develops random features. 
> #################################################
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From andreo_beck at yahoo.com Wed Jan 4 17:35:39 2006
From: andreo_beck at yahoo.com (Andreo Beck)
Date: Wed Jan 4 17:38:57 2006
Subject: [Bioperl-l] bioperl script error
Message-ID: <20060104223539.74182.qmail@web37107.mail.mud.yahoo.com>

Hi Bioperl,

Any clue what may be the cause of this exception? The sub-sequence seems to be within the valid range.

------------- EXCEPTION -------------
MSG: Undefined sub-sequence (75,77). Valid range = 17 - 77
STACK Bio::Search::HSP::HSPI::matches /cmpchome/andyb/lib/perl/Bio/Search/HSP/HSPI.pm:711
STACK (eval) /cmpchome/andyb/lib/perl/Bio/Search/SearchUtils.pm:365
STACK Bio::Search::SearchUtils::_adjust_contigs /cmpchome/andyb/lib/perl/Bio/Search/SearchUtils.pm:364
STACK Bio::Search::SearchUtils::tile_hsps /cmpchome/andyb/lib/perl/Bio/Search/SearchUtils.pm:176
STACK Bio::Search::Hit::GenericHit::length_aln /cmpchome/andyb/lib/perl/Bio/Search/Hit/GenericHit.pm:740
STACK toplevel /var/spool/slurmd/job01852/script:30

Andy

From torsten.seemann at infotech.monash.edu.au Wed Jan 4 18:33:33 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed Jan 4 18:30:47 2006
Subject: [Bioperl-l] bioperl script error
In-Reply-To: <20060104223539.74182.qmail@web37107.mail.mud.yahoo.com>
References: <20060104223539.74182.qmail@web37107.mail.mud.yahoo.com>
Message-ID: <43BC5B4D.4000403@infotech.monash.edu.au>

Andy,

> Any clue what may be the cause of this exception? The sub-sequence seems to be within the valid range.
> ------------- EXCEPTION -------------
> MSG: Undefined sub-sequence (75,77).
Valid range = 17 - 77
> STACK Bio::Search::HSP::HSPI::matches /cmpchome/andyb/lib/perl/Bio/Search/HSP/HSPI.pm:711
> STACK (eval) /cmpchome/andyb/lib/perl/Bio/Search/SearchUtils.pm:365
> STACK Bio::Search::SearchUtils::_adjust_contigs /cmpchome/andyb/lib/perl/Bio/Search/SearchUtils.pm:364
> STACK Bio::Search::SearchUtils::tile_hsps /cmpchome/andyb/lib/perl/Bio/Search/SearchUtils.pm:176
> STACK Bio::Search::Hit::GenericHit::length_aln /cmpchome/andyb/lib/perl/Bio/Search/Hit/GenericHit.pm:740
> STACK toplevel /var/spool/slurmd/job01852/script:30

What type of report are you processing?
e.g. [T]BLAST[NX], BLASTP, HMMER

Here's the function causing the problem in bioperl-live:
http://doc.bioperl.org/bioperl-live/Bio/Search/HSP/HSPI.html#CODE22

--
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/

From andreo_beck at yahoo.com Wed Jan 4 18:58:06 2006
From: andreo_beck at yahoo.com (Andreo Beck)
Date: Wed Jan 4 19:01:23 2006
Subject: [Bioperl-l] bioperl script error
In-Reply-To: <43BC5B4D.4000403@infotech.monash.edu.au>
Message-ID: <20060104235806.85639.qmail@web37114.mail.mud.yahoo.com>

Great, Torsten. But I wonder where this function is being called. Also, even if it is called, is my code supplying out-of-range coordinates?
I'm using BLASTP.

Torsten Seemann wrote:
Andy,

> Any clue what may be the cause of this exception? The sub-sequence seems to be within the valid range.
> ------------- EXCEPTION -------------
> MSG: Undefined sub-sequence (75,77).
Valid range = 17 - 77
> STACK Bio::Search::HSP::HSPI::matches /cmpchome/andyb/lib/perl/Bio/Search/HSP/HSPI.pm:711
> STACK (eval) /cmpchome/andyb/lib/perl/Bio/Search/SearchUtils.pm:365
> STACK Bio::Search::SearchUtils::_adjust_contigs /cmpchome/andyb/lib/perl/Bio/Search/SearchUtils.pm:364
> STACK Bio::Search::SearchUtils::tile_hsps /cmpchome/andyb/lib/perl/Bio/Search/SearchUtils.pm:176
> STACK Bio::Search::Hit::GenericHit::length_aln /cmpchome/andyb/lib/perl/Bio/Search/Hit/GenericHit.pm:740
> STACK toplevel /var/spool/slurmd/job01852/script:30

What type of report are you processing? e.g. [T]BLAST[NX], BLASTP, HMMER

Here's the function causing the problem in bioperl-live:
http://doc.bioperl.org/bioperl-live/Bio/Search/HSP/HSPI.html#CODE22

--
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/

From jason.stajich at duke.edu Wed Jan 4 19:20:03 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Jan 4 19:16:47 2006
Subject: [Bioperl-l] bioperl script error
In-Reply-To: <20060104235806.85639.qmail@web37114.mail.mud.yahoo.com>
References: <20060104235806.85639.qmail@web37114.mail.mud.yahoo.com>
Message-ID: <3A220694-BB16-4B1F-9DA2-C1A3215AE1BB@duke.edu>

What does line 30 in your script say?

>> STACK toplevel /var/spool/slurmd/job01852/script:30

It is trying to merge HSPs to compute the virtual length of a hit from the sub-HSPs. This happens when you call a function on the HitI object, like start/end/strand, that has to merge the HSPs in order to get some sort of overall start/end/length for an alignment.

I don't really know what is going on in those Hit functions, but they shouldn't crash if the HSP path set is unresolvable. (I bet you are thinking the same thing...) Are you sure you want to be calling whatever function it is on line 30?
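
As a rough illustration of what "merging HSPs" to get an overall length means, here is a minimal sketch in plain Python. It is not the Bioperl SearchUtils code and makes no claim about how tile_hsps works internally; the function name `tiled_length` and the sample coordinates are made up for the example.

```python
def tiled_length(hsps):
    """Return the number of positions covered by a set of (start, end) ranges.

    Coordinates are 1-based and inclusive, as in BLAST reports.
    Overlapping or adjacent ranges are merged so nothing is counted twice.
    """
    merged = []
    for start, end in sorted(hsps):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps (or touches) the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return sum(end - start + 1 for start, end in merged)


# Hypothetical HSP ranges on one sequence; the first two overlap.
hsps = [(17, 60), (50, 77), (100, 120)]
print(tiled_length(hsps))
```

Simply summing the individual HSP lengths would count the overlapping stretch twice, which is why a merge step of some kind is needed before an overall length can be reported.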
-jason

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From andreo_beck at yahoo.com Wed Jan 4 20:13:20 2006
From: andreo_beck at yahoo.com (Andreo Beck)
Date: Wed Jan 4 20:16:40 2006
Subject: [Bioperl-l] bioperl script error
In-Reply-To: <3A220694-BB16-4B1F-9DA2-C1A3215AE1BB@duke.edu>
Message-ID: <20060105011320.58595.qmail@web37109.mail.mud.yahoo.com>

My line 30 has:

$hit->hsp->evalue <= $threshold

But strangely enough, the last time I ran it, it ran successfully. One point is that on line 32 I've added:

if(($hit->length_aln() >= $sum_of_HSP_len && $hit->length()>= $hit_len ))

Now it's clear that this is the line that's creating the problems, more specifically the length_aln function! The calls, as you can see, are length_aln --> tile_hsps --> _adjust_contigs --> matches. Can this have anything to do with overlaps in my data?

From andreo_beck at yahoo.com Wed Jan 4 20:40:32 2006
From: andreo_beck at yahoo.com (Andreo Beck)
Date: Wed Jan 4 20:43:49 2006
Subject: [Bioperl-l] bioperl script error
In-Reply-To: <39D2FCB4-68FC-4916-B192-2C5FE5051AC7@duke.edu>
Message-ID: <20060105014032.38798.qmail@web37108.mail.mud.yahoo.com>

I've used both WU-BLASTP with -postsw and FASTA (WU-BLAST doesn't take other formats without transformations). The sad part about -postsw is that it doesn't work with BLASTN. Anyway, I now have two questions:

1. The _adjust_contigs documentation says that it is at an experimental stage. What future enhancements are planned? Any plans to handle gapped alignments?
2. Is tile_hsps calculating the total length of the alignment? If so, is that the point where things are breaking?

From jason.stajich at duke.edu Wed Jan 4 20:17:35 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Jan 4 20:45:19 2006
Subject: [Bioperl-l] bioperl script error
In-Reply-To: <20060105011320.58595.qmail@web37109.mail.mud.yahoo.com>
References: <20060105011320.58595.qmail@web37109.mail.mud.yahoo.com>
Message-ID: <39D2FCB4-68FC-4916-B192-2C5FE5051AC7@duke.edu>

On Jan 4, 2006, at 8:13 PM, Andreo Beck wrote:

> My line 30 has:
>
> $hit->hsp->evalue <= $threshold
>
> But strangely enough, when I ran it the last time it ran successfully.
> One point is that, in my line 32 I've added:
>
> if(($hit->length_aln() >= $sum_of_HSP_len && $hit->length()>= $hit_len ))
>
> Now its clear that this is the line that's creating the problems,
> more specifically the length_aln function! The calls as you see are
> length_aln --> tile_hsps --> _adjust_contigs-->matches. Can this
> have anything to do with overlaps in my data?

Absolutely. That is the function causing you problems; as to why it is failing, I am guessing it can't figure out how to merge the HSPs. Personally I leave this sort of thing to the alignment rather than post-processing what BLAST did.
Depending on your question/compute/time you can get a better HSP path by running

* WU-BLAST with the -postsw option, which will clean up these overlapping alignments
* FASTA or SSEARCH to get a single alignment path.

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From gcoppola at ucla.edu Wed Jan 4 20:32:15 2006
From: gcoppola at ucla.edu (Giovanni Coppola)
Date: Wed Jan 4 21:01:33 2006
Subject: [Bioperl-l] matrix from bl2seq
Message-ID: <1FF5EF87-ECA4-432B-8C4A-D6729D047C5B@ucla.edu>

Hi everybody,

I have a number of nucleotide sequences and I would like to check them for homology to each other. I'd like to build a square matrix with the sequence names as rows and columns and, as the value, a measure of homology from the bl2seq output. One way to do this could be:

1) download BLAST
2) use StandAloneBlast iteratively to get all the possible combinations
3) parse with BPbl2seq
4) build the matrix with the parameter I want

I just wanted to check if this is the way to go... Thank you very much and happy new year!
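
Steps 2 and 4 above come down to looping over every unordered pair of sequences and filling a symmetric table. Below is a minimal sketch of that bookkeeping in plain Python rather than Bioperl. `pairwise_score` is a hypothetical stand-in for "run bl2seq on the pair and pull one number out of the parsed report"; here it is a toy identity fraction so the example runs on its own, and the sequence names and data are invented.

```python
from itertools import combinations


def pairwise_score(seq_a, seq_b):
    """Toy stand-in for a bl2seq-derived score: fraction of matching
    positions over the shorter sequence. A real pipeline would run
    bl2seq on the pair and read a value from the parsed report."""
    n = min(len(seq_a), len(seq_b))
    if n == 0:
        return 0.0
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return matches / n


def build_matrix(seqs, score=pairwise_score):
    """Return {name: {name: score}}; each unordered pair is scored once."""
    names = sorted(seqs)
    matrix = {a: {b: None for b in names} for a in names}
    for name in names:
        matrix[name][name] = score(seqs[name], seqs[name])
    for a, b in combinations(names, 2):
        s = score(seqs[a], seqs[b])
        matrix[a][b] = s
        matrix[b][a] = s  # assumes the chosen measure is symmetric
    return matrix


seqs = {"seq1": "ACGTACGT", "seq2": "ACGTTCGT", "seq3": "TTTTACGT"}
matrix = build_matrix(seqs)
names = sorted(seqs)
print("\t" + "\t".join(names))
for a in names:
    print(a + "\t" + "\t".join(f"{matrix[a][b]:.2f}" for b in names))
```

For N sequences this needs N(N-1)/2 comparisons. If the measure taken from the report is not symmetric (query versus subject), both directions would have to be run and stored separately instead of mirroring one value.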
Giovanni

From marc.saric at gmx.de Thu Jan 5 08:26:27 2006
From: marc.saric at gmx.de (Marc Saric)
Date: Thu Jan 5 08:29:51 2006
Subject: [Bioperl-l] BioSQL, bioperl-db and UniGene
Message-ID: <43BD1E83.5050408@gmx.de>

Hi all,

I've got some questions regarding BioSQL I would like to ask here:

I am currently writing an app which should map microarray probe sequences to target sequences. It should do so in a generalized manner (i.e. any microarray against an arbitrary sequence database). Currently I need UniGene for Zebrafish (Dr.*) and several oligonucleotide libs, among them an Affymetrix array.

Because UniGene is a moving target (especially for unfinished genomes), it would be good to do the mapping in a fully automated way.

I am thinking about doing sequence-based mapping of probe sequences with BLAT or GMAP (like ProbeLynx does for Ensembl/TIGR-based data, but unfortunately that tool is quite hard to port/extend for other databases).

In addition I would like to have annotation-based mapping (i.e. take the accession from the vendor-provided mapping and look up which UniGene cluster it maps to) as a fallback/second option for microarrays where probe sequences are not published.

I have installed and set up Bioperl 1.5.1 and the CVS versions of biosql and bioperl-db with MySQL 4.1.12/Mac OS X and was able to load Taxon and UniGene data from flatfiles, at least the Cluster-IDs and Accessions as available from the *.data file.

I was also able to rewrite microarray probes from various tab-delimited formats or FASTA to Genbank, which worked ok for loading (albeit slow, but...).

(I hope you are still with me after this lengthy intro... :-) )

1st question:

Since the loader does not like raw FASTA files, what would be the most elegant/efficient way of loading all sequence files for the UniGene build as well (normally provided in a FASTA file called *.seq.all, Dr.seq.all in my case)? And how do I associate them with the cluster data (i.e. there are already entries in bioentry for all sequences, but they are missing the sequence data and most of their detail annotation, so this might be some kind of update)?

2nd question:

What would be the best way of integrating BLAT/GMAP (same format as BLAT) results? I'm thinking about parsing the file and writing the mapping results as an annotation into the database, linked to each probe sequence. Data would include the hit(s) found for each probe, whether it hits more than one cluster, and possibly some additional notes.

From there I would write out a report or custom sequence file for use in other tools.

If possible I would also like to accumulate annotations (like mapping against different UniGene builds over time).

3rd question:

Because UniGene changes frequently, I would like to have some kind of versioning, so that I can keep old versions of UniGene as a backup and add new ones (i.e. not only keeping the mapping results but also keeping all the source sequences).

If I understand it right, the load_seqdatabase script does not support this and has no (command-line) option for overriding the "database" name (i.e. for UniGene it will always be set to "UniGene" in biodatabase and thus overwrite old versions)?

Do you see any fundamental problems here for versioning the data (except storage space)?

Thanks in advance.

Links:

ProbeLynx http://koch.pathogenomics.ca/probelynx/
D.rerio UniGene: http://www.ncbi.nlm.nih.gov/UniGene/UGOrg.cgi?TAXID=7955

--
Bye,

Marc Saric

From sdavis2 at mail.nih.gov Thu Jan 5 09:32:04 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu Jan 5 10:01:42 2006
Subject: [Bioperl-l] BioSQL, bioperl-db and UniGene
In-Reply-To: <43BD1E83.5050408@gmx.de>
Message-ID:

Hi Marc.

I currently do something similar for all our arrays (about 7 different platforms, three species, depending on need) at NHGRI/NIMH/NINDS.
I use blat to map oligo sequences to refseq, ensembl, unigene_unique (the single "best" sequence for a unigene cluster), UCSC known genes, and Human Invitational, as well as to several genome builds for each species. I run blat locally and then load all the blat results into one large database table (about 5 million rows in the current build). I also have an annotation database that includes Entrez Gene, refseq, ensembl, unigene, Human Invitational, UCSC knownGene, gene ontology, homologene, and a few other things.

After doing the blats, I choose the best hit for each transcript database and map that to an associated gene model using the annotation database. I end up with oligos mapped to zero to many transcripts for all large transcript databases, oligos mapped to zero to many genes (with local storage of all the gene objects and associated information for easy access), as well as mappings to multiple sources of metadata.

Doing the blats for all these is quite fast (but DO NOT plan on using bioperl to parse the 5M blat results; doing so will take DAYS). Note that the process does not include storing all the sequences in the database; there isn't a need for doing so if you are just blatting. Also, I do not use biosql in this situation because I found it rather slow for mapping between different entities. It did require building a database of my own, but doing so makes it fairly easy to add tables as needed to support another public database or to support a website, for example.

If you don't want to build your own annotation database (the largest part of what I have been doing), you can use one of several available, including GeneKeyDB (by our own Stefan Kirov) or Dragon DB.

Let me know if I can be of more help.

Sean

From hlapp at gmx.net Thu Jan 5 14:02:38 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu Jan 5 14:06:01 2006
Subject: [Bioperl-l] BioSQL, bioperl-db and UniGene
In-Reply-To: <43BD1E83.5050408@gmx.de>
References: <43BD1E83.5050408@gmx.de>
Message-ID: <337e07f0c178c270d5fafdac45e80f80@gmx.net>

On Jan 5, 2006, at 5:26 AM, Marc Saric wrote:

> I am currently writing an app which should map microarray probe
> sequences to target sequences. It should do so in a generalized manner
> (i.e. any microarray against an arbitrary sequence-database). Currently
> I need UniGene for Zebrafish (Dr.*) and several Oligonucleotide libs,
> among them an Affymetrix array.
First off, you have seen the TIGR RESOURCERER application (http://www.tigr.org/tigr-scripts/magic/r1.pl), right? > [...] > 1st question: > > Due to the fact that the loader does not like raw FASTA-files, The loader likes all formats that Bio::SeqIO likes, so it doesn't harbor any disdain for FASTA format. The only problem is that FASTA format doesn't designate fields for accession, version, and name but rather leaves it up to the file producer. This can be easily solved by writing a custom SeqProcessor as pointed out several times before, for instance: http://portal.open-bio.org/pipermail/bioperl-l/2004-June/016204.html http://portal.open-bio.org/pipermail/bioperl-l/2005-August/019579.html > what > would be the most elegant/efficient way of loading all sequence-files > for the UniGene build as well (normaly provided in a FASTA-file called > *.seq.all, Dr.seq.all in my case). And how to associate them with the > cluster data (i.e. there are allready entries in bioentry for all > sequences, but they are missing the sequence data and most of their > detail annotation, so this might be some kind of update). See above for the format issue. As for automatically updating your sequences, use --lookup and possibly other update-related options for load_seqdatabase.pl (see its POD). > > 2nd question: > > What would be the best way of integrating BLAT/GMAP (same format as > BLAT) results. I'm thinking about parsing the file and writing the > mapping-results as a annotation into the database, linked to each > probe-sequence. Data would include the hit(s) found for each probe, > wether it hits more than one cluster and possibly some additional > notes. > >> From there I would write out a report or custom sequence file for use >> in > other tools. > > If possible I would also like to accumulate annotations (like mapping > against different UniGene builds over time). I'm not sure exactly what your question is. 
Note that you can attach anything you like to sequences in the database, e.g., features and annotations. You can do so using Bioperl pretty easily. The sequence of steps is basically: 1) retrieve the sequence object, 2) add annotation and/or features, 3) call $pseq->store(), and commit with $pseq->commit(). There are some pertinent code fragments in http://www.open-bio.org/bosc2003/slides/Persistent_Bioperl_BOSC03.pdf

Let me know if this doesn't answer your question.

>
> 3rd question:
>
> Due to the fact, that UniGene changes frequently, I would like to have
> some kind of versioning, so that I can keep old versions of UniGene as a
> backup and add new ones (i.e. not only keeping the mapping results but
> also keeping all the source sequences).
>
> If I understand it right, the load_seqdatabase script does not support
> this and has no (command-line) option for overriding the "database" name
> (i.e. for UniGene it will always be set to "UniGene" in biodatabase and
> thus overwrite old versions)?

Yes - the reason is that an instance of Bio::Cluster::UniGene will default its namespace to 'UniGene' if none is provided by the caller, and the UniGene parser doesn't provide one. load_seqdatabase itself doesn't touch the namespace of the object if it's been set already.

I'm not quite happy with this myself, as basically it takes away control from the user. Now I do think load_seqdatabase.pl's policy is correct; but maybe the right thing to do for Bio::Cluster::UniGene is not to default to a non-mandatory value if none is provided. I may just propose making that change.

What you can do regardless of this is, before you load a new UniGene version, rename the existing namespace to something that includes the version. Then all entries will be created fresh under the then-new namespace 'UniGene'.
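
To make the rename-before-reload idea concrete, here is a minimal sketch in Python against an in-memory SQLite table. It assumes only that namespaces live in BioSQL's biodatabase table with a unique name column; the table below is a cut-down toy, the helper `archive_namespace` is hypothetical, and 'UniGene_old_build' is a made-up label. Against a real BioSQL instance the same single UPDATE would be issued through whatever client you use for MySQL or PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy stand-in for BioSQL's biodatabase table (only the columns needed here).
conn.execute(
    "CREATE TABLE biodatabase ("
    " biodatabase_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL UNIQUE)"
)
conn.execute("INSERT INTO biodatabase (name) VALUES ('UniGene')")


def archive_namespace(conn, current, archived):
    """Rename namespace `current` to `archived`; return True if a row changed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE biodatabase SET name = ? WHERE name = ?",
            (archived, current),
        )
    return cur.rowcount == 1


# Do this before loading the new build, so the loader creates 'UniGene' afresh.
renamed = archive_namespace(conn, "UniGene", "UniGene_old_build")
names = [row[0] for row in conn.execute("SELECT name FROM biodatabase")]
print(renamed, names)
```

Because entries hang off the namespace row by its id rather than its name, renaming the row should carry the old entries along with it, leaving the name 'UniGene' free for the next load.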
Note that source sequences do not change because UniGene changes - there will be new cluster members and other member sequences will be retired from the cluster, but their sequences only change if the respective GenBank sequence changes, which will not only increment the version but also lead to a new GI number, which basically means a new cluster member (as they are references by GI number). > > Do you see any fundamental problems here for versioning the data > (except > storage space)? No, not at all. Let me know if I didn't address your questions. -hilmar > > Thanks in advance. > > Links: > > ProbeLynx http://koch.pathogenomics.ca/probelynx/ > D.rerio UniGene: > http://www.ncbi.nlm.nih.gov/UniGene/UGOrg.cgi?TAXID=7955 > > > -- > Bye, > > Marc Saric > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From cjfields at uiuc.edu Thu Jan 5 15:27:41 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu Jan 5 15:34:39 2006 Subject: [Bioperl-l] PPM files for bioperl, bioperl-db Message-ID: <000001c61236$7b6af960$15327e82@pyrimidine> I have been toying around with creating PPM archives (Perl 5.8) for Activestate Perl on WinXP and managed to get PPMs for bioperl-live and bioperl-db installed through PPM3. I haven't tested them out yet, but I'm trying to gauge the interest in maybe making them available for use with somewhat regular updates (weekly or monthly). Also, the dependencies listed for bioperl included the following: DB-File File-Spec File-Temp IO-String HTML-Entities IO-Scalar All but HTML=Entities and IO-Scalar are available for Win32. Anyone know what the missing modules are used for? 
Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From osborne1 at optonline.net Thu Jan 5 15:50:22 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Thu Jan 5 15:52:02 2006 Subject: [Bioperl-l] PPM files for bioperl, bioperl-db In-Reply-To: <000001c61236$7b6af960$15327e82@pyrimidine> Message-ID: Chris, It looks like IO::Scalar is no longer used, it _was_ used by Bio::Tools::Blast. HTML::Entities is used by Bio::SearchIO::blastxml. Brian O. On 1/5/06 3:27 PM, "Chris Fields" wrote: > I have been toying around with creating PPM archives (Perl 5.8) for > Activestate Perl on WinXP and managed to get PPMs for bioperl-live and > bioperl-db installed through PPM3. I haven't tested them out yet, but I'm > trying to gauge the interest in maybe making them available for use with > somewhat regular updates (weekly or monthly). > > Also, the dependencies listed for bioperl included the following: > > DB-File > File-Spec > File-Temp > IO-String > HTML-Entities > IO-Scalar > > All but HTML=Entities and IO-Scalar are available for Win32. Anyone know > what the missing modules are used for? > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Thu Jan 5 15:57:18 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu Jan 5 15:53:54 2006 Subject: [Bioperl-l] PPM files for bioperl, bioperl-db In-Reply-To: Message-ID: <000101c6123a$9da46e40$15327e82@pyrimidine> My bad. I checked through CPAN and the Activestate website for those last two. HTML::Entities is included with HTML::Parser, which is part of the Activestate Perl core. IO::Scalar is included with IO::Stringy, which is available. 
If I set up the PPD file, should I remove all the dependencies since not all of them are needed, or leave the ones that matter (DB_File, etc) in?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From sdavis2 at mail.nih.gov Thu Jan 5 16:11:50 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu Jan 5 16:08:34 2006
Subject: [Bioperl-l] BioSQL, bioperl-db and UniGene
In-Reply-To: <337e07f0c178c270d5fafdac45e80f80@gmx.net>
Message-ID:

On 1/5/06 2:02 PM, "Hilmar Lapp" wrote:

>
> On Jan 5, 2006, at 5:26 AM, Marc Saric wrote:
>
>> I am currently writing an app which should map microarray probe
>> sequences to target sequences.
It should do so in a generalized manner
>> (i.e. any microarray against an arbitrary sequence-database). Currently
>> I need UniGene for Zebrafish (Dr.*) and several Oligonucleotide libs,
>> among them an Affymetrix array.
>
> First off, you have seen the TIGR RESOURCERER application
> (http://www.tigr.org/tigr-scripts/magic/r1.pl), right?

And, since we started out talking about microarrays, are you aware of the BioConductor project and their annotation efforts, as well as a connection to Resourcerer?

Sean

From osborne1 at optonline.net Thu Jan 5 17:01:36 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu Jan 5 17:03:34 2006
Subject: [Bioperl-l] PPM files for bioperl, bioperl-db
In-Reply-To: <000101c6123a$9da46e40$15327e82@pyrimidine>
Message-ID:

Chris,

In my opinion you should include anything that's listed in the INSTALL file in your PPD file; I _think_ this is what previous PPD creators did. I may not have addressed your question...

Brian O.

From cjfields at uiuc.edu Thu Jan 5 18:08:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu Jan 5 18:16:16 2006
Subject: [Bioperl-l] PPM files for bioperl, bioperl-db
In-Reply-To:
Message-ID: <000001c6124c$edd87c00$15327e82@pyrimidine>

I guess what I mean is: when generating the PPD file (using 'nmake ppd' after archiving the blib directory), several dependencies are listed in the PPD file which come from the makefile. A number of those probably should be kept, such as DB_File, but a few are 'red herrings,' in that HTML::Parser is included in the ActivePerl Core. I believe the dependencies listed are for those who use CPAN directly; I don't think PPM can distinguish modules with one name (HTML::Entities) packaged together with others (HTML::Parser).

What I'll do is remove those that are obviously not needed (IO::Scalar, HTML::Entities), go through the rest, then proceed from there.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept.
of Biochemistry
University of Illinois Urbana-Champaign

From hz5 at njit.edu Fri Jan 6 00:27:36 2006
From: hz5 at njit.edu (hz5@njit.edu)
Date: Fri Jan 6 01:06:49 2006
Subject: FW: [Bioperl-l] How to extract promoter region seq from genbank or another source?
In-Reply-To: <43506D94.8030909@utk.edu>
References: <43506D94.8030909@utk.edu>
Message-ID: <1136525256.43bdffc8cf8e0@webmail.njit.edu>

http://siriusb.umdnj.edu:18080/EZRetrieve/index.jsp

Quoting Stefan Kirov :

> Sam,
> You can use MART to convert to ensembl id (in most cases). I don't think
> they support GenBank. You can try to use genekeydb
> (genereg.ornl.gov/gkdb), either download it or use the online converter,
> but my guess is you are not going to get too many ids. One thing I may
> fix in the future, but right now... Still may be worth a try. Look at
> seqhound too (http://www.blueprint.org/seqhound/index.html).
> Stefan
>
> Brian Osborne wrote:
>
> >ENSEMBL experts?
> >
> >------ Forwarded Message
> >From: Sam Al-Droubi
> >Date: Fri, 14 Oct 2005 14:05:38 -0700 (PDT)
> >To: Brian Osborne
> >Subject: Re: [Bioperl-l] How to extract promoter region seq from genbank or
> >another source?
> >
> >Hi Brian,
> >
> >Thank you for the response. I looked at it but it seems that Ensembl does
> >not use accession numbers. It seems that they have their own numbering
> >scheme. If so how do I get the mapping between the two.
If I can't get the
> >promoter region sequence then do you know if there is a way I can get the
> >entire chromosome sequence? If so, I can then try to find the gene within
> >it and then grab the promoter region.
> >I am new to all this so I am sorry if I sound ignorant in this area.
> >
> >On the surface, it seems that one should be able to do this easily but it
> >has not been easy so far.
> >
> >Thank you.
> >
> >Brian Osborne wrote:
> >
> >>Sam,
> >>
> >>ensembl may be one solution, I think it provides a good API for these sorts
> >>of queries. See the ensembl API documentation for more information
> >>(http://www.ensembl.org/info/software/core/core_tutorial.html).
> >>
> >>Brian O.
> >>
> >>On 10/13/05 11:25 AM, "Sam Al-Droubi" wrote:
> >>
> >>>>Hello,
> >>>>
> >>>>I am totally new to BioPerl. I was able to install it and retrieve data from
> >>>>GenBank. I have a list of accession numbers for genes but I want to use
> >>>>BioPerl to get the promoter region (1000 bp before the start of the gene).
> >>>>Can someone point me in the right direction on how to accomplish this.
> >>>>
> >>>>Tech info: Using bioperl-1.5 on SuSE 9.3 professional machine.
> >>>>
> >>>>Thank you.
> >>>>
> >>>>Sincerely,
> >>>>Sam Al-Droubi, M.S.
> >>>>saldroubi@yahoo.com
> >
> >Sincerely,
> >Sam Al-Droubi, M.S.
> >saldroubi@yahoo.com
> >
> >------ End of Forwarded Message
>
> --
> Stefan Kirov, Ph.D.
> University of Tennessee/Oak Ridge National Laboratory
> 5700 bldg, PO BOX 2008 MS6164
> Oak Ridge TN 37831-6164
> USA
> tel +865 576 5120
> fax +865-576-5332
> e-mail: skirov@utk.edu
> sao@ornl.gov
>
> "And the wars go on with brainwashed pride
> For the love of God and our human rights
> And all these things are swept aside"
>

=========================================================
Haibo Zhang, PhD
Computational Biology
http://www.cyberpostdoc.org/
Share postdoc information in cyberspace. Welcome your stories, suggestions and advice!

From hz5 at njit.edu Fri Jan 6 00:30:43 2006
From: hz5 at njit.edu (hz5@njit.edu)
Date: Fri Jan 6 01:08:28 2006
Subject: [Bioperl-l] Can't find gene sequence in choromosome sequence
In-Reply-To: <20051016134237.67298.qmail@web34308.mail.mud.yahoo.com>
References: <20051016134237.67298.qmail@web34308.mail.mud.yahoo.com>
Message-ID: <1136525443.43be0083705b9@webmail.njit.edu>

NM is mRNA, so on the genomic sequence it will be split up by introns. Did you consider this when you searched?

Quoting Sam Al-Droubi :

> All,
>
> I downloaded the fasta sequence for a mouse gene from
> genbank with accession number NM_01167. I also
> downloaded the Mouse chromosome 3 fasta file from
> ncbi
> (ftp://ftp.ncbi.nlm.nih.gov/genomes/M_musculus/Assembled_chromosomes/mm_chr3.fa.gz).
> The problem is that I can not find the gene sequence
> in chromosome sequence. I used Perl
> index($chr_obj->seq,$seq_obj->seq) and I get -1,
> meaning no match. I then searched by hand using grep
> and emacs and to my surprise, the gene sequence is not
> in the mm_chr3.fa file. What am I doing wrong? Do I
> have the wrong chromosome file? I am positive that
> this gene is in this chromosome according to genbank.
> By the way, I am doing this so that I can extract the
> promoter region right before the gene starts on the
> chromosome.
>
> Thank you in advance.
>
> Sincerely,
> Sam Al-Droubi, M.S.
> saldroubi@yahoo.com

=========================================================
Haibo Zhang, PhD
Computational Biology
http://www.cyberpostdoc.org/
Share postdoc information in cyberspace. Welcome your stories, suggestions and advice!

From birney at ebi.ac.uk Fri Jan 6 03:05:18 2006
From: birney at ebi.ac.uk (Ewan Birney)
Date: Fri Jan 6 03:21:22 2006
Subject: FW: [Bioperl-l] How to extract promoter region seq from genbank or another source?
In-Reply-To: <1136525256.43bdffc8cf8e0@webmail.njit.edu>
References: <43506D94.8030909@utk.edu> <1136525256.43bdffc8cf8e0@webmail.njit.edu>
Message-ID: <43BE24BE.7020502@ebi.ac.uk>

hz5@njit.edu wrote:
> http://siriusb.umdnj.edu:18080/EZRetrieve/index.jsp
>
> Quoting Stefan Kirov :
>
>> Sam,
>> You can use MART to convert to ensembl id (in most cases). I don't think
>> they support GenBank. You can try to use genekeydb
>> (genereg.ornl.gov/gkdb), either download it or use the online converter,

I know this is rather old, but at Ensembl we do, of course, track GenBank accession numbers - these are the identifiers shared with EMBL. We don't track GenBank gi numbers as they are too volatile.

>> but my guess is you are not going to get too many ids. One thing I may
>> fix in the future, but right now... Still may be worth a try. Look at
>> seqhound too (http://www.blueprint.org/seqhound/index.html).
>> Stefan

From khoueiry at ibdm.univ-mrs.fr Fri Jan 6 03:48:58 2006
From: khoueiry at ibdm.univ-mrs.fr (khoueiry)
Date: Fri Jan 6 03:42:20 2006
Subject: [Bioperl-l] counting and searching patterns
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA67469EE@ANTARESIA.be.devgen.com>
References: <0C528E3670D8CE4B8E013F6749231AA67469EE@ANTARESIA.be.devgen.com>
Message-ID: <1136537338.9062.19.camel@DavidLinux>

Thanks Marc,

I forgot to look at the great wiki page, so this helps me keep it in mind.
Actually, SeqPattern doesn't do the job, and neither does the TFBS package, which works with matrices, whereas I'm forced to use a consensus. The idea of using fuzznuc from EMBOSS may be useful for me, and I'll see what I can do with it.

regards

Pierre

On Wed, 2006-01-04 at 13:21 +0100, Marc Logghe wrote:
> Hi Pierre,
> Never used it myself, but can you do something with
> Bio::Tools::SeqPattern ? Have a look at the FAQ:
> http://bioperl.open-bio.org/wiki/FAQ#How_do_I_do_motif_searches_with_BioPerl.3F_Can_I_do_.22find_all_sequences_that_are_75.25_identical.22_to_a_given_motif.3F
>
> You could also do it with EMBOSS' fuzznuc:
> On the command line you do:
> fuzznuc -pattern MGGAAR -complement
> If you need to automate this, you can launch fuzznuc with bioperl.
> To do that you will need Bio::Factory::EMBOSS which is part of
> bioperl-run.
> HTH
> Cheers,
> Marc
>
> > -----Original Message-----
> > From: bioperl-l-bounces@portal.open-bio.org
> > [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of khoueiry
> > Sent: Wednesday, January 04, 2006 10:42 AM
> > To: bioperl-l@bioperl.org
> > Subject: [Bioperl-l] counting and searching patterns
> >
> > Hello,
> >
> > It's been a while that I'm working on an issue and the
> > matter of finding the best way to do it is triggering me.
> > Actually, I want to count/search a pattern in a nucleotide
> > sequence. (i.e : search/count for MGGAAR). What I'm doing now
> > is to Generates unique Seq objects using :IUPAC:module, then
> > for each unique seq generates the reverse one using
> > :SeqPattern: and going to count/search in my seq.
> >
> > i.e : MGGAAR -> C/AGGAAG/A (IUPAC) 4 possibilities + 4
> > reverse ( :SeqPattern:) = 8 .
> >
> > I was wondering, if there is a bioperl way to do the
> > count/search directly using the initial pattern (MGGAAR)
> > taking the reverse case into account (that is YTTCCK in my example).
> >
> > Any help or suggestions are appreciated
> >
> > Thanks to all and happy new year
> >
> > pierre

From walsh at cenix-bioscience.com Fri Jan 6 03:35:57 2006
From: walsh at cenix-bioscience.com (Andrew Walsh)
Date: Fri Jan 6 04:01:19 2006
Subject: [Bioperl-l] Can't find gene sequence in choromosome sequence
In-Reply-To: <1136525443.43be0083705b9@webmail.njit.edu>
References: <20051016134237.67298.qmail@web34308.mail.mud.yahoo.com> <1136525443.43be0083705b9@webmail.njit.edu>
Message-ID: <43BE2BED.4050009@cenix-bioscience.com>

If you look at the entry in the .gbs file (release 34.1), the exon coordinates for that mRNA are on the negative strand. Are you using the transcript sequence or the gene sequence? If you are using the gene sequence, reverse complementing should do the trick. If you are using the transcript sequence, this will not work since you are missing the introns.

Andrew

--
------------------------------------------------------------------
Andrew Walsh, M.Sc.
Senior Bioinformatics Software Engineer
IT Unit
Cenix BioScience GmbH
Tatzberg 47
01307 Dresden
Germany
Tel. +49-351-4173 137
Fax +49-351-4173 109
public key: http://www.cenix-bioscience.com/public_keys/walsh.gpg
------------------------------------------------------------------

From sdavis2 at mail.nih.gov Fri Jan 6 06:34:19 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Fri Jan 6 06:31:37 2006
Subject: [Bioperl-l] Can't find gene sequence in choromosome sequence
In-Reply-To: <43BE2BED.4050009@cenix-bioscience.com>
Message-ID:

On 1/6/06 3:35 AM, "Andrew Walsh" wrote:

> If you look at the entry in the .gbs file (release 34.1), the exon
> coordinates for that mRNA are on the negative strand. Are you using the
> transcript sequence or the gene sequence? If you are using the gene
> sequence, reverse complementing should do the trick. If you are using
> the transcript sequence, this will not work since you are missing the
> introns.

Another possibility that is readily available and more robust is to use BLAT at the UCSC genome browser.
It is really a pretty simple matter to drop this sequence into the UCSC genome browser and BLAT it. In addition to the complexities already noted, note that the mRNA sequence does NOT necessarily match the associated genomic sequence base-for-base because of SNPs, lower quality sequence reads, etc. Finally, if you have the Accession (which you do), you could simply look that up at UCSC and get the (curated) results of the BLAT on the RefSeq track.

Sean

Welcome your stories, suggestions >> and >> advice! >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> > From sdavis2 at mail.nih.gov Fri Jan 6 06:54:00 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Fri Jan 6 07:11:49 2006 Subject: [Bioperl-l] How to extract promoter region seq from genbank or another source? In-Reply-To: <43BE24BE.7020502@ebi.ac.uk> Message-ID: One way to do this is to map your genbank accession to Unigene and then map that to Entrez Gene. The simplest way to do this is to use Stanford Source ( http://smd.stanford.edu/cgi-bin/source/sourceBatchSearch). Just past in your list of accessions, choose to map from Genbank Accession, and then choose what you want to map to (LocusLink ID, in this case). From there, you can use the LocusLink IDs (now called Entrez Gene ID) to search in Ensembl or TRASER for upstream sequences. The other alternative is to use the UCSC genome table browser directly ( http://genome.ucsc.edu/cgi-bin/hgTables). Choose your organism of interest. Choose group "mRNA and EST tracks", table "all_mrna" and region "genome". Click "Paste List" and paste in your list of accessions and submit it. Then choose output format "sequence" and "plain text" output. Then choose "get output". On the next page, you can see options for output. Choose what you like. Hope this helps and keeps things simple. On 1/6/06 3:05 AM, "Ewan Birney" wrote: > > > hz5@njit.edu wrote: >> http://siriusb.umdnj.edu:18080/EZRetrieve/index.jsp >> >> Quoting Stefan Kirov : >> >>> Sam, >>> You can use MART to convert to ensembl id (in most cases). I don't think >>> >>> they support genebank. You can try to use genekeydb >>> (genereg.ornl.gov/gkdb), either download it or use the online converter, >>> > > > I know this is rather old, but at Ensembl we do, of course, track GenBank > accession numbers - these are the identifiers shared with EMBL. 
We don't > track GenBank gi numbers as they are too volatile. > > >>> but my guess is you are not going to get too many ids. One thing I may >>> >>> fix in the future, but right now... Still may be worth a try. Look at >>> seqhound too (http://www.blueprint.org/seqhound/index.html). >>> Stefan >>> >>> Brian Osborne wrote: >>> >>>> ENSEMBL experts? >>>> >>>> ------ Forwarded Message >>>> From: Sam Al-Droubi >>>> Date: Fri, 14 Oct 2005 14:05:38 -0700 (PDT) >>>> To: Brian Osborne >>>> Subject: Re: [Bioperl-l] How to extract promoter region seq from >>> genbank or >>>> another source? >>>> >>>> Hi Brian, >>>> >>>> Thank you for the response. I looked at it but it seems that enembl >>> does >>>> not use accession numbers. It seems that they have their own >>> numbering >>>> scheme. If so how do I get the mapping between the two. If I can't >>> get the >>>> promoter region sequence then do you know if there is a way I can get >>> the >>>> entire chromosome sequence? If so, I can then try to find the gene >>> within >>>> it and then grab the promoter region. >>>> I am new to all this so I am sorry if I sound ignorant in this area. >>>> >>>> On the surface, it seems that one should be able to do this easily but >>> it >>>> has not been easy so far. >>>> >>>> Thank you. >>>> >>>> >>>> Brian Osborne wrote: >>>> >>>> >>>>> Sam, >>>>> >>>>> ensembl may be one solution, I think it provides a good API for these >>> sorts >>>>> of queries. See the ensembl API documentation for more information >>>>> (http://www.ensembl.org/info/software/core/core_tutorial.html). >>>>> >>>>> Brian O. >>>>> >>>>> >>>>> >>>>> On 10/13/05 11:25 AM, "Sam Al-Droubi" wrote: >>>>> >>>>> >>>>> >>>>>>> Hello, >>>>>>> >>>>>>> I am totally new to BioPerl. I was able to install it and retrieve >>> data >>>>>>> >>>>>>> >>>>>> from >>>>>> >>>>>> >>>>>>> GenBank. 
I have a list of accession numbers for genes but I want to >>> use >>>>>>> BioPerl to get the promoter region (1000 bp before the start of the >>> gene). >>>>>>> Can someone point me in the right direction on how to accomplish >>> this. >>>>>>> Tech info: Using bioperl-1.5 on SuSE 9.3 professional machine. >>>>>>> >>>>>>> Thank you. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Sincerely, >>>>>>> Sam Al-Droubi, M.S. >>>>>>> saldroubi@yahoo.com >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l@portal.open-bio.org >>>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> >>>>>>> >>>>> >>>>> >>>> >>>> Sincerely, >>>> Sam Al-Droubi, M.S. >>>> saldroubi@yahoo.com >>>> >>>> ------ End of Forwarded Message >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l@portal.open-bio.org >>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> -- >>> Stefan Kirov, Ph.D. >>> University of Tennessee/Oak Ridge National Laboratory >>> 5700 bldg, PO BOX 2008 MS6164 >>> Oak Ridge TN 37831-6164 >>> USA >>> tel +865 576 5120 >>> fax +865-576-5332 >>> e-mail: skirov@utk.edu >>> sao@ornl.gov >>> >>> "And the wars go on with brainwashed pride >>> For the love of God and our human rights >>> And all these things are swept aside" >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> >> >> ========================================================= >> Haibo Zhang, PhD >> Computational Biology >> http://www.cyberpostdoc.org/ >> Share postdoc information in cyberspace. Welcome your stories, suggestions >> and >> advice! 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

From saldroubi at yahoo.com Fri Jan 6 10:57:08 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Fri Jan 6 11:00:52 2006
Subject: FW: [Bioperl-l] How to extract promoter region seq from genbank or another source?
In-Reply-To: <1136525256.43bdffc8cf8e0@webmail.njit.edu>
Message-ID: <20060106155708.43569.qmail@web34305.mail.mud.yahoo.com>

All,

I have found this website (http://biowulf.bu.edu/zlab/PromoSer/) that, given a list of accession numbers, will give you back the promoter regions. It works very well. I believe I posted this on the email list after I found it.

hz5@njit.edu wrote:
http://siriusb.umdnj.edu:18080/EZRetrieve/index.jsp

Quoting Stefan Kirov :

> Sam,
> You can use MART to convert to ensembl id (in most cases). I don't think
>
> they support genebank. You can try to use genekeydb
> (genereg.ornl.gov/gkdb), either download it or use the online converter,
>
> but my guess is you are not going to get too many ids. One thing I may
>
> fix in the future, but right now... Still may be worth a try. Look at
> seqhound too (http://www.blueprint.org/seqhound/index.html).
> Stefan
>
> Brian Osborne wrote:
>
> >ENSEMBL experts?
> >
> >------ Forwarded Message
> >From: Sam Al-Droubi
> >Date: Fri, 14 Oct 2005 14:05:38 -0700 (PDT)
> >To: Brian Osborne
> >Subject: Re: [Bioperl-l] How to extract promoter region seq from
> genbank or
> >another source?
> >
> >Hi Brian,
> >
> >Thank you for the response. I looked at it but it seems that enembl
> does
> >not use accession numbers. It seems that they have their own
> numbering
> >scheme. If so how do I get the mapping between the two.
If I can't > get the > >promoter region sequence then do you know if there is a way I can get > the > >entire chromosome sequence? If so, I can then try to find the gene > within > >it and then grab the promoter region. > >I am new to all this so I am sorry if I sound ignorant in this area. > > > >On the surface, it seems that one should be able to do this easily but > it > >has not been easy so far. > > > >Thank you. > > > > > >Brian Osborne wrote: > > > > > >>Sam, > >> > >>ensembl may be one solution, I think it provides a good API for these > sorts > >>of queries. See the ensembl API documentation for more information > >>(http://www.ensembl.org/info/software/core/core_tutorial.html). > >> > >>Brian O. > >> > >> > >> > >>On 10/13/05 11:25 AM, "Sam Al-Droubi" wrote: > >> > >> > >> > >>>>Hello, > >>>> > >>>>I am totally new to BioPerl. I was able to install it and retrieve > data > >>>> > >>>> > >>>from > >>> > >>> > >>>>GenBank. I have a list of accession numbers for genes but I want to > use > >>>>BioPerl to get the promoter region (1000 bp before the start of the > gene). > >>>>Can someone point me in the right direction on how to accomplish > this. > >>>> > >>>>Tech info: Using bioperl-1.5 on SuSE 9.3 professional machine. > >>>> > >>>>Thank you. > >>>> > >>>> > >>>> > >>>> > >>>>Sincerely, > >>>>Sam Al-Droubi, M.S. > >>>>saldroubi@yahoo.com > >>>>_______________________________________________ > >>>>Bioperl-l mailing list > >>>>Bioperl-l@portal.open-bio.org > >>>>http://portal.open-bio.org/mailman/listinfo/bioperl-l > >>>> > >>>> > >> > >> > > > > > >Sincerely, > >Sam Al-Droubi, M.S. > >saldroubi@yahoo.com > > > >------ End of Forwarded Message > > > >_______________________________________________ > >Bioperl-l mailing list > >Bioperl-l@portal.open-bio.org > >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > Stefan Kirov, Ph.D. 
> University of Tennessee/Oak Ridge National Laboratory > 5700 bldg, PO BOX 2008 MS6164 > Oak Ridge TN 37831-6164 > USA > tel +865 576 5120 > fax +865-576-5332 > e-mail: skirov@utk.edu > sao@ornl.gov > > "And the wars go on with brainwashed pride > For the love of God and our human rights > And all these things are swept aside" > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > ========================================================= Haibo Zhang, PhD Computational Biology http://www.cyberpostdoc.org/ Share postdoc information in cyberspace. Welcome your stories, suggestions and advice! _______________________________________________ Bioperl-l mailing list Bioperl-l@portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/bioperl-l Sincerely, Sam Al-Droubi, M.S. saldroubi@yahoo.com From skirov at utk.edu Fri Jan 6 11:56:52 2006 From: skirov at utk.edu (Stefan Kirov) Date: Fri Jan 6 11:53:44 2006 Subject: FW: [Bioperl-l] How to extract promoter region seq from genbank or another source? In-Reply-To: <43BE24BE.7020502@ebi.ac.uk> References: <43506D94.8030909@utk.edu> <1136525256.43bdffc8cf8e0@webmail.njit.edu> <43BE24BE.7020502@ebi.ac.uk> Message-ID: <43BEA154.8030707@utk.edu> My bad Ewan. I was looking for Genbank in external_db, did not see it and assumed you don't keep Genbank... Sorry. Stefan Ewan Birney wrote: > > > hz5@njit.edu wrote: > >> http://siriusb.umdnj.edu:18080/EZRetrieve/index.jsp >> >> Quoting Stefan Kirov : >> >>> Sam, >>> You can use MART to convert to ensembl id (in most cases). I don't >>> think >>> >>> they support genebank. You can try to use genekeydb >>> (genereg.ornl.gov/gkdb), either download it or use the online >>> converter, >>> > > > I know this is rather old, but at Ensembl we do, of course, track GenBank > accession numbers - these are the identifiers shared with EMBL. 
We don't > track GenBank gi numbers as they are too volatile. > > >>> but my guess is you are not going to get too many ids. One thing I may >>> >>> fix in the future, but right now... Still may be worth a try. Look >>> at seqhound too (http://www.blueprint.org/seqhound/index.html). >>> Stefan >>> >>> Brian Osborne wrote: >>> >>>> ENSEMBL experts? >>>> >>>> ------ Forwarded Message >>>> From: Sam Al-Droubi >>>> Date: Fri, 14 Oct 2005 14:05:38 -0700 (PDT) >>>> To: Brian Osborne >>>> Subject: Re: [Bioperl-l] How to extract promoter region seq from >>> >>> genbank or >>> >>>> another source? >>>> >>>> Hi Brian, >>>> >>>> Thank you for the response. I looked at it but it seems that enembl >>> >>> does >>> >>>> not use accession numbers. It seems that they have their own >>> >>> numbering >>> >>>> scheme. If so how do I get the mapping between the two. If I can't >>> >>> get the >>> >>>> promoter region sequence then do you know if there is a way I can get >>> >>> the >>> >>>> entire chromosome sequence? If so, I can then try to find the gene >>> >>> within >>> >>>> it and then grab the promoter region. >>>> I am new to all this so I am sorry if I sound ignorant in this area. >>>> >>>> On the surface, it seems that one should be able to do this easily but >>> >>> it >>> >>>> has not been easy so far. >>>> >>>> Thank you. >>>> >>>> Brian Osborne wrote: >>>> >>>> >>>>> Sam, >>>>> >>>>> ensembl may be one solution, I think it provides a good API for these >>>> >>> sorts >>> >>>>> of queries. See the ensembl API documentation for more information >>>>> (http://www.ensembl.org/info/software/core/core_tutorial.html). >>>>> >>>>> Brian O. >>>>> >>>>> >>>>> >>>>> On 10/13/05 11:25 AM, "Sam Al-Droubi" wrote: >>>>> >>>>> >>>>> >>>>>>> Hello, >>>>>>> >>>>>>> I am totally new to BioPerl. I was able to install it and retrieve >>>>>> >>> data >>> >>>>>>> >>>>>> >>>>>> from >>>>>> >>>>>> >>>>>>> GenBank. 
I have a list of accession numbers for genes but I want to >>>>>> >>> use >>> >>>>>>> BioPerl to get the promoter region (1000 bp before the start of the >>>>>> >>> gene). >>> >>>>>>> Can someone point me in the right direction on how to accomplish >>>>>> >>> this. >>> >>>>>>> Tech info: Using bioperl-1.5 on SuSE 9.3 professional machine. >>>>>>> >>>>>>> Thank you. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Sincerely, Sam Al-Droubi, M.S. >>>>>>> saldroubi@yahoo.com >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l@portal.open-bio.org >>>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> >>>>>> >>>>> >>>> >>>> >>>> Sincerely, Sam Al-Droubi, M.S. >>>> saldroubi@yahoo.com >>>> >>>> ------ End of Forwarded Message >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l@portal.open-bio.org >>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> -- >>> Stefan Kirov, Ph.D. >>> University of Tennessee/Oak Ridge National Laboratory >>> 5700 bldg, PO BOX 2008 MS6164 >>> Oak Ridge TN 37831-6164 >>> USA >>> tel +865 576 5120 >>> fax +865-576-5332 >>> e-mail: skirov@utk.edu >>> sao@ornl.gov >>> >>> "And the wars go on with brainwashed pride >>> For the love of God and our human rights >>> And all these things are swept aside" >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> >> >> ========================================================= >> Haibo Zhang, PhD >> Computational Biology >> http://www.cyberpostdoc.org/ >> Share postdoc information in cyberspace. Welcome your stories, >> suggestions and advice! 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> --

From kevin.mcmahon at ttuhsc.edu Fri Jan 6 12:13:17 2006
From: kevin.mcmahon at ttuhsc.edu (kevin.mcmahon@ttuhsc.edu)
Date: Fri Jan 6 12:24:57 2006
Subject: [Bioperl-l] matrix from bl2seq
Message-ID: <6846AE140C394A4E9F91E177190F8752048745D5@alamo.ttuhsc.edu>

Giovanni,

I am new to Bioperl myself, so take my advice with a grain of salt. But, it sounds like you've got the right idea. If you're looking to make one of those charts that shows the percent identity between two sequences for every permutation, I'd do exactly what you're doing. And use the frac_identical or frac_conserved (depending on what you want) methods from the Bio::Search::Hit module. For real information on how to do this, check out Jason Stajich's excellent review on the Bio::SearchIO module here: http://bioperl.org/HOWTOs/html/SearchIO.html . It sure helped me!

Good luck,

Wyatt

From cjfields at uiuc.edu Fri Jan 6 12:20:07 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri Jan 6 12:33:41 2006
Subject: [Bioperl-l] error running load_seqdatabase.pl
Message-ID: <000f01c612e5$70e9a950$15327e82@pyrimidine>

Hilmar,

Did this ever get resolved? I tried to reinstall a biosql database using bioperl-db and got the same problems. I'll list out everything I ran into and what I plan on trying, as it's been a long time since I've tried this.

Currently, I'm using ActiveState Perl 5.8.7.813 on WinXP and MySQL 4.1.14. Using nmake and installing worked fine. Loading the biosql schema and loading taxonomy info also worked fine, although I had to manually untar the taxonomy archive so load_ncbi_taxonomy.pl could find the files (stupid windows). However, this is what happens when using load_seqdatabase.pl:

C:\Perl\Scripts>load_seqdatabase.pl -dbname dihydroorotase -dbuser root NP_249092.gpt
Loading NP_249092.gpt ...
Undefined subroutine &Bio::Root::Root::debug called at C:/Perl/site/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 1537, line 65. If I removed all args except the sequence file, it gives the same response, which means it happens before the connection is made to the database: C:\Perl\Scripts>load_seqdatabase.pl NP_249092.gpt Loading NP_249092.gpt ... Undefined subroutine &Bio::Root::Root::debug called at C:/Perl/site/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 1537, line 65. Going back through the test suite and setting up DBHarness.biosql.conf using my loaded database gives a TON of errors, mostly warnings (which I'll show in a second). In the DBHarness.biosql.conf file, setting the database to 'biosql' w/o giving a password gives the following summary: Test returned status 255 (wstat 65280, 0xff00) DIED. FAILED tests 3-160 Failed 158/160 tests, 1.25% okay Failed Test Stat Wstat Total Fail Failed List of Failed ---------------------------------------------------------------------------- --- t\01dbadaptor.t 255 65280 19 30 157.89% 5-19 <------ t\02species.t 255 65280 65 126 193.85% 3-65 t\03simpleseq.t 255 65280 59 108 183.05% 6-59 <------ t\04swiss.t 255 65280 52 100 192.31% 3-52 t\05seqfeature.t 255 65280 48 92 191.67% 3-48 t\06comment.t 255 65280 11 18 163.64% 3-11 t\07dblink.t 255 65280 18 32 177.78% 3-18 t\08genbank.t 255 65280 18 32 177.78% 3-18 t\09fuzzy2.t 255 65280 21 38 180.95% 3-21 t\10ensembl.t 255 65280 15 26 173.33% 3-15 t\11locuslink.t 255 65280 110 214 194.55% 4-110 t\12ontology.t 2 512 738 1471 199.32% 3-738 t\13remove.t 255 65280 59 116 196.61% 2-59 t\15cluster.t 255 65280 160 316 197.50% 3-160 Failed 14/15 test scripts, 6.67% okay. 1360/1411 subtests failed, 3.61% okay. NMAKE : U1077: Stop. 
___________________________________________________________________________ While setting the password (allowing the database to connect) gives: Failed Test Stat Wstat Total Fail Failed List of Failed ---------------------------------------------------------------------------- --- t\01dbadaptor.t 255 65280 19 10 52.63% 15-19 <------ t\02species.t 255 65280 65 126 193.85% 3-65 t\03simpleseq.t 255 65280 59 96 162.71% 12-59 <------ t\04swiss.t 255 65280 52 100 192.31% 3-52 t\05seqfeature.t 255 65280 48 92 191.67% 3-48 t\06comment.t 255 65280 11 18 163.64% 3-11 t\07dblink.t 255 65280 18 32 177.78% 3-18 t\08genbank.t 255 65280 18 32 177.78% 3-18 t\09fuzzy2.t 255 65280 21 38 180.95% 3-21 t\10ensembl.t 255 65280 15 26 173.33% 3-15 t\11locuslink.t 255 65280 110 214 194.55% 4-110 t\12ontology.t 2 512 738 1471 199.32% 3-738 t\13remove.t 255 65280 59 116 196.61% 2-59 t\15cluster.t 255 65280 160 316 197.50% 3-160 Failed 14/15 test scripts, 6.67% okay. 1344/1411 subtests failed, 4.75% okay. NMAKE : U1077: Stop. ___________________________________________________________________________ The arrows point to the difference in connectivity to the database. Angshu had not set his password: > :) I'm sorry again. > After setting the DBHarness I get the following from nmake test: > > Test returned status 255 (wstat 65280, 0xff00) > DIED. 
FAILED tests 3-160 > Failed 158/160 tests, 1.25% okay > Failed Test Stat Wstat Total Fail Failed List of Failed > ----------------------------------------------------------------------- > ------ > t\01dbadaptor.t 5 1280 19 30 157.89% 5-19 <------ > t\02species.t 255 65280 65 126 193.85% 3-65 > t\03simpleseq.t 5 1280 59 108 183.05% 6-59 <------ > t\04swiss.t 255 65280 52 100 192.31% 3-52 > t\05seqfeature.t 255 65280 48 92 191.67% 3-48 > t\06comment.t 255 65280 11 18 163.64% 3-11 > t\07dblink.t 255 65280 18 32 177.78% 3-18 > t\08genbank.t 255 65280 18 32 177.78% 3-18 > t\09fuzzy2.t 255 65280 21 38 180.95% 3-21 > t\10ensembl.t 255 65280 15 26 173.33% 3-15 > t\11locuslink.t 255 65280 110 214 194.55% 4-110 > t\12ontology.t 9 2304 738 1470 199.19% 4-738 > t\13remove.t 255 65280 59 116 196.61% 2-59 > t\15cluster.t 255 65280 160 316 197.50% 3-160 > Failed 14/15 test scripts, 6.67% okay. 1359/1411 subtests failed, > 3.69% okay. Here's the error messages from that first test (warning it's very messy): C:\Perl\bin\perl.exe "-MExtUtils::Command::MM" "-e" "test_harness(0, 'bl ib\lib', 'blib\arch')" t\01dbadaptor.t t\02species.t t\03simpleseq.t t\04swiss.t t\05seqfeature.t t\06comment.t t\07dblink.t t\08genbank.t t\09fuzzy2.t t\10ensembl.t t\11locuslink.t t\12ontology.t t\13remove.t t\14query.t t\15cluster.t t\01dbadaptor.....ok 1/19Subroutine new redefined at C:\Perl\test\bioperl-db\bli b\lib/Bio\DB\BioSQL\DBAdaptor.pm line 67. Subroutine get_object_adaptor redefined at C:\Perl\test\bioperl-db\blib\lib/Bio\DB\BioSQL\DBAdaptor.pm line 112. Subroutine _get_object_adaptor_class redefined at C:\Perl\test\bioperl-db\blib\lib/Bio\DB\BioSQL\DBAdaptor.pm line 154. Subroutine set_object_adaptor redefined at C:\Perl\test\bioperl-db\blib\lib/Bio\DB\BioSQL\DBAdaptor.pm line 231. Subroutine create_persistent redefined at C:\Perl\test\bioperl-db\blib\lib/Bio\DB\BioSQL\DBAdaptor.pm line 258. 
Subroutine dbcontext redefined at C:\Perl\test\bioperl-db\blib\lib/Bio\DB\BioSQL\DBAdaptor.pm line 303. Subroutine _load_object_adaptor redefined at C:\Perl\test\bioperl-db\blib\lib/Bio\DB\BioSQL\DBAdaptor.pm line 323. Subroutine new redefined at C:\Perl\test\bioperl-db\blib\lib/Bio\BioEntry.pm line 89. Subroutine object_id redefined at C:\Perl\test\bioperl-db\blib\lib/Bio\BioEntry.pm line 129. Subroutine version redefined at C:\Perl\test\bioperl-db\blib\lib/Bio\BioEntry.pm line 151. Subroutine authority redefined at C:\Perl\test\bioperl-db\blib\lib/Bio\BioEntry.pm line 172. Subroutine namespace redefined at C:\Perl\test\bioperl-db\blib\lib/Bio\BioEntry.pm line 193. Subroutine display_name redefined at C:\Perl\test\bioperl-db\blib\lib/Bio\BioEntry.pm line 218. Subroutine description redefined at C:\Perl\test\bioperl-db\blib\lib/Bio\BioEntry.pm line 242. Subroutine new redefined at C:/Perl/site/lib/Bio\Root\Root.pm line 181. Subroutine verbose redefined at C:/Perl/site/lib/Bio\Root\Root.pm line 214. Subroutine _register_for_cleanup redefined at C:/Perl/site/lib/Bio\Root\Root.pm line 226. Subroutine _unregister_for_cleanup redefined at C:/Perl/site/lib/Bio\Root\Root.pm line 236. Subroutine _cleanup_methods redefined at C:/Perl/site/lib/Bio\Root\Root.pm line 243. "my" variable $class masks earlier declaration in same scope at C:/Perl/site/lib/Bio\Root\Root.pm line 315. Subroutine throw redefined at C:/Perl/site/lib/Bio\Root\Root.pm line 293. Subroutine debug redefined at C:/Perl/site/lib/Bio\Root\Root.pm line 356. Subroutine _load_module redefined at C:/Perl/site/lib/Bio\Root\Root.pm line 377. Subroutine DESTROY redefined at C:/Perl/site/lib/Bio\Root\Root.pm line 405. (in cleanup) Undefined subroutine &Bio::Root::Root::DESTROY called at C:\Perl\test\bioperl-db\blib\lib/Bio\DB\BioSQL\DBAdaptor.pm line 177. Undefined subroutine &Bio::Root::Root::debug called at C:\Perl\test\bioperl-db\blib\lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 688. 
(in cleanup) Undefined subroutine &Bio::Root::Root::DESTROY called. (in cleanup) Undefined subroutine &Bio::Root::Root::DESTROY called during global destruction. (in cleanup) Undefined subroutine &Bio::Root::Root::DESTROY called at C:\Perl\test\bioperl-db\blib\lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 1860 during global destruction. (in cleanup) Undefined subroutine &Bio::Root::Root::DESTROY called during global destruction. (in cleanup) Undefined subroutine &Bio::Root::Root::DESTROY called at C:\Perl\test\bioperl-db\blib\lib/Bio/DB/DBI/base.pm line 417 during global destruction. (in cleanup) Undefined subroutine &Bio::Root::Root::DESTROY called during global destruction. (in cleanup) Undefined subroutine &Bio::Root::Root::DESTROY called at C:\Perl\test\bioperl-db\blib\lib/Bio/DB/DBI/base.pm line 417 during global destruction. t\01dbadaptor.....dubious Test returned status 255 (wstat 65280, 0xff00) DIED. FAILED tests 15-19 Failed 5/19 tests, 73.68% okay ____________________________________________________________________________ I'll end with that. At this moment, I can't see it working with the current setup. I was using perl 5.8 with the old setup but I upgraded mysql at some point when working with gbrowse (I can't remember what the old version was); I'll try upgrading to the newest ActiveState version to see what happens. Could it be the MySQL version? Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From hlapp at gmx.net Fri Jan 6 13:01:38 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri Jan 6 12:58:19 2006 Subject: [Bioperl-l] error running load_seqdatabase.pl In-Reply-To: <000f01c612e5$70e9a950$15327e82@pyrimidine> References: <000f01c612e5$70e9a950$15327e82@pyrimidine> Message-ID: <6d78708a0f96357d9ab6cbe4804a7c15@gmx.net> On Jan 6, 2006, at 9:20 AM, Chris Fields wrote: > Hilmar, > > Did this ever get resolved? 
I tried to reinstall a biosql database > using > bioperl-db and got the same problems. I'll list out everything I ran > into > and what I pan on trying, as it's been a long time since I've tried > this. > > Currently, I'm using ActiveState Perl 5.8.7.813 on WinXP and MySQL > 4.1.14. > Using nmake and installing worked fine. Loading the biosql schema and > loading taxonomy info also worked fine, although I had to manually > untar the > taxonomy archive so load_ncbi_taxonomy.pl could find the files (stupid > windows). However, this is what happens when using > load_seqdatabase.pl: > > C:\Perl\Scripts>load_seqdatabase.pl -dbname dihydroorotase -dbuser root > NP_249092.gpt > Loading NP_249092.gpt ... > Undefined subroutine &Bio::Root::Root::debug called at > C:/Perl/site/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 1537, > > line 65. > > If I removed all args except the sequence file, it gives the same > response, > which means it happens before the connection is made to the database: > This happens indeed before a connection is made because it happens at the point it tries to dynamically load the BioSQL driver for the adaptor: $self->debug("attempting to load driver for adaptor class $class\n"); The BioSQL driver is loaded before the DBD driver is loaded. The module in which this happens (i.e., the persistence adaptor) has been loaded dynamically as well. Bio::Root::Root is in the 'use' statements, and the debug() method clearly exists. I'm at a loss as to why perl complains on certain Windows platforms. If somebody can tell me what, if anything, can be done to make this work on those platforms too I'll be glad to implement it. > [...] 
> Here's the error messages from that first test (warning it's very
> messy):
>
> C:\Perl\bin\perl.exe "-MExtUtils::Command::MM" "-e" "test_harness(0,
> 'bl
> ib\lib', 'blib\arch')" t\01dbadaptor.t t\02species.t t\03simpleseq.t
> t\04swiss.t t\05seqfeature.t t\06comment.t t\07dblink.t t\08genbank.t
> t\09fuzzy2.t t\10ensembl.t t\11locuslink.t t\12ontology.t t\13remove.t
> t\14query.t t\15cluster.t
> t\01dbadaptor.....ok 1/19Subroutine new redefined at
> [...]
> Subroutine debug redefined at C:/Perl/site/lib/Bio\Root\Root.pm line
> 356.

So obviously it is there, right? So why doesn't perl see it a minute later?

> [...]
> I'll end with that. At this moment, I can't see it working with the
> current
> setup. I was using perl 5.8 with the old setup but I upgraded mysql
> at some
> point when working with gbrowse (I can't remember what the old version
> was);
> I'll try upgrading to the newest ActiveState version to see what
> happens.
> Could it be the MySQL version?

I don't think it has anything to do with the MySQL version, or the DBD driver for that matter. Instead, it looks like an issue with dynamic loading of perl modules on your particular platform.

-hilmar

>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From christoph.gille at charite.de Fri Jan 6 14:18:49 2006
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Fri Jan 6 14:21:43 2006
Subject: [Bioperl-l] joining BioPerl and Java
Message-ID: <33239.192.168.220.204.1136575129.squirrel@webmail.charite.de>

I am planning to bring BioPerl and BioJava/STRAP together.

Motivation: I think that this might be useful because BioPerl makes it very comfortable to write bioinformatics programs, and BioJava and STRAP provide visualization tools for sequences, alignments and structures. Furthermore, Java is fast enough for computationally intensive calculations. As a positive side effect, STRAP users may benefit from additional modules.

Implementation: The plan is to provide a Java wrapper class for BioPerl programs. This wrapper class should be as simple as possible so that even PERL programmers without Java experience are able to embed their PERL program. Initially, this class should manage installation of PERL, Cygwin (DOS computers) and BioPerl. At first invocation it will download the respective PERL program as a zip archive. It will create a working directory and will start the PERL program. After the PERL program has terminated, it will parse and evaluate the output and perform the required action. Communication between the PERL and Java layers will be possible via stdin/stdout streams and with local files.

From cjfields at uiuc.edu Fri Jan 6 14:28:28 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri Jan 6 14:35:05 2006
Subject: [Bioperl-l] error running load_seqdatabase.pl
In-Reply-To: <6d78708a0f96357d9ab6cbe4804a7c15@gmx.net>
Message-ID: <000001c612f7$5f1a77b0$15327e82@pyrimidine>

I'll try installing bioperl-db using Cygwin. I know that I can connect to the native Windows mysql database from inside cygwin, so perhaps this will do as a short term workaround. I'll also try using a different native win32 Perl version (maybe 5.6) and look into the dynamic loading issue.
I know that the AS Perl has given errors like this before and not had problems (I think it was also cranky with older versions of bioperl), but this one is pretty serious.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

-----Original Message-----
From: Hilmar Lapp [mailto:hlapp@gmx.net]
Sent: Friday, January 06, 2006 12:02 PM
To: Chris Fields
Cc: bioperl-l@portal.open-bio.org
Subject: Re: [Bioperl-l] error running load_seqdatabase.pl

On Jan 6, 2006, at 9:20 AM, Chris Fields wrote:

> Hilmar,
>
> Did this ever get resolved? I tried to reinstall a biosql database
> using
> bioperl-db and got the same problems. I'll list out everything I ran
> into
> and what I pan on trying, as it's been a long time since I've tried
> this.
>
> Currently, I'm using ActiveState Perl 5.8.7.813 on WinXP and MySQL
> 4.1.14.
> Using nmake and installing worked fine. Loading the biosql schema and
> loading taxonomy info also worked fine, although I had to manually
> untar the
> taxonomy archive so load_ncbi_taxonomy.pl could find the files (stupid
> windows). However, this is what happens when using
> load_seqdatabase.pl:
>
> C:\Perl\Scripts>load_seqdatabase.pl -dbname dihydroorotase -dbuser root
> NP_249092.gpt
> Loading NP_249092.gpt ...
> Undefined subroutine &Bio::Root::Root::debug called at
> C:/Perl/site/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 1537,
>
> line 65.
>
> If I removed all args except the sequence file, it gives the same
> response,
> which means it happens before the connection is made to the database:
>

This happens indeed before a connection is made because it happens at the point it tries to dynamically load the BioSQL driver for the adaptor:

$self->debug("attempting to load driver for adaptor class $class\n");

The BioSQL driver is loaded before the DBD driver is loaded. The module in which this happens (i.e., the persistence adaptor) has been loaded dynamically as well.
Bio::Root::Root is in the 'use' statements, and the debug() method clearly exists. I'm at a loss as to why perl complains on certain Windows platforms. If somebody can tell me what, if anything, can be done to make this work on those platforms too I'll be glad to implement it. > [...] > Here's the error messages from that first test (warning it's very > messy): > > C:\Perl\bin\perl.exe "-MExtUtils::Command::MM" "-e" "test_harness(0, > 'bl > ib\lib', 'blib\arch')" t\01dbadaptor.t t\02species.t t\03simpleseq.t > t\04swiss.t t\05seqfeature.t t\06comment.t t\07dblink.t t\08genbank.t > t\09fuzzy2.t t\10ensembl.t t\11locuslink.t t\12ontology.t t\13remove.t > t\14query.t t\15cluster.t > t\01dbadaptor.....ok 1/19Subroutine new redefined at > [...] > Subroutine debug redefined at C:/Perl/site/lib/Bio\Root\Root.pm line > 356. So obviously it is there, right? So why doesn't perl see it a minute later? > [...] > I'll end with that. At this moment, I can't see it working with the > current > setup. I was using perl 5.8 with the old setup but I upgraded mysql > at some > point when working with gbrowse (I can't remember what the old version > was); > I'll try upgrading to the newest ActiveState version to see what > happens. > Could it be the MySQL version? I don't think it has anything to do with the MySQL version, or the DBD driver for that matter. Instead, it looks like on issue with dynamic loading of perl modules on your particular platform. -hilmar > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From gcoppola at ucla.edu Fri Jan 6 12:58:24 2006 From: gcoppola at ucla.edu (Giovanni Coppola) Date: Fri Jan 6 14:46:42 2006 Subject: [Bioperl-l] matrix from bl2seq In-Reply-To: <6846AE140C394A4E9F91E177190F8752048745D5@alamo.ttuhsc.edu> References: <6846AE140C394A4E9F91E177190F8752048745D5@alamo.ttuhsc.edu> Message-ID: Hi Wyatt, thanks for your suggestion. I was considering ClustalW and LALIGN and SSEARCH from FASTA as well, since I am looking for perfect local alignments (I am designing microarray probes and I am checking for homology regions of 25 bp). I will end up trying several of them and choosing based on the output.... anyway, it is a good exercise! Thanks again Giovanni On Jan 6, 2006, at 9:13 AM, kevin.mcmahon@ttuhsc.edu wrote: > Giovanni, > > I am new to Bioperl myself, so take my advice with a grain of > salt. But, it > sounds like you've got the right idea. If you're looking to make > one of > those charts that shows the percent identity between two sequences > for every > permutation, I'd do exactly what you're doing. And use the > frac_identical > or frac_conserved (depending on what you want) methods from the > Bio::Search::Hit module. For real information on how to do this, > check out > Jason Stajich's excellent review on the Bio::SearchIO module here: > http://bioperl.org/HOWTOs/html/SearchIO.html > . It sure helped me! 
>
> Good luck,
>
> Wyatt
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

From tobias.straub at lmu.de Fri Jan 6 11:06:04 2006
From: tobias.straub at lmu.de (Tobias Straub)
Date: Sat Jan 7 11:17:24 2006
Subject: [Bioperl-l] GFF3 aggregation bottom-up
Message-ID:

Lincoln, (and whoever might be expert)

On my way to get the gadfly gff3 files displayed properly I was trying to rebuild the canonical gene (EDEN) that you describe in your GFF3 summary on song.sourceforge.net.

Using bioperl 1.5.1 (http://bioperl.org/DIST/current_core_unstable.tar.gz) and gbrowse, I can get neither just 3 proper processed_transcript features nor the proper names (i.e. EDEN.1, EDEN.2, EDEN.3) for those when using the following GFF3 file (as in-memory or mysql database).

##
##gff-version 3
##sequence-region ctg123 1 1497228
ctg123	.	gene	1000	9000	.	+	.	ID=gene00001;Name=EDEN
ctg123	.	TF_binding_site	1000	1012	.	+	.	ID=tfbs00001;Parent=gene00001
ctg123	.	mRNA	1050	9000	.	+	.	ID=mRNA00001;Parent=gene00001;Name=EDEN.1
ctg123	.	mRNA	1050	9000	.	+	.	ID=mRNA00002;Parent=gene00001;Name=EDEN.2
ctg123	.	mRNA	1300	9000	.	+	.	ID=mRNA00003;Parent=gene00001;Name=EDEN.3
ctg123	.	exon	1300	1500	.	+	.	ID=exon00001;Parent=mRNA00003
ctg123	.	exon	1050	1500	.	+	.	ID=exon00002;Parent=mRNA00001,mRNA00002
ctg123	.	exon	3000	3902	.	+	.	ID=exon00003;Parent=mRNA00001,mRNA00003
ctg123	.	exon	5000	5500	.	+	.	ID=exon00004;Parent=mRNA00001,mRNA00002,mRNA00003
ctg123	.	exon	7000	9000	.	+	.	ID=exon00005;Parent=mRNA00001,mRNA00002,mRNA00003
ctg123	.	CDS	1201	1500	.	+	0	ID=cds000001;Parent=mRNA0001;Name=edenprotein.1
ctg123	.	CDS	3000	3902	.	+	0	ID=cds000001;Parent=mRNA0001;Name=edenprotein.1
ctg123	.	CDS	5000	5500	.	+	0	ID=cds000001;Parent=mRNA0001;Name=edenprotein.1
ctg123	.	CDS	7000	7600	.	+	0	ID=cds000001;Parent=mRNA0001;Name=edenprotein.1
ctg123	.	CDS	1201	1500	.	+	0	ID=cds000002;Parent=mRNA0002;Name=edenprotein.2
ctg123	.	CDS	5000	5500	.	+	0	ID=cds000002;Parent=mRNA0002;Name=edenprotein.2
ctg123	.	CDS	7000	7600	.	+	0	ID=cds000002;Parent=mRNA0002;Name=edenprotein.2
ctg123	.	CDS	3301	3902	.	+	0	ID=cds00003;Parent=mRNA0003;Name=edenprotein.3
ctg123	.	CDS	5000	5500	.	+	1	ID=cds00003;Parent=mRNA0003;Name=edenprotein.3
ctg123	.	CDS	7000	7600	.	+	1	ID=cds00003;Parent=mRNA0003;Name=edenprotein.3
ctg123	.	CDS	3391	3902	.	+	0	ID=cds00004;Parent=mRNA0003;Name=edenprotein.4
ctg123	.	CDS	5000	5500	.	+	1	ID=cds00004;Parent=mRNA0003;Name=edenprotein.4
ctg123	.	CDS	7000	7600	.	+	1	ID=cds00004;Parent=mRNA0003;Name=edenprotein.4
##

Please find attached the gbrowse output when using the processed_transcript aggregator and the processed_transcript glyph. Is that behaviour expected?

-------------- next part --------------
A non-text attachment was scrubbed...
Name: pastedGraphic1.tiff
Type: image/tiff
Size: 22424 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20060106/bf27098f/pastedGraphic1-0001.tiff
-------------- next part --------------

best
Tobias

======================================================================
Dr. Tobias Straub         Adolf-Butenandt-Institute, Molecular Biology
tel: +49-89-2180 75 439   Schillerstr. 44, 80336 Munich, Germany
======================================================================

From tobias.straub at lmu.de Sat Jan 7 07:38:55 2006
From: tobias.straub at lmu.de (Tobias Straub)
Date: Sat Jan 7 11:17:26 2006
Subject: [Bioperl-l] Re: [Gmod-gbrowse] GFF3 aggregation bottom-up
In-Reply-To: <1136566358.10751.39.camel@localhost.localdomain>
References: <1136566358.10751.39.camel@localhost.localdomain>
Message-ID: <1da245774596a76a44aa76cd51d403f8@lmu.de>

Scott,

here you go, that's my conf file with the attribute('Name') labelling:

----------------- snip
[GENERAL]
description = GFF3 test (memory)
db_adaptor = Bio::DB::GFF
db_args = -adaptor memory -gff '/var/apache2/htdocs/gbrowse/databases/eden.gff'
stylesheet = /gbrowse/gbrowse.css
buttons = /gbrowse/images/buttons
tmpimages = /gbrowse/tmp
help = /gbrowse/
aggregators = processed_transcript
default features = Gene
image widths = 450 640 800 1024
default width = 800
stylesheet = /gbrowse/gbrowse.css
buttons = /gbrowse/images/buttons
tmpimages = /gbrowse/tmp
max segment = 50000
default segment = 5000
zoom levels = 100 200 1000 2000 5000 10000 20000 40000 50000
overview bgcolor = lightgrey
detailed bgcolor = lightgoldenrodyellow
key bgcolor = beige

[TRACK DEFAULTS]
glyph = generic
height = 10
bgcolor = lightgrey
fgcolor = black
font2color = blue
label density = 25
bump density = 100
link = AUTO

[Gene]
feature = gene
glyph = transcript2
bgcolor = lightblue
label = sub { my $f = shift; my @n = ($f->attributes('Name'), $f->attributes('Alias')); return $n[0]; }
key = gene

[Transcripts]
feature = processed_transcript
glyph = processed_transcript
bgcolor = peachpuff
label = sub { my $f = shift; my @n = ($f->attributes('Name')); return $n[0]; }
description = 1
key = Gene model
-----------snip -------------

the result is attached,

-------------- next part --------------
A non-text attachment was scrubbed...
Name: pastedGraphic1.tiff Type: image/tiff Size: 21412 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20060107/6cc479cb/pastedGraphic1-0001.tiff -------------- next part -------------- Tobias On 6 Jan 2006, at 17:52, Scott Cain wrote: > Tobias, > > I missed your second comment with regard to what look like 'extra' > features in your display. Please send the track definition section > that > you are using so I can take a look. > > Scott > > > On Fri, 2006-01-06 at 15:06 +0100, Tobias Straub wrote: >> Lincoln, >> >> On my way to get the gadfly gff3 files displayed properly I was trying >> to rebuild the canonical gene (EDEN) that you describe in your GFF3 >> summary on song.sourceforge.net. >> Using bioperl 1.5.1 >> (http://bioperl.org/DIST/current_core_unstable.tar.gz) and gbrowse I >> can neither get just 3 properly processed_transcript features nor the >> proper names (i.e. EDEN.1, EDEN.2, EDEN.3) for those when using the >> following GFF3 file (as in-memory or mysql database). >> >> ## >> ##gff-version 3 >> ##sequence-region ctg123 1 1497228 >> ctg123 . gene 1000 9000 . + . ID=gene00001;Name=EDEN >> ctg123 . TF_binding_site 1000 1012 . + . ID=tfbs00001;Parent=gene00001 >> ctg123 . mRNA 1050 9000 . + . ID=mRNA00001;Parent=gene00001; >> Name=EDEN.1 >> ctg123 . mRNA 1050 9000 . + . ID=mRNA00002;Parent=gene00001; >> Name=EDEN.2 >> ctg123 . mRNA 1300 9000 . + . ID=mRNA00003;Parent=gene00001; >> Name=EDEN.3 >> ctg123 . exon 1300 1500 . + . ID=exon00001;Parent=mRNA00003 >> ctg123 . exon 1050 1500 . + . ID=exon00002;Parent=mRNA00001,mRNA00002 >> ctg123 . exon 3000 3902 . + . ID=exon00003;Parent=mRNA00001,mRNA00003 >> ctg123 . exon 5000 5500 . + . ID=exon00004; >> Parent=mRNA00001,mRNA00002,mRNA00003 >> ctg123 . exon 7000 9000 . + . ID=exon00005; >> Parent=mRNA00001,mRNA00002,mRNA00003 >> ctg123 . CDS 1201 1500 . + 0 ID=cds000001;Parent=mRNA0001; >> Name=edenprotein.1 >> ctg123 . CDS 3000 3902 . 
+ 0 ID=cds000001;Parent=mRNA0001; >> Name=edenprotein.1 >> ctg123 . CDS 5000 5500 . + 0 ID=cds000001;Parent=mRNA0001; >> Name=edenprotein.1 >> ctg123 . CDS 7000 7600 . + 0 ID=cds000001;Parent=mRNA0001; >> Name=edenprotein.1 >> ctg123 . CDS 1201 1500 . + 0 ID=cds000002;Parent=mRNA0002; >> Name=edenprotein.2 >> ctg123 . CDS 5000 5500 . + 0 ID=cds000002;Parent=mRNA0002; >> Name=edenprotein.2 >> ctg123 . CDS 7000 7600 . + 0 ID=cds000002;Parent=mRNA0002; >> Name=edenprotein.2 >> ctg123 . CDS 3301 3902 . + 0 ID=cds00003;Parent=mRNA0003; >> Name=edenprotein.3 >> ctg123 . CDS 5000 5500 . + 1 ID=cds00003;Parent=mRNA0003; >> Name=edenprotein.3 >> ctg123 . CDS 7000 7600 . + 1 ID=cds00003;Parent=mRNA0003; >> Name=edenprotein.3 >> ctg123 . CDS 3391 3902 . + 0 ID=cds00004;Parent=mRNA0003; >> Name=edenprotein.4 >> ctg123 . CDS 5000 5500 . + 1 ID=cds00004;Parent=mRNA0003; >> Name=edenprotein.4 >> ctg123 . CDS 7000 7600 . + 1 ID=cds00004;Parent=mRNA0003; >> Name=edenprotein.4 >> ## >> >> >> please find attached the gbrowse out put when using >> processed_trancript >> aggregator and processed trancsript glyph. >> Is that behaviour expected? >> >> >> best >> Tobias >> >> ====================================================================== >> Dr. Tobias Straub Adolf-Butenandt-Institute, Molecular Biology >> tel: +49-89-2180 75 439 Schillerstr. 44, 80336 Munich, Germany >> ====================================================================== > -- > ----------------------------------------------------------------------- > - > Scott Cain, Ph. D. > cain@cshl.edu > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > > > > ------------------------------------------------------- > This SF.net email is sponsored by: Splunk Inc. Do you grep through log > files > for problems? Stop! Download the new AJAX search engine that makes > searching your log files as easy as surfing the web. DOWNLOAD SPLUNK! 
> http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click > _______________________________________________ > Gmod-gbrowse mailing list > Gmod-gbrowse@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse > > ====================================================================== Dr. Tobias Straub Adolf-Butenandt-Institute, Molecular Biology tel: +49-89-2180 75 439 Schillerstr. 44, 80336 Munich, Germany ====================================================================== From chen_li3 at yahoo.com Sun Jan 8 11:26:59 2006 From: chen_li3 at yahoo.com (chen li) Date: Sun Jan 8 11:30:40 2006 Subject: [Bioperl-l] how to break a while loop Message-ID: <20060108162659.45580.qmail@web36812.mail.mud.yahoo.com> Hi all, I have a big file containing mutiple records. I just need to read a few of them. I use the following code (copy from Bio::SeqIO HOWTOs with small changes)but it seems to me that it will never end untill the whole file is read. My question: how do I break the loop? Thanks, Li #!/usr/bin/perl use warnings; use strict; use Bio::SeqIO; # get command-line arguments, or die with a usage statement my $usage = "perl SeqIO_8.pl infile infileformat outfile outfileformat\n"; my $infile = shift or die $usage; my $infileformat = shift or die $usage; my $outfile = shift or die $usage; my $outfileformat = shift or die $usage; my $filesize= -s $infile; print "\nThe file size is $filesize.\n"; # create one SeqIO object to read in,and another to write out my $seq_in = Bio::SeqIO->new('-file' => "<$infile", '-format' => $infileformat); my $seq_out = Bio::SeqIO->new('-file' => ">$outfile", '-format' => $outfileformat); # write each entry in the input file to the output file my $count=0; while (my $inseq = $seq_in->next_seq){ $seq_out->write_seq($inseq); $count++; print "\nThis is the record number:$count\n"; if ($count=10){exit;}# break a while loop } exit; __________________________________________ Yahoo! DSL ? Something to write home about. 
Just $16.99/mo. or less. dsl.yahoo.com From skirov at utk.edu Sun Jan 8 12:03:11 2006 From: skirov at utk.edu (Stefan Kirov) Date: Sun Jan 8 11:59:55 2006 Subject: [Bioperl-l] how to break a while loop In-Reply-To: <20060108162659.45580.qmail@web36812.mail.mud.yahoo.com> References: <20060108162659.45580.qmail@web36812.mail.mud.yahoo.com> Message-ID: <43C145CF.70701@utk.edu> last for example: while (my $seq=$io->next_seq) { .... last if ($count==10); #Use ==, not = } read a perl beginners ("Learning Perl" for example) book on perl syntax and operands (at least). It helped me a lot, it will help you. Stefan chen li wrote: >Hi all, > >I have a big file containing mutiple records. I just >need to read a few of them. I use the following code >(copy from Bio::SeqIO HOWTOs with small changes)but it >seems to me that it will never end untill the whole >file is read. My question: how do I break the loop? > >Thanks, > >Li > >#!/usr/bin/perl >use warnings; >use strict; >use Bio::SeqIO; > ># get command-line arguments, or die with a usage >statement > my $usage = "perl SeqIO_8.pl infile infileformat >outfile outfileformat\n"; > my $infile = shift or die $usage; > my $infileformat = shift or die $usage; > my $outfile = shift or die $usage; > my $outfileformat = shift or die $usage; > my $filesize= -s $infile; > print "\nThe file size is $filesize.\n"; > > # create one SeqIO object to read in,and another to >write out > my $seq_in = Bio::SeqIO->new('-file' => "<$infile", > '-format' => >$infileformat); > my $seq_out = Bio::SeqIO->new('-file' => >">$outfile", > '-format' => >$outfileformat); > > # write each entry in the input file to the >output file > my $count=0; > while (my $inseq = $seq_in->next_seq){ > $seq_out->write_seq($inseq); > $count++; > > print "\nThis is the record >number:$count\n"; > if ($count=10){exit;}# break a while >loop > > } > > > > exit; > > > > > > >__________________________________________ >Yahoo! DSL ? Something to write home about. >Just $16.99/mo. 
or less. >dsl.yahoo.com > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l@portal.open-bio.org >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > From marc.saric at gmx.de Mon Jan 9 10:22:43 2006 From: marc.saric at gmx.de (Marc Saric) Date: Mon Jan 9 10:20:58 2006 Subject: [Bioperl-l] BioSQL, bioperl-db and UniGene In-Reply-To: References: Message-ID: <43C27FC3.8030101@gmx.de> First of all, thanks for reading my lenghty mail and thanks for answering. Sean Davis wrote: >>First off, you have seen the TIGR RESOURCERER application >>(http://www.tigr.org/tigr-scripts/magic/r1.pl), right? > > And, since we started out talking about microarrays, are you aware of the > BioConductor project and their annotation efforts, as well as a connection > to Resourcerer? Or http://source.stanford.edu/cgi-bin/source/sourceSearch if you are interested in human, mouse or rat (I am currently not). Yes, I have seen all of that before and some more tools, although I think I have'nt had the time to really try out everything or dig deeply into all tools available. The main point is, that there are some microarray-oligo-sets, which are not covered by an existing source or would need integration and crosschecking of results and because I had some experience with a set (non-zebrafish), which was not covered by any source I could find on the net, I prefer to do my own annotation, preferably based on the probe sequence. I will have a closer look at the stuff you pointed out to me and write again to bioperl-l with some more questions in the near future. Thanks again. 
-- Bye, Marc Saric From Daniel.Lang at biologie.uni-freiburg.de Mon Jan 9 13:15:55 2006 From: Daniel.Lang at biologie.uni-freiburg.de (Daniel Lang) Date: Mon Jan 9 14:19:33 2006 Subject: [Bioperl-l] Bio::DB::Registry get_all_primary_ids In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA67469EF@ANTARESIA.be.devgen.com> References: <0C528E3670D8CE4B8E013F6749231AA67469EF@ANTARESIA.be.devgen.com> Message-ID: <1136830555.10729.31.camel@localhost.localdomain> Hi Marc, sorry for my late reply! Thanks, for your answer! Am Mittwoch, den 04.01.2006, 14:01 +0100 schrieb Marc Logghe: > As far as I can see, you are using actually only 1 database (no need to > use Bio::DB::Failover), of type Bio::DB::Flat::BinarySearch. I'm actually retrieving from more than one database (different formats). Not to litter this list I just provided one object dump:) My last remark concerning Failover was, whether it is normal, that I'm getting a Failover object when asking for a Registry database like: $db=$registry->get_database('test'); > Using a Bio::DB::Failover object you can attach multiple databases (e.g. > Bio::DB::SeqI compliant objects). In case it fails to fetch a seq from > the first database in the list, it will try the second and so on. > Bio::DB::Failover ISA Bio::DB::RandomAccessI while > Bio::DB::Flat::BinarySearch ISA Bio::DB::SeqI. In the first case an > implementation of get_all_primary_ids is not necessary, in contrast to > the latter case. > So, you might think of using Bio::DB::Flat::BinarySearch directly if you > depend on that method and you need only 1 database. So is there a method to call the Bio::DB::Flat::BinarySearch attached to the Failover object, which allows me to call all methods suggested by Bio::DB::SeqI on? 
For I want to get rid of the ugly hashref and arrayref invocations: $db->{_database}->[0]->get_all_primary_ids Cheers, Daniel:) From cain at cshl.edu Fri Jan 6 11:52:38 2006 From: cain at cshl.edu (Scott Cain) Date: Mon Jan 9 19:27:02 2006 Subject: [Bioperl-l] Re: [Gmod-gbrowse] GFF3 aggregation bottom-up In-Reply-To: References: Message-ID: <1136566358.10751.39.camel@localhost.localdomain> Tobias, I missed your second comment with regard to what look like 'extra' features in your display. Please send the track definition section that you are using so I can take a look. Scott On Fri, 2006-01-06 at 15:06 +0100, Tobias Straub wrote: > Lincoln, > > On my way to get the gadfly gff3 files displayed properly I was trying > to rebuild the canonical gene (EDEN) that you describe in your GFF3 > summary on song.sourceforge.net. > Using bioperl 1.5.1 > (http://bioperl.org/DIST/current_core_unstable.tar.gz) and gbrowse I > can neither get just 3 properly processed_transcript features nor the > proper names (i.e. EDEN.1, EDEN.2, EDEN.3) for those when using the > following GFF3 file (as in-memory or mysql database). > > ## > ##gff-version 3 > ##sequence-region ctg123 1 1497228 > ctg123 . gene 1000 9000 . + . ID=gene00001;Name=EDEN > ctg123 . TF_binding_site 1000 1012 . + . ID=tfbs00001;Parent=gene00001 > ctg123 . mRNA 1050 9000 . + . ID=mRNA00001;Parent=gene00001;Name=EDEN.1 > ctg123 . mRNA 1050 9000 . + . ID=mRNA00002;Parent=gene00001;Name=EDEN.2 > ctg123 . mRNA 1300 9000 . + . ID=mRNA00003;Parent=gene00001;Name=EDEN.3 > ctg123 . exon 1300 1500 . + . ID=exon00001;Parent=mRNA00003 > ctg123 . exon 1050 1500 . + . ID=exon00002;Parent=mRNA00001,mRNA00002 > ctg123 . exon 3000 3902 . + . ID=exon00003;Parent=mRNA00001,mRNA00003 > ctg123 . exon 5000 5500 . + . ID=exon00004; > Parent=mRNA00001,mRNA00002,mRNA00003 > ctg123 . exon 7000 9000 . + . ID=exon00005; > Parent=mRNA00001,mRNA00002,mRNA00003 > ctg123 . CDS 1201 1500 . 
+ 0 ID=cds000001;Parent=mRNA0001; > Name=edenprotein.1 > ctg123 . CDS 3000 3902 . + 0 ID=cds000001;Parent=mRNA0001; > Name=edenprotein.1 > ctg123 . CDS 5000 5500 . + 0 ID=cds000001;Parent=mRNA0001; > Name=edenprotein.1 > ctg123 . CDS 7000 7600 . + 0 ID=cds000001;Parent=mRNA0001; > Name=edenprotein.1 > ctg123 . CDS 1201 1500 . + 0 ID=cds000002;Parent=mRNA0002; > Name=edenprotein.2 > ctg123 . CDS 5000 5500 . + 0 ID=cds000002;Parent=mRNA0002; > Name=edenprotein.2 > ctg123 . CDS 7000 7600 . + 0 ID=cds000002;Parent=mRNA0002; > Name=edenprotein.2 > ctg123 . CDS 3301 3902 . + 0 ID=cds00003;Parent=mRNA0003; > Name=edenprotein.3 > ctg123 . CDS 5000 5500 . + 1 ID=cds00003;Parent=mRNA0003; > Name=edenprotein.3 > ctg123 . CDS 7000 7600 . + 1 ID=cds00003;Parent=mRNA0003; > Name=edenprotein.3 > ctg123 . CDS 3391 3902 . + 0 ID=cds00004;Parent=mRNA0003; > Name=edenprotein.4 > ctg123 . CDS 5000 5500 . + 1 ID=cds00004;Parent=mRNA0003; > Name=edenprotein.4 > ctg123 . CDS 7000 7600 . + 1 ID=cds00004;Parent=mRNA0003; > Name=edenprotein.4 > ## > > > please find attached the gbrowse out put when using processed_trancript > aggregator and processed trancsript glyph. > Is that behaviour expected? > > > best > Tobias > > ====================================================================== > Dr. Tobias Straub Adolf-Butenandt-Institute, Molecular Biology > tel: +49-89-2180 75 439 Schillerstr. 44, 80336 Munich, Germany > ====================================================================== -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
cain@cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From cain at cshl.edu Fri Jan 6 11:48:21 2006 From: cain at cshl.edu (Scott Cain) Date: Mon Jan 9 20:12:42 2006 Subject: [Bioperl-l] Re: [Gmod-gbrowse] GFF3 aggregation bottom-up In-Reply-To: References: Message-ID: <1136566101.10751.37.camel@localhost.localdomain> Hi Tobias, What you are seeing is a result of some of the shoe-horning that had to be done to make GFF3 fit into the current GFF schema. These problems will go away when Lincoln is done working on the new GFF3 schema. Until then, you can get the labels right by using a simple callback in the track configuration, like this: label = sub { my $feature = shift; my ($name) = $feature->attributes('Name'); return $name; } Scott On Fri, 2006-01-06 at 15:06 +0100, Tobias Straub wrote: > Lincoln, > > On my way to get the gadfly gff3 files displayed properly I was trying > to rebuild the canonical gene (EDEN) that you describe in your GFF3 > summary on song.sourceforge.net. > Using bioperl 1.5.1 > (http://bioperl.org/DIST/current_core_unstable.tar.gz) and gbrowse I > can neither get just 3 properly processed_transcript features nor the > proper names (i.e. EDEN.1, EDEN.2, EDEN.3) for those when using the > following GFF3 file (as in-memory or mysql database). > > ## > ##gff-version 3 > ##sequence-region ctg123 1 1497228 > ctg123 . gene 1000 9000 . + . ID=gene00001;Name=EDEN > ctg123 . TF_binding_site 1000 1012 . + . ID=tfbs00001;Parent=gene00001 > ctg123 . mRNA 1050 9000 . + . ID=mRNA00001;Parent=gene00001;Name=EDEN.1 > ctg123 . mRNA 1050 9000 . + . ID=mRNA00002;Parent=gene00001;Name=EDEN.2 > ctg123 . mRNA 1300 9000 . + . ID=mRNA00003;Parent=gene00001;Name=EDEN.3 > ctg123 . exon 1300 1500 . + . ID=exon00001;Parent=mRNA00003 > ctg123 . exon 1050 1500 . + . ID=exon00002;Parent=mRNA00001,mRNA00002 > ctg123 . exon 3000 3902 . + . ID=exon00003;Parent=mRNA00001,mRNA00003 > ctg123 . exon 5000 5500 . + . 
ID=exon00004; > Parent=mRNA00001,mRNA00002,mRNA00003 > ctg123 . exon 7000 9000 . + . ID=exon00005; > Parent=mRNA00001,mRNA00002,mRNA00003 > ctg123 . CDS 1201 1500 . + 0 ID=cds000001;Parent=mRNA0001; > Name=edenprotein.1 > ctg123 . CDS 3000 3902 . + 0 ID=cds000001;Parent=mRNA0001; > Name=edenprotein.1 > ctg123 . CDS 5000 5500 . + 0 ID=cds000001;Parent=mRNA0001; > Name=edenprotein.1 > ctg123 . CDS 7000 7600 . + 0 ID=cds000001;Parent=mRNA0001; > Name=edenprotein.1 > ctg123 . CDS 1201 1500 . + 0 ID=cds000002;Parent=mRNA0002; > Name=edenprotein.2 > ctg123 . CDS 5000 5500 . + 0 ID=cds000002;Parent=mRNA0002; > Name=edenprotein.2 > ctg123 . CDS 7000 7600 . + 0 ID=cds000002;Parent=mRNA0002; > Name=edenprotein.2 > ctg123 . CDS 3301 3902 . + 0 ID=cds00003;Parent=mRNA0003; > Name=edenprotein.3 > ctg123 . CDS 5000 5500 . + 1 ID=cds00003;Parent=mRNA0003; > Name=edenprotein.3 > ctg123 . CDS 7000 7600 . + 1 ID=cds00003;Parent=mRNA0003; > Name=edenprotein.3 > ctg123 . CDS 3391 3902 . + 0 ID=cds00004;Parent=mRNA0003; > Name=edenprotein.4 > ctg123 . CDS 5000 5500 . + 1 ID=cds00004;Parent=mRNA0003; > Name=edenprotein.4 > ctg123 . CDS 7000 7600 . + 1 ID=cds00004;Parent=mRNA0003; > Name=edenprotein.4 > ## > > > please find attached the gbrowse out put when using processed_trancript > aggregator and processed trancsript glyph. > Is that behaviour expected? > > > best > Tobias > > ====================================================================== > Dr. Tobias Straub Adolf-Butenandt-Institute, Molecular Biology > tel: +49-89-2180 75 439 Schillerstr. 44, 80336 Munich, Germany > ====================================================================== -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
cain@cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From ewijaya at singnet.com.sg Mon Jan 9 17:50:56 2006 From: ewijaya at singnet.com.sg (Edward WIJAYA) Date: Mon Jan 9 23:33:56 2006 Subject: [Bioperl-l] Generalized Suffix Tree in BioPerl? Message-ID: Hi, Does anybody know if BioPerl has any implementation on Generalized Suffix tree? By "generalized" I mean: given a set of multiple DNA strings/sequences we would like to construct a single tree out of it. What I can only find so far is Shlomo Yona`s SuffixTree module in CPAN. And it is not generalized, meaning it only create one tree for one string. Hope to hear from you gain. -- Regards, Edward WIJAYA SINGAPORE From peter.robinson at charite.de Tue Jan 10 05:15:25 2006 From: peter.robinson at charite.de (Dr. med. Peter Robinson) Date: Tue Jan 10 05:21:46 2006 Subject: [Bioperl-l] Problem with remote BLAST Message-ID: <55241.192.168.220.201.1136888125.squirrel@webmail.charite.de> Dear BioPerl, I am having difficulties with the BioPerl scripts for remote BLASTing. Using an adaptation of one of the example scripts, an error occurs. I have seen various mails reporting similar problems from 2002-2003, but I was not able to find a solution to the problem. Any help greatly appreciated! The stack trace and the code are below. I am using BioPerl 1.5 on a up-to-date debian linux system. 
Thanks, Peter [Tue Jan 10 11:02:12 2006] longestORF.pl: [Tue Jan 10 11:02:12 2006] longestORF.pl: ------------- EXCEPTION ------------- [Tue Jan 10 11:02:12 2006] longestORF.pl: MSG: no data for midline Query 246 CTGCTAGTTTGTTGTGATATAGGTAAGAATTTTGC-TTTAAAGTGTGGTATTATTACTTT 304 [Tue Jan 10 11:02:12 2006] longestORF.pl: STACK Bio::SearchIO::blast::next_result /usr/local/share/perl/5.8.4/Bio/SearchIO/blast.pm:1151 [Tue Jan 10 11:02:12 2006] longestORF.pl: STACK main::remoteBLAST longestORF.pl:81 [Tue Jan 10 11:02:12 2006] longestORF.pl: STACK toplevel longestORF.pl:28 [Tue Jan 10 11:02:12 2006] longestORF.pl: [Tue Jan 10 11:02:12 2006] longestORF.pl: -------------------------------------- The function is: ## ## 2) Call this function to print out results of remote ## BLAST. For instance, can be set into a
 <pre> element
##    in an HTML page.

sub remoteBLAST
  {
    my $input = $_[0];  # a BioPerl Seq object
    my $fh    = $_[1];  # a file handle
    my $v = 1;          #  'verbose';
    my $prog = 'blastn';
    my $db   = 'nr';
    my $e_val= '1e-2';

    my @params = ( -prog => $prog,
		   -data => $db,
		   -expect => $e_val,
		   -readmethod => 'SearchIO',
		   -report_type => 'blastn',
		   -m => '3');

    my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
    my $r = $factory->submit_blast($input);
    print $fh $r->table();

    print STDERR "waiting..." if( $v > 0 );

    while ( my @rids = $factory->each_rid ) {
      foreach my $rid ( @rids ) {
	my $rc = $factory->retrieve_blast($rid);
	if( !ref($rc) ) {
	  if( $rc < 0 ) {
	    $factory->remove_rid($rid);
	  }
	  print STDERR "." if ( $v > 0 );
	  sleep 10;
	} else {
	  my $result = $rc->next_result();

	  $factory->remove_rid($rid);
	  print $fh "\nQuery: ", $result->query_name(), "\t",
	  $result->query_description(),"\n";
	  while ( my $hit = $result->next_hit ) {

	    print $fh "\thit:", $hit->name, "\t",
	      $hit->accession(), "\t", $hit->description(),"\t";

	    while( my $hsp = $hit->next_hsp ) {
	      print $fh "\te-val is ", $hsp->evalue, "\n";
	      last;  ## Just print most significant e value
	    }
	  }
	}
      }
    }
  }
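
For readers following the control flow of the function above, here is a
minimal offline sketch of the same polling pattern. It makes no claim about
the cause of the "no data for midline" exception. StubFactory is a
hypothetical stand-in (not part of BioPerl) so the loop can run without
network access. Its method names mirror the calls made above (each_rid,
retrieve_blast, remove_rid) and it follows the convention that code relies
on: a non-reference 0 means "still pending", a negative value means
"failed", and a reference means "ready". The request ids and poll counts
are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the remote factory; NOT part of BioPerl.
package StubFactory;

sub new {
    my ( $class, %pending ) = @_;    # request id => polls left before ready (-1 = failed)
    return bless { pending => {%pending} }, $class;
}

sub each_rid   { my $self = shift; return sort keys %{ $self->{pending} }; }
sub remove_rid { my ( $self, $rid ) = @_; delete $self->{pending}{$rid}; return; }

# 0 while pending, -1 for a failed job, a hash reference once ready.
sub retrieve_blast {
    my ( $self, $rid ) = @_;
    my $left = $self->{pending}{$rid};
    return -1 if $left < 0;
    if ( $left > 0 ) { $self->{pending}{$rid}--; return 0; }
    return { rid => $rid };
}

package main;

# Poll until every request id is either finished or failed.
sub poll_all {
    my ( $factory, $delay ) = @_;
    my ( @done, @failed );
    while ( my @rids = $factory->each_rid ) {
        foreach my $rid (@rids) {
            my $rc = $factory->retrieve_blast($rid);
            if ( !ref $rc ) {
                if ( $rc < 0 ) {             # failed: stop asking for it
                    $factory->remove_rid($rid);
                    push @failed, $rid;
                }
                sleep $delay if $delay;      # be polite to the remote server
            }
            else {                           # ready: consume it and drop the id
                $factory->remove_rid($rid);
                push @done, $rc->{rid};
            }
        }
    }
    return ( \@done, \@failed );
}

my $factory = StubFactory->new( RID_A => 2, RID_B => 0, RID_C => -1 );
my ( $done, $failed ) = poll_all( $factory, 0 );
print "done: @$done\n";
print "failed: @$failed\n";
```

The point of the sketch is that every request id leaves the queue exactly
once, either as a result or as a failure, so the outer while loop always
terminates.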



From jason.stajich at duke.edu  Tue Jan 10 14:02:52 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Jan 10 13:59:19 2006
Subject: [Bioperl-l] Fwd: [Bioperl-guts-l] Can BPLite parse HTML formatted
	BLAST output
References: <48256290-D78A-47F6-8E0A-5F3F411E09DC@duke.edu>
Message-ID: 

Forwarding to the main bioperl list too so others can see.

Begin forwarded message:

> From: Jason Stajich 
> Date: January 10, 2006 2:02:14 PM EST
> To: Richard Francis 
> Cc: bioperl-guts-l@bioperl.org
> Subject: Re: [Bioperl-guts-l] Can BPLite parse HTML formatted BLAST  
> output
>
>
> http://bioperl.org/Core/Latest/faq.html#Q3.8
>
> But do not rely on this always working in the future: NCBI has made it
> explicit that they cannot promise to keep the HTML output from the
> website BLAST server parseable. According to them, XML is the only
> blessed format that is guaranteed to remain consistently parseable.
>
> -jason
> On Jan 10, 2006, at 1:21 PM, Richard Francis wrote:
>
>> Dear all,
>>
>> I currently use BPLite to parse my text based BLAST outputs.
>> I was wondering if BPLite or another BioPerl tool can parse an HTML
>> formatted BLAST output to pull out the same types of information.
>>
>> Kind regards and many thanks for any help in advance,
>>
>> Richard Francis
>>
>>
>> _______________________________________________
>> Bioperl-guts-l mailing list
>> Bioperl-guts-l@portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12


From christoph.gille at charite.de  Tue Jan 10 16:03:37 2006
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Tue Jan 10 16:07:29 2006
Subject: [Bioperl-l] internet proxy 
Message-ID: <47893.192.168.220.203.1136927017.squirrel@webmail.charite.de>

Please excuse my stupid question:
How do I tell BioPerl that I have an HTTP proxy?

I am trying Bio/Tools/Analysis/Protein/Sopma.pm,
which computes secondary structure predictions using an HTTP server.
Since I do not have direct Internet access, I've set the variables http_proxy and
HTTP_PROXY to the proxy server.

Linux$ echo  $http_proxy $HTTP_PROXY
http://realproxy.charite.de:888 http://realproxy.charite.de:888

But Sopma is not able to connect to the Internet.
At home with direct Internet it worked fine.
All other Internet programs can cope with the proxy without problem.
For example wget http://www.google.de fetches the data.

Sopma will be my first test case of how to bring Java and BioPerl together.

Many thanks Christoph


From torsten.seemann at infotech.monash.edu.au  Tue Jan 10 21:20:02 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue Jan 10 21:17:19 2006
Subject: [Bioperl-l] internet proxy
In-Reply-To: <47893.192.168.220.203.1136927017.squirrel@webmail.charite.de>
References: <47893.192.168.220.203.1136927017.squirrel@webmail.charite.de>
Message-ID: <43C46B52.7060600@infotech.monash.edu.au>

Dr. Christoph Gille wrote:
> Please excuse my stupid question:
> How do I tell BioPerl that I have an HTTP proxy?
> I am trying Bio/Tools/Analysis/Protein/Sopma.pm,
> which computes secondary structure predictions using an HTTP server.
> Since I do not have direct Internet access, I've set the variables http_proxy and
> HTTP_PROXY to the proxy server.
> But Sopma is not able to connect to the Internet.
> At home with direct Internet it worked fine.
> All other Internet programs can cope with the proxy without problem.
> For example wget http://www.google.de fetches the data.

Sopma.pm ISA Bio::WebAgent ISA LWP::UserAgent.
LWP::UserAgent does the actual HTTP work.
Try:

  my $sopma = Bio::Tools::Analysis::Protein::Sopma->new( ... );
  $sopma->env_proxy;  # tell LWP::UserAgent to use proxy env. vars.
  $sopma->run;
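
To check the mechanism in isolation, without Sopma or any network access,
here is a minimal sketch using only LWP::UserAgent. It assumes LWP is
installed. The proxy URL is a placeholder for illustration, not a real
server. The sketch only confirms that env_proxy copies the environment
setting into the agent; whether Sopma then gets through the proxy depends
on the inheritance described above holding for your BioPerl version.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Placeholder proxy URL, for illustration only.
my $proxy_url = 'http://proxy.example.org:8080';

my $ua = LWP::UserAgent->new;
{
    # Restrict the environment to a single proxy variable inside this block,
    # so the result does not depend on whatever the calling shell exports.
    local %ENV = ( http_proxy => $proxy_url );
    $ua->env_proxy;    # read *_proxy variables from %ENV
}

# With a scheme as its only argument, proxy() returns the current setting.
my $configured = $ua->proxy('http');
print "http proxy: ", ( defined $configured ? $configured : '(none)' ), "\n";
```

If the environment variables cannot be trusted, calling
$ua->proxy('http', $proxy_url) sets the same value explicitly.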

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
From iain.m.wallace at gmail.com  Wed Jan 11 07:01:24 2006
From: iain.m.wallace at gmail.com (Iain Wallace)
Date: Wed Jan 11 07:05:09 2006
Subject: [Bioperl-l] Problem with Graphics
In-Reply-To: <4BF51350-BB17-4A50-9ABD-357EE920439F@duke.edu>
References: <8cff3eb80512140759w1f434423t4cbd4939e5fe798c@mail.gmail.com>
	<4BF51350-BB17-4A50-9ABD-357EE920439F@duke.edu>
Message-ID: <8cff3eb80601110401o508a89e0n5991d5fb6124f483@mail.gmail.com>

Thanks Jason,
That works for me even though I am on Linux (not sure why, but it does).

Iain
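
A possible explanation for why binmode matters even on Linux, offered as an
assumption rather than something this thread confirms: if STDOUT carries a
UTF-8 output layer (Perl 5.8.0 enabled one automatically under UTF-8
locales), every byte above 0x7F in the PNG data is expanded to two bytes on
output, which corrupts the image. The sketch below imitates that situation
with an in-memory filehandle opened with a :utf8 layer and compares the
number of bytes written with and without binmode. The helper name
bytes_written is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The 8-byte PNG signature; its first byte (0x89) is outside ASCII.
my $png_signature = "\x89PNG\r\n\x1a\n";

# Write the signature through a handle that starts with a :utf8 layer,
# optionally resetting it with binmode, and return the byte count produced.
sub bytes_written {
    my ($use_binmode) = @_;
    my $buffer = '';
    open( my $out, '>:utf8', \$buffer )
      or die "cannot open in-memory handle: $!";
    binmode($out) if $use_binmode;    # mark the handle as plain bytes
    print {$out} $png_signature;
    close($out) or die "close failed: $!";
    return length $buffer;
}

printf "without binmode: %d bytes\n", bytes_written(0);
printf "with binmode:    %d bytes\n", bytes_written(1);
```

With binmode the output is the original 8 bytes; without it the 0x89 byte
is re-encoded and the output grows, which is enough for an image viewer to
reject the file.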

On 12/14/05, Jason Stajich  wrote:
> Are you on windows?
>
> Try adding this before calling print.
> binmode (STDOUT);
>
> On Dec 14, 2005, at 10:59 AM, Iain Wallace wrote:
>
> > Hi,
> >
> > I am trying to use the Bio::Graphics module, but am unable to view my
> > output file. When I try to view the file I am told the file is
> > corrupt.
> >
> > Below is the code that I tried and it seems to work (i.e. it doesn't
> > crash and generates an output file)
> >
> > Unfortunately I have no idea what the error could be.
> > Any help/pointers would be greatly appreciated
> >
> > Thanks
> >
> > Iain
> > ---- Code from the How To ---
> >
> > #!/usr/bin/perl
> >
> >     use strict;
> >
> >     use Bio::Graphics;
> >     use Bio::SeqFeature::Generic;
> >
> >   my $panel = Bio::Graphics::Panel->new(-length => 1000,-width  =>
> > 800);
> >     my $track = $panel->add_track(-glyph => 'generic',-label  => 1);
> >
> >     while (<>) { # read blast file
> >       chomp;
> >       next if /^\#/;  # ignore comments
> >      my($name,$score,$start,$end) = split /\t+/;
> >      my $feature =
> > Bio::SeqFeature::Generic->new(-display_name=>$name,-score=>$score,
> >                                                  -start=>$start,-
> > end=>$end);
> >      $track->add_feature($feature);
> >    }
> >
> >    print $panel->png;
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
>

From akarger at CGR.Harvard.edu  Wed Jan 11 16:13:41 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Wed Jan 11 16:20:55 2006
Subject: [Bioperl-l] blast output -> blast -m8 output
Message-ID: <339D68B133EAD311971E009027DC479703DB404F@montecarlo.cgr.harvard.edu>

> From: Jason Stajich [mailto:jason.stajich@duke.edu] 
> 
> The existing search2table script in scripts/searchio does this for  
> you - I don't think there is a writer plugin but there could be.

Ah, nice. But:
-------------------
>perl bioperl-1.5.0-RC1/scripts/searchio/search2table.PLS seqs.blp > zzz
>more zzz
Bacteriophage_1[M19348]  ref|NP_037061.1|  40.32  62   27   4   28  89   1050  1107  6e-05  46.6
Bacteriophage_1[M19348]  ref|XP_193814.5|  48.89  45   16   6   57  95   320   364   0.001  42.7
Bacteriophage_1[M19348]  ref|XP_912463.1|  48.89  45   16   6   57  95   866   910   0.001  42.7
Bacteriophage_1[M19348]  ref|XP_619329.2|  48.89  45   16   6   57  95   676   720   0.001  42.7
C.elegans_1_[Z49071]     ref|XP_917828.1|  29.61  412  183  48  40  410  52    456   6e-43  173
C.elegans_1_[Z49071]     gb|AAI10184.1|    31.99  347  147  23  40  373  53    389   6e-42  169
>more seqs.m8
Bacteriophage_1[M19348] gi|6978677|ref|NP_037061.1|   40.32  62   33   1   28  89   1050  1107  6e-05  46.6
Bacteriophage_1[M19348] gi|82958039|ref|XP_193814.5|  48.89  45   17   1   57  95   320   364   0.001  42.7
Bacteriophage_1[M19348] gi|82958037|ref|XP_912463.1|  48.89  45   17   1   57  95   866   910   0.001  42.7
Bacteriophage_1[M19348] gi|82957449|ref|XP_619329.2|  48.89  45   17   1   57  95   676   720   0.001  42.7
C.elegans_1_[Z49071]    gi|82802536|ref|XP_917828.1|  29.61  412  242  9   40  410  52    456   6e-43  173
C.elegans_1_[Z49071]    gi|82571607|gb|AAI10184.1|    31.99  347  213  11  40  373  53    389   6e-42  169
-----------------

I know we can't get around the problem of the IDs, since blast & blast -m8
give different IDs. But columns 5 and 6 (mismatches, gap openings) are
consistently different. Is search2table not trying to mimic -m8 exactly, or
is this a bug?

Apologies if this is due to using bioperl 1.4 and the PLS script from
1.5.0-RC1. That's what I have on hand.
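
For reference, a minimal sketch of how the two disputed columns are usually
defined for -m8 style output, assuming the aligned query and subject strings
of one HSP are at hand. The function name and the toy alignment are made up
for illustration; this is not the code search2table uses. Mismatches are
aligned residue pairs that differ, and gap openings are runs of gap
characters, not individual gap positions. In the rows above, the -m8
mismatch count plus search2table's column 6 equals alignment length minus
identities (e.g. 33 + 4 = 62 - 25), which would be consistent with
search2table reporting total gap positions in that column, though that is
only an inference from these numbers.

```perl
use strict;
use warnings;

# Hypothetical helper: derive tabular columns from two aligned strings
# of equal length, with '-' marking gap positions.
sub alignment_columns {
    my ($query, $subject) = @_;
    die "aligned strings differ in length\n"
        unless length($query) == length($subject);

    my ($identical, $mismatches, $gap_positions, $gap_openings) = (0, 0, 0, 0);
    my $in_gap = '';    # side with an open gap run: '', 'q' or 's'

    for my $i (0 .. length($query) - 1) {
        my $q = substr($query,   $i, 1);
        my $s = substr($subject, $i, 1);

        if ($q eq '-' or $s eq '-') {
            my $side = $q eq '-' ? 'q' : 's';
            $gap_positions++;
            $gap_openings++ if $side ne $in_gap;    # a new run of gaps starts
            $in_gap = $side;
            next;
        }
        $in_gap = '';
        if ($q eq $s) { $identical++ } else { $mismatches++ }
    }
    return {
        length        => length($query),
        identical     => $identical,
        mismatches    => $mismatches,
        gap_openings  => $gap_openings,
        gap_positions => $gap_positions,
    };
}

my $cols = alignment_columns('MKT--AILLV', 'MRTGGA-LKV');
printf "length=%d identical=%d mismatches=%d gap_openings=%d gap_positions=%d\n",
    @{$cols}{qw(length identical mismatches gap_openings gap_positions)};
```

Comparing gap_openings with gap_positions for a few HSPs from the original
report would show which of the two a given converter writes to column 6.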

> 
> Note that if you are just using BLAST you will find that the blast2table  
> script that is included in the BLAST book (see the O'Reilly website  
> for the book and download the code examples) will also generate this  
> sort of thing for you and will be many times faster than SearchIO  
> code. 

I could steal that. But I was thinking that if NCBI changes the BLAST
format, bioperl may upgrade while the dead trees code won't.

- Amir Karger
Computational Biology Group
Bauer Center for Genomics Research
Harvard University
617-496-0626

> There is also an equivalent hmmer_to_table and  
> fastam9_to_table which are very fast re-formatters that don't  
> actually use SearchIO since one is just trying to get the 
> very simple  
> data out.
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12/
> 
> 
From cjfields at uiuc.edu  Wed Jan 11 16:26:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed Jan 11 16:34:17 2006
Subject: [Bioperl-l] error running load_seqdatabase.pl
In-Reply-To: <000001c612f7$5f1a77b0$15327e82@pyrimidine>
Message-ID: <000001c616f5$a92e24d0$15327e82@pyrimidine>

Hilmar, 

As an update on what's going on:

I've run into a few problems with load_seqdatabase.pl and bioperl-db on
cygwin which I'll try to hash through this week; I'll post if I can't figure
it out soon.  It's not as buggy as trying to run it using the latest
ActivePerl on WinXP, but it still has issues.  

I'm also looking through the ActiveState documentation for the latest
version of perl they have (5.8.7), which I am running.  AFAIK, they enable
dynamic loading when building.  I'll send them an email directly to see what
they say.  There may be some Win32-specific way of configuring a script for
dynamic loading of perl modules which isn't needed in other environments. 

There was also this previous email on bioperl-l:

http://portal.open-bio.org/pipermail/bioperl-l/2005-May/018937.html

Baohua Wang seemed to narrow it down somewhat, but I'm not sure if changing
the modules is a solution until I figure out why he made the changes.  They
seem mainly geared towards getting load_seqdatabase to work with MsSQL, but
if he got it to work on Windows, then he may be onto something.  The
modified Bio* modules can be found at:

ftp://ftp.tc.cornell.edu/Outgoing/bwang/BioSQL-On-Windows 

I'll check them out to see if they work out and see what specific
modifications he made (they're not detailed).

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 
-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Chris Fields
Sent: Friday, January 06, 2006 1:28 PM
To: 'Hilmar Lapp'
Cc: bioperl-l@portal.open-bio.org
Subject: RE: [Bioperl-l] error running load_seqdatabase.pl

I'll try installing bioperl-db using Cygwin.  I know that I can connect to
the native Windows mysql database from inside cygwin, so perhaps this will
do as a short term workaround.  I'll also try using a different native win32
Perl version (maybe 5.6) and look into the dynamic loading issue.  I know
that the AS Perl has given errors like this before and not had problems (I
think it was also cranky with older versions of bioperl), but this one is
pretty serious.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 
-----Original Message-----
From: Hilmar Lapp [mailto:hlapp@gmx.net] 
Sent: Friday, January 06, 2006 12:02 PM
To: Chris Fields
Cc: bioperl-l@portal.open-bio.org
Subject: Re: [Bioperl-l] error running load_seqdatabase.pl


On Jan 6, 2006, at 9:20 AM, Chris Fields wrote:

> Hilmar,
>
> Did this ever get resolved?  I tried to reinstall a biosql database 
> using
> bioperl-db and got the same problems.  I'll list out everything I ran 
> into
> and what I plan on trying, as it's been a long time since I've tried 
> this.
>
> Currently, I'm using ActiveState Perl 5.8.7.813 on WinXP and MySQL 
> 4.1.14.
> Using nmake and installing worked fine.  Loading the biosql schema and
> loading taxonomy info also worked fine, although I had to manually 
> untar the
> taxonomy archive so load_ncbi_taxonomy.pl could find the files (stupid
> windows).  However, this is what happens when using 
> load_seqdatabase.pl:
>
> C:\Perl\Scripts>load_seqdatabase.pl -dbname dihydroorotase -dbuser root
> NP_249092.gpt
> Loading NP_249092.gpt ...
> Undefined subroutine &Bio::Root::Root::debug called at
> C:/Perl/site/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 1537, 
> 
> line 65.
>
> If I removed all args except the sequence file, it gives the same 
> response,
> which means it happens before the connection is made to the database:
>

This happens indeed before a connection is made because it happens at 
the point it tries to dynamically load the BioSQL driver for the 
adaptor:

	$self->debug("attempting to load driver for adaptor class $class\n");

The BioSQL driver is loaded before the DBD driver is loaded.

The module in which this happens (i.e., the persistence adaptor) has 
been loaded dynamically as well.

Bio::Root::Root is in the 'use' statements, and the debug() method 
clearly exists. I'm at a loss as to why perl complains on certain 
Windows platforms. If somebody can tell me what, if anything, can be 
done to make this work on those platforms too I'll be glad to implement 
it.
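
For readers unfamiliar with the mechanism, a minimal sketch of loading a
Perl class at run time from its name, which is roughly the pattern being
described. The helper name load_class is made up and this is not the
bioperl-db code; it assumes only core Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: load a class whose name is only known at run time.
# Returns true on success; on failure returns false and leaves the reason in $@.
sub load_class {
    my ($class) = @_;
    (my $file = "$class.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $class (qw(File::Basename No::Such::Driver)) {
    if (load_class($class)) {
        print "loaded $class\n";
    } else {
        print "could not load $class\n";
    }
}

# %INC records what has been loaded, keyed by the path string given to require.
print "File/Basename.pm is in \%INC\n" if exists $INC{'File/Basename.pm'};
```

Because %INC is keyed by that path string, asking for the same file under
two different spellings makes perl compile it twice. Whether that is related
to the "Subroutine ... redefined" warnings quoted in this thread is not
established here.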

> [...]
> Here's the error messages from that first test (warning it's very 
> messy):
>
> C:\Perl\bin\perl.exe "-MExtUtils::Command::MM" "-e" "test_harness(0,
> 'blib\lib', 'blib\arch')" t\01dbadaptor.t t\02species.t t\03simpleseq.t
> t\04swiss.t t\05seqfeature.t t\06comment.t t\07dblink.t t\08genbank.t
> t\09fuzzy2.t t\10ensembl.t t\11locuslink.t t\12ontology.t t\13remove.t
> t\14query.t t\15cluster.t
> t\01dbadaptor.....ok 1/19Subroutine new redefined at
> [...]
> Subroutine debug redefined at C:/Perl/site/lib/Bio\Root\Root.pm line 
> 356.

So obviously it is there, right? So why doesn't perl see it a minute 
later?

> [...]
> I'll end with that.  At this moment, I can't see it working with the 
> current
> setup.  I was using perl 5.8 with the old setup but I upgraded mysql 
> at some
> point when working with gbrowse (I can't remember what the old version 
> was);
> I'll try upgrading to the newest ActiveState version to see what 
> happens.
> Could it be the MySQL version?

I don't think it has anything to do with the MySQL version, or the DBD 
driver for that matter. Instead, it looks like an issue with dynamic 
loading of perl modules on your particular platform.

	-hilmar

>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From cain at cshl.edu  Thu Jan 12 11:34:34 2006
From: cain at cshl.edu (Scott Cain)
Date: Thu Jan 12 11:31:28 2006
Subject: [Bioperl-l] Patch for GFF.pm for BioPerl 1.5.1
Message-ID: <1137083674.3033.35.camel@localhost.localdomain>

Hello,

I created a patch for GFF.pm that is in BioPerl 1.5.1.  This fixes a bug
that caused GFF file loading to fail when using bp_load_gff.pl.  The
patch is now part of the GBrowse 1.64 release:

  http://sourceforge.net/project/showfiles.php?group_id=27707&package_id=34513

and there are release notes as part of the release that describe how to
apply the patch:

  http://sourceforge.net/project/shownotes.php?release_id=374912&group_id=27707

Thanks to Don Gilbert and Jason Stajich for pointing out what needed to
be patched and Tobias Straub for insisting that there really was
something wrong even though it took me a long time to see it.

Sorry for the hassle,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain@cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From golharam at umdnj.edu  Thu Jan 12 13:38:50 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu Jan 12 14:34:28 2006
Subject: [Bioperl-l] Current Version on Web Site?
Message-ID: <002001c617a7$6f89ffd0$2f01a8c0@GOLHARMOBILE1>

What is the official current version of BioPerl?  1.5.1?

The website still has a lot of stuff pointing to 1.4 as the current
version...

Ryan

From bmoore at genetics.utah.edu  Thu Jan 12 14:45:29 2006
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Thu Jan 12 14:40:26 2006
Subject: [Bioperl-l] Current Version on Web Site?
Message-ID: 

1.4 is the current "stable" release.  1.6 and all even numbered releases
will be the stable releases, and 1.5 and all odd numbered releases are
developer releases.

Barry

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-
> bounces@portal.open-bio.org] On Behalf Of Ryan Golhar
> Sent: Thursday, January 12, 2006 11:39 AM
> To: 'bioperl-l'
> Subject: [Bioperl-l] Current Version on Web Site?
> 
> What is the official current version of BioPerl?  1.5.1?
> 
> The website still has a lot of stuff pointing to 1.4 as the current
> version...
> 
> Ryan
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

From cain at cshl.edu  Thu Jan 12 15:06:41 2006
From: cain at cshl.edu (Scott Cain)
Date: Thu Jan 12 19:27:58 2006
Subject: [Bioperl-l] Current Version on Web Site?
In-Reply-To: 
References: 
Message-ID: <1137096401.3033.52.camel@localhost.localdomain>

While 1.4 is the current stable release, it is quite old.  Release 1.5.1
is fairly 'stable' for an unstable release, and is required for some
popular packages, like GBrowse.

Scott


On Thu, 2006-01-12 at 12:45 -0700, Barry Moore wrote:
> 1.4 is the current "stable" release.  1.6 and all even numbered releases
> will be the stable releases, and 1.5 and all odd numbered releases are
> developer releases.
> 
> Barry
> 
> > -----Original Message-----
> > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-
> > bounces@portal.open-bio.org] On Behalf Of Ryan Golhar
> > Sent: Thursday, January 12, 2006 11:39 AM
> > To: 'bioperl-l'
> > Subject: [Bioperl-l] Current Version on Web Site?
> > 
> > What is the official current version of BioPerl?  1.5.1?
> > 
> > The website still has a lot of stuff pointing to 1.4 as the current
> > version...
> > 
> > Ryan
> > 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain@cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From hubert.prielinger at gmx.at  Thu Jan 12 18:48:30 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu Jan 12 19:42:22 2006
Subject: [Bioperl-l] remoteblast
Message-ID: <43C6EACE.8010708@gmx.at>

Hi,
I have encountered an error while my remoteblast file was running...
I have used that file for two weeks, and all of a sudden I got the 
following error message:

-------------------- WARNING ---------------------
MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.5
Content-Length: 210
Content-Type: application/x-www-form-urlencoded

GAPCOSTS=9+1&DATABASE=nr&QUERY=%3E+%0AKWRRWKRR&COMPOSITION_BASED_STATISTICS=off&EXPECT=20000&WORD_SIZE=2&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&MATRIX_NAME=PAM30&FILTER=L&DESCRIPTIONS=1000&PROGRAM=blastp


An Error Occurred

An Error Occurred

302 Found
---------------------------------------------------

I hope somebody can help....thanks in advance

regards

From hubert.prielinger at gmx.at  Thu Jan 12 18:57:16 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu Jan 12 19:51:08 2006
Subject: [Bioperl-l] parse Blast Output and Composition Based Statistics parameter
Message-ID: <43C6ECDC.7050308@gmx.at>

Hello,
I want to know, if there is a possibility to get from a Blast Outputfile
the whole Sequence of a protein not only the best local alignment...
for example:

>ref|XP_480077.1| hypothetical protein [Oryza sativa (japonica cultivar-group)]
dbj|BAD33542.1| hypothetical protein [Oryza sativa (japonica cultivar-group)]
Length=95

Score = 24.1 bits (47), Expect = 493
Identities = 6/7 (85%), Positives = 7/7 (100%), Gaps = 0/7 (0%)

Query  2   KKRRRWW  8
           K+RRRWW
Sbjct  87  KRRRRWW  93

and now, if I parse the file, I want to get the whole Sequence of this
hypothetical protein....is that possible with hsp for example, or any
other way....

my second question is:
I do my blast search with bioperl and the remoteblast module.....each
parameter is working very well, except the composition based statistics
parameter....
it looks like that:

my $factory = $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = 'yes';

it should work like that, but it doesn't....

Thanks for your help in advance......

regards
Hubert

From jbikandi at gmail.com  Wed Jan 11 13:07:03 2006
From: jbikandi at gmail.com (Joseba Bikandi)
Date: Thu Jan 12 20:55:30 2006
Subject: [Bioperl-l] Biophp.org released
Message-ID: <36b6cfbd0601111007k61aa33d3s15c20452730a8683@mail.gmail.com>

BioPHP.org has been released. It is an open source project, and the basic
idea is to create an online repository of functions and minitools. Both
types of code are editable by using a wiki-like service, so new code can
be easily developed. Minitools are one-page, copy-and-paste complete
scripts intended to be used for basic computation.
Due to the similarity between Perl and PHP, we encourage the Bioperl
community to visit our site and to participate in this project.

From jason.stajich at duke.edu  Thu Jan 12 20:50:33 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Jan 12 21:05:27 2006
Subject: [Bioperl-l] parse Blast Output and Composition Based Statistics parameter
In-Reply-To: <43C6ECDC.7050308@gmx.at>
References: <43C6ECDC.7050308@gmx.at>
Message-ID: <2CF48095-DF0E-4BB5-AAB8-3B8DBC813E76@duke.edu>

(please don't try and post to bioperl-announce, it is not for questions.)

On Jan 12, 2006, at 6:57 PM, Hubert Prielinger wrote:

> Hello,
> I want to know, if there is a possibility to get from a Blast
> Outputfile the whole Sequence of a protein not only the best local
> alignment...
> for example:
>

No.  The parser can only return to you what is in the report file...
use Bio::DB::GenPept to retrieve the sequence via the web or
(recommended) use a locally indexed sequence database like
Bio::DB::Fasta

> >ref|XP_480077.1| hypothetical protein [Oryza sativa (japonica
> cultivar-group)]
> dbj|BAD33542.1| hypothetical protein [Oryza sativa (japonica
> cultivar-group)]
> Length=95
>
> Score = 24.1 bits (47), Expect = 493
> Identities = 6/7 (85%), Positives = 7/7 (100%), Gaps = 0/7 (0%)
>
> Query  2   KKRRRWW  8
>            K+RRRWW
> Sbjct  87  KRRRRWW  93
>
> and now, if I parse the file, I want to get the whole Sequence of
> this hypothetical protein....is that possible with hsp for example,
> or any other way....
>
> my second question is:
> I do my blast search with bioperl and the remoteblast
> module.....each parameter is working very well, except the
> composition based statistics parameter....
> it looks like that:
>
> my $factory = $Bio::Tools::Run::RemoteBlast::HEADER
> {'COMPOSITION_BASED_STATISTICS'} = 'yes';
>

uh no that is not how you would do it.
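
A minimal sketch of why the quoted one-liner cannot yield a factory, using a
stand-in package (My::Blast and its %HEADER hash are hypothetical, so the
snippet runs without BioPerl installed). A chained assignment in Perl is
evaluated right to left, so the variable on the far left receives the
string, not an object:

```perl
use strict;
use warnings;

# Stand-in for a module with a package-level defaults hash (hypothetical).
package My::Blast;
our %HEADER = (COMPOSITION_BASED_STATISTICS => 'off');
sub new {
    my ($class) = @_;
    return bless { params => {%HEADER} }, $class;    # copy defaults at construction
}

package main;

# The pattern from the question: chained assignment, evaluated right to left.
my $not_a_factory = $My::Blast::HEADER{'COMPOSITION_BASED_STATISTICS'} = 'yes';
print "chained assignment gives: $not_a_factory\n";    # the plain string
print "is it an object? ", (ref($not_a_factory) ? "yes" : "no"), "\n";

# Setting the default first and constructing afterwards gives a real object
# that has picked up the new default.
my $factory = My::Blast->new;
print "factory is a ", ref($factory), "\n";
print "default seen by factory: $factory->{params}{COMPOSITION_BASED_STATISTICS}\n";
```

The stand-in copies its defaults in new(); how the real module consumes its
HEADER hash is not shown here.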
You can make it the default for any factories you use in the script
by doing this
> $Bio::Tools::Run::RemoteBlast::HEADER
> {'COMPOSITION_BASED_STATISTICS'} = 'yes';
then
$factory = Bio::Tools::Run::RemoteBlast->new();

=OR=

Once you have a factory object you can set the parameter explicitly:
$factory->submit_parameter('COMPOSITION_BASED_STATISTICS', 'yes');

> it should work like that, but it doesn't....
>
> Thanks for your help in advance......
>
> regards
> Hubert
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From cjfields at uiuc.edu  Thu Jan 12 22:27:00 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu Jan 12 22:25:41 2006
Subject: [Bioperl-l] error running load_seqdatabase.pl
In-Reply-To: 
Message-ID: <000c01c617f1$3aeee610$15327e82@pyrimidine>

Looks like the below modification Baohua Wang made to Root.pm works.  I did
run into another weird issue, but I think it is a sequence formatting
problem.  I try loading in a file with protein sequences in GenPept format
(pulled from BLASTP output using Bio::DB::GenPept and saved in a file using
SeqIO) after changing Root.pm:
______________________________________________________________________

C:\Perl\Scripts>load_seqdatabase.pl -dbname biosql -dbuser root -dbpass
****** -format genbank -safe NP_252217.gpt
Loading NP_252217.gpt ...

C:\Perl\Scripts>
______________________________________________________________________

Good!

The strangeness comes in when using Genpept seqs NOT passed through SeqIO
(pulled directly from NCBI, saved in a similar file).  Most sequences will
load, but a number of them will not:

______________________________________________________________________
C:\Perl\Scripts>load_seqdatabase.pl -dbname biosql -dbuser root -dbpass
****** -format genbank -safe NP_249092.gpt
Loading NP_249092.gpt ...

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values were
("","HAMAPMF_00220","0") FKs ()
Column 'dbname' cannot be null
---------------------------------------------------
Could not store Q59712:
------------- EXCEPTION -------------
MSG: create: object (Bio::Annotation::DBLink) failed to insert or to be found
by unique key
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create C:/Perl/site/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:208
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store C:/Perl/site/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254
STACK Bio::DB::Persistent::PersistentObject::store C:/Perl/site/lib/Bio/DB/Persistent/PersistentObject.pm:272
STACK Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children C:/Perl/site/lib/Bio\DB\BioSQL\AnnotationCollectionAdaptor.pm:219
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create C:/Perl/site/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:216
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store C:/Perl/site/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254
STACK Bio::DB::Persistent::PersistentObject::store C:/Perl/site/lib/Bio/DB/Persistent/PersistentObject.pm:272
STACK Bio::DB::BioSQL::SeqAdaptor::store_children C:/Perl/site/lib/Bio\DB\BioSQL\SeqAdaptor.pm:226
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create C:/Perl/site/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:216
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store C:/Perl/site/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254
STACK Bio::DB::Persistent::PersistentObject::store C:/Perl/site/lib/Bio/DB/Persistent/PersistentObject.pm:272
STACK (eval) C:\Perl\Scripts\load_seqdatabase.pl:620
STACK toplevel C:\Perl\Scripts\load_seqdatabase.pl:603
--------------------------------------
 at C:\Perl\Scripts\load_seqdatabase.pl line 633
....
 at C:\Perl\Scripts\load_seqdatabase.pl line 633
Could not store AAU82296:
------------- EXCEPTION -------------
MSG: create: object (Bio::Species) failed to insert or to be found by unique
key
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create C:/Perl/site/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:208
STACK Bio::DB::Persistent::PersistentObject::create C:/Perl/site/lib/Bio/DB/Persistent/PersistentObject.pm:245
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create C:/Perl/site/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:171
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store C:/Perl/site/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254
STACK Bio::DB::Persistent::PersistentObject::store C:/Perl/site/lib/Bio/DB/Persistent/PersistentObject.pm:272
STACK (eval) C:\Perl\Scripts\load_seqdatabase.pl:620
STACK toplevel C:\Perl\Scripts\load_seqdatabase.pl:603
--------------------------------------
 at C:\Perl\Scripts\load_seqdatabase.pl line 633
______________________________________________________________________

I'll check them out to try and derive what the differences are.  I will also
pass the above file through SeqIO to see what happens.  I think it could be
some of the GenPept formatted stuff is clogging up the works since I saved
everything in Genbank format through SeqIO.  For now, though, bioperl-db on
Windows works!  Any idea why the 'throw' change works?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign
-----Original Message-----
From: drycafe@gmail.com [mailto:drycafe@gmail.com] On Behalf Of Hilmar Lapp
Sent: Wednesday, January 11, 2006 5:13 PM
To: Chris Fields; Steve Chervitz
Cc: bioperl-l@portal.open-bio.org
Subject: Re: [Bioperl-l] error running load_seqdatabase.pl

Interesting. That posting didn't receive much attention did it. So he
states:


The script failed on throw() in loading Bio/Root/Root.pm on Windows.
The problem lines are those "throw $class (...".
After I put comma after $class as "throw $class, (...", the BioSQL tests and
load scripts are succeeded


Can anyone of those who wrote the Root exception and warning code
comment? Maybe Steve?

	-hilmar

On 1/11/06, Chris Fields wrote:
> Hilmar,
>
> As an update on what's going on:
>
> I've run into a few problems with load_seqdatabase.pl and bioperl-db on
> cygwin which I'll try to hash through this week; I'll post if I can't figure
> it out soon.  It's not as buggy as trying to run it using the latest
> ActivePerl on WinXP, but it still has issues.
>
> I'm also looking through the ActiveState documentation for the latest
> version of perl they have (5.8.7), which I am running.  AFAIK, they enable
> dynamic loading when building.  I'll send them an email directly to see what
> they say.  There may be some Win32-specific way of configuring a script for
> dynamic loading of perl modules which isn't needed in other environments.
>
> There was also this previous email on bioperl-l:
>
> http://portal.open-bio.org/pipermail/bioperl-l/2005-May/018937.html
>
> Baohua Wang seemed to narrow it down somewhat, but I'm not sure if changing
> the modules is a solution until I figure out why he made the changes.  They
> seem mainly geared towards getting load_seqdatabase to work with MsSQL, but
> if he got it to work on Windows, then he may be onto something.  The
> modified Bio* modules can be found at:
>
> ftp://ftp.tc.cornell.edu/Outgoing/bwang/BioSQL-On-Windows
>
> I'll check them out to see if they work out and see what specific
> modifications he made (they're not detailed).
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Chris Fields
> Sent: Friday, January 06, 2006 1:28 PM
> To: 'Hilmar Lapp'
> Cc: bioperl-l@portal.open-bio.org
> Subject: RE: [Bioperl-l] error running load_seqdatabase.pl
>
> I'll try installing bioperl-db using Cygwin.  I know that I can connect to
> the native Windows mysql database from inside cygwin, so perhaps this will
> do as a short term workaround.  I'll also try using a different native win32
> Perl version (maybe 5.6) and look into the dynamic loading issue.  I know
> that the AS Perl has given errors like this before and not had problems (I
> think it was also cranky with older versions of bioperl), but this one is
> pretty serious.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp@gmx.net]
> Sent: Friday, January 06, 2006 12:02 PM
> To: Chris Fields
> Cc: bioperl-l@portal.open-bio.org
> Subject: Re: [Bioperl-l] error running load_seqdatabase.pl
>
>
> On Jan 6, 2006, at 9:20 AM, Chris Fields wrote:
>
> > Hilmar,
> >
> > Did this ever get resolved?  I tried to reinstall a biosql database
> > using
> > bioperl-db and got the same problems.  I'll list out everything I ran
> > into
> > and what I plan on trying, as it's been a long time since I've tried
> > this.
> >
> > Currently, I'm using ActiveState Perl 5.8.7.813 on WinXP and MySQL
> > 4.1.14.
> > Using nmake and installing worked fine.  Loading the biosql schema and
> > loading taxonomy info also worked fine, although I had to manually
> > untar the
> > taxonomy archive so load_ncbi_taxonomy.pl could find the files (stupid
> > windows).  However, this is what happens when using
> > load_seqdatabase.pl:
> >
> > C:\Perl\Scripts>load_seqdatabase.pl -dbname dihydroorotase -dbuser root
> > NP_249092.gpt
> > Loading NP_249092.gpt ...
> > Undefined subroutine &Bio::Root::Root::debug called at
> > C:/Perl/site/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 1537,
> >
> > line 65.
> >
> > If I removed all args except the sequence file, it gives the same
> > response,
> > which means it happens before the connection is made to the database:
> >
>
> This happens indeed before a connection is made because it happens at
> the point it tries to dynamically load the BioSQL driver for the
> adaptor:
>
>         $self->debug("attempting to load driver for adaptor class $class\n");
>
> The BioSQL driver is loaded before the DBD driver is loaded.
>
> The module in which this happens (i.e., the persistence adaptor) has
> been loaded dynamically as well.
>
> Bio::Root::Root is in the 'use' statements, and the debug() method
> clearly exists. I'm at a loss as to why perl complains on certain
> Windows platforms. If somebody can tell me what, if anything, can be
> done to make this work on those platforms too I'll be glad to implement
> it.
>
> > [...]
> > Here's the error messages from that first test (warning it's very
> > messy):
> >
> > C:\Perl\bin\perl.exe "-MExtUtils::Command::MM" "-e" "test_harness(0,
> > 'blib\lib', 'blib\arch')" t\01dbadaptor.t t\02species.t t\03simpleseq.t
> > t\04swiss.t t\05seqfeature.t t\06comment.t t\07dblink.t t\08genbank.t
> > t\09fuzzy2.t t\10ensembl.t t\11locuslink.t t\12ontology.t t\13remove.t
> > t\14query.t t\15cluster.t
> > t\01dbadaptor.....ok 1/19Subroutine new redefined at
> > [...]
> > Subroutine debug redefined at C:/Perl/site/lib/Bio\Root\Root.pm line
> > 356.
>
> So obviously it is there, right? So why doesn't perl see it a minute
> later?
>
> > [...]
> > I'll end with that.  At this moment, I can't see it working with the
> > current
> > setup.  I was using perl 5.8 with the old setup but I upgraded mysql
> > at some
> > point when working with gbrowse (I can't remember what the old version
> > was);
> > I'll try upgrading to the newest ActiveState version to see what
> > happens.
> > Could it be the MySQL version?
>
> I don't think it has anything to do with the MySQL version, or the DBD
> driver for that matter. Instead, it looks like an issue with dynamic
> loading of perl modules on your particular platform.
>
>         -hilmar
>
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> --
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>

--
----------------------------------------------------------
: Hilmar Lapp  -:-  San Diego, CA  -:-  hlapp at gmx dot net :
----------------------------------------------------------

From hlapp at gmx.net  Thu Jan 12 23:28:14 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri Jan 13 00:39:40 2006
Subject: [Bioperl-l] error running load_seqdatabase.pl
In-Reply-To: <000c01c617f1$3aeee610$15327e82@pyrimidine>
References: <000c01c617f1$3aeee610$15327e82@pyrimidine>
Message-ID: 

On 1/12/06, Chris Fields wrote:
> Looks like the below modification Baohua Wang made to Root.pm works.  I did
> run into another weird issue, but I think it is a sequence formatting
> problem.
> I try loading in a file with protein sequences in GenPept format
> (pulled from BLASTP output using Bio::DB::GenPept and saved in a file using
> SeqIO) after changing Root.pm:
> ______________________________________________________________________
>
> C:\Perl\Scripts>load_seqdatabase.pl -dbname biosql -dbuser root -dbpass
> ****** -format genbank -safe NP_252217.gpt
> Loading NP_252217.gpt ...
>
> C:\Perl\Scripts>
> ______________________________________________________________________
>
> Good!

Great! So we'll have to test that the effect of adding that comma
isn't negative on Unix platforms but I suspect it's in fact required
by syntax and maybe on Windows perl is less lenient? Odd at any rate.

>
> The strangeness comes in when using Genpept seqs NOT passed through SeqIO
> (pulled directly from NCBI, saved in a similar file).  Most sequences will
> load, but a number of them will not:
>
> ______________________________________________________________________
> C:\Perl\Scripts>load_seqdatabase.pl -dbname biosql -dbuser root -dbpass
> ****** -format genbank -safe NP_249092.gpt
> Loading NP_249092.gpt ...
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values were
> ("","HAMAPMF_00220","0") FKs ()
> Column 'dbname' cannot be null
> ---------------------------------------------------
> Could not store Q59712:

Are you sure you pulled this from NCBI using NP_249092 as the
accession? I'm asking because NP_249092 is a perfectly sane looking
RefSeq record and in fact does not contain the string HAMAPMF, whereas
Q59712 in reality is a Uniprot record moulded into GenPept format;
some of the db_xrefs come out odd and in fact for the one above
(HAMAPMF_00220) there is no dbname, most likely because dbname and
accession are concatenated like for the following InterPro db_xref. So
I don't think this is worrisome unless you insist you used the
NP_249092 entry ...
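
To illustrate the point about concatenation, a small sketch of how a
cross-reference string splits into database name and accession only when a
separator is present. The helper split_xref and the colon-separated input
form are assumptions for illustration, not the actual Bio::SeqIO::genbank
parsing code; the run-together string is the value from the warning above.

```perl
use strict;
use warnings;

# Hypothetical helper: split "DB:ACCESSION" into its two parts.
# Without a separator there is nothing to tell where the database name ends,
# so the database name comes back empty.
sub split_xref {
    my ($xref) = @_;
    if ($xref =~ /^([^:]+):(.+)$/) {
        return ($1, $2);
    }
    return ('', $xref);
}

# The first form is an assumed well-formed spelling; the second is the
# concatenated value seen in the failed insert.
for my $xref ('HAMAP:MF_00220', 'HAMAPMF_00220') {
    my ($dbname, $accession) = split_xref($xref);
    printf "%-16s dbname=%-8s accession=%s\n",
        $xref, "'$dbname'", $accession;
}
```

An empty database name of this kind is what a NOT NULL dbname column would
reject.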
I would generally advise against taking Uniprot/Swissprot entries from their GenPept reincarnation. The formats are incompatible in some aspects (e.g., Swissprot, like EMBL, has first-level db_xrefs, whereas GenBank format doesn't; instead it puts db_xrefs into the feature table).

> [...]
> at C:\Perl\Scripts\load_seqdatabase.pl line 633
> Could not store AAU82296:
> ------------- EXCEPTION -------------
> MSG: create: object (Bio::Species) failed to insert or to be found by unique
> key

"uncultured archaeon GZfos13E1" is not something Bioperl will parse correctly into the appropriate Bio::Species structure (not that I would even know what that would have to look like ;). However, if you preload your Biosql instance with the NCBI taxonomy database then this is not a problem because the species will be looked up correctly by its NCBI taxon ID (which the genbank SeqIO parser extracts from the feature table if it's there - and it is in this case).

> [...]
> I'll check them out to try and derive what the differences are. I will also
> pass the above file through SeqIO to see what happens.

Note that everything you pull down through Bio::DB::GenPept does get parsed by Bio::SeqIO::genbank - if there is any difference it must be because the input files aren't identical.

> I think it could be some of the GenPept formatted stuff is clogging up the works since I saved
> everything in Genbank format through SeqIO.

Ah - meaning you got the file by calling $seqio->write_seq($seq) ? That could cause its own problems (even though theoretically it shouldn't and therefore if it does it counts as a bug).

> For now, though, bioperl-db on
> Windows works! Any idea why the 'throw' change works?

No, no idea - but great that you found out.

-hilmar

>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept.
of Biochemistry > University of Illinois Urbana-Champaign > -----Original Message----- > From: drycafe@gmail.com [mailto:drycafe@gmail.com] On Behalf Of Hilmar Lapp > Sent: Wednesday, January 11, 2006 5:13 PM > To: Chris Fields; Steve Chervitz > Cc: bioperl-l@portal.open-bio.org > Subject: Re: [Bioperl-l] error running load_seqdatabase.pl > > Interesting. That posting didn't receive much attention did it. So he > states: > > > The script failed on throw() in loading Bio/Root/Root.pm on Windows. > The problem lines are those "throw $class (...". After I put comma > after $class as "throw $class, (...", the BioSQL tests and load scripts > are succeeded > > > Can anyone of those who wrote the Root exception and warning code > comment? Maybe Steve? > > -hilmar > > On 1/11/06, Chris Fields wrote: > > Hilmar, > > > > As an update on what's going on: > > > > I've run into a few problems with load_seqdatabase.pl and bioperl-db on > > cygwin which I'll try to hash through this week; I'll post if I can't > figure > > it out soon. It's not as buggy as trying to run it using the latest > > ActivePerl on WinXP, but it still has issues. > > > > I'm also looking through the ActiveState documentation for the latest > > version of perl they have (5.8.7), which I am running. AFAIK, they enable > > dynamic loading when building. I'll send them an email directly to see > what > > they say. There may be some Win32-specific way of configuring a script > for > > dynamic loading of perl modules which isn't needed in other environments. > > > > There was also this previous email on bioperl-l: > > > > http://portal.open-bio.org/pipermail/bioperl-l/2005-May/018937.html > > > > Baohua Wang seemed to narrow it down somewhat, but I'm not sure if > changing > > the modules is a solution until I figure out why he made the changes. > They > > seem mainly geared towards getting load_seqdatabase to work with MsSQL, > but > > if he got it to work on Windows, then he may be onto something. 
The > > modified Bio* modules can be found at: > > > > ftp://ftp.tc.cornell.edu/Outgoing/bwang/BioSQL-On-Windows > > > > I'll check them out to see if they work out and see what specific > > modifications he made (they're not detailed). > > > > Christopher Fields > > Postdoctoral Researcher - Switzer Lab > > Dept. of Biochemistry > > University of Illinois Urbana-Champaign > > -----Original Message----- > > From: bioperl-l-bounces@portal.open-bio.org > > [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Chris Fields > > Sent: Friday, January 06, 2006 1:28 PM > > To: 'Hilmar Lapp' > > Cc: bioperl-l@portal.open-bio.org > > Subject: RE: [Bioperl-l] error running load_seqdatabase.pl > > > > I'll try installing bioperl-db using Cygwin. I know that I can connect to > > the native Windows mysql database from inside cygwin, so perhaps this will > > do as a short term workaround. I'll also try using a different native > win32 > > Perl version (maybe 5.6) and look into the dynamic loading issue. I know > > that the AS Perl has given errors like this before and not had problems (I > > think it was also cranky with older versions bioperl), but this one is > > pretty serious. > > > > Christopher Fields > > Postdoctoral Researcher - Switzer Lab > > Dept. of Biochemistry > > University of Illinois Urbana-Champaign > > -----Original Message----- > > From: Hilmar Lapp [mailto:hlapp@gmx.net] > > Sent: Friday, January 06, 2006 12:02 PM > > To: Chris Fields > > Cc: bioperl-l@portal.open-bio.org > > Subject: Re: [Bioperl-l] error running load_seqdatabase.pl > > > > > > On Jan 6, 2006, at 9:20 AM, Chris Fields wrote: > > > > > Hilmar, > > > > > > Did this ever get resolved? I tried to reinstall a biosql database > > > using > > > bioperl-db and got the same problems. I'll list out everything I ran > > > into > > > and what I pan on trying, as it's been a long time since I've tried > > > this. 
> > > > > > Currently, I'm using ActiveState Perl 5.8.7.813 on WinXP and MySQL > > > 4.1.14. > > > Using nmake and installing worked fine. Loading the biosql schema and > > > loading taxonomy info also worked fine, although I had to manually > > > untar the > > > taxonomy archive so load_ncbi_taxonomy.pl could find the files (stupid > > > windows). However, this is what happens when using > > > load_seqdatabase.pl: > > > > > > C:\Perl\Scripts>load_seqdatabase.pl -dbname dihydroorotase -dbuser root > > > NP_249092.gpt > > > Loading NP_249092.gpt ... > > > Undefined subroutine &Bio::Root::Root::debug called at > > > C:/Perl/site/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 1537, > > > > > > line 65. > > > > > > If I removed all args except the sequence file, it gives the same > > > response, > > > which means it happens before the connection is made to the database: > > > > > > > This happens indeed before a connection is made because it happens at > > the point it tries to dynamically load the BioSQL driver for the > > adaptor: > > > > $self->debug("attempting to load driver for adaptor class > > $class\n"); > > > > The BioSQL driver is loaded before the DBD driver is loaded. > > > > The module in which this happens (i.e., the persistence adaptor) has > > been loaded dynamically as well. > > > > Bio::Root::Root is in the 'use' statements, and the debug() method > > clearly exists. I'm at a loss as to why perl complains on certain > > Windows platforms. If somebody can tell me what, if anything, can be > > done to make this work on those platforms too I'll be glad to implement > > it. > > > > > [...] 
> > > Here's the error messages from that first test (warning it's very > > > messy): > > > > > > C:\Perl\bin\perl.exe "-MExtUtils::Command::MM" "-e" "test_harness(0, > > > 'bl > > > ib\lib', 'blib\arch')" t\01dbadaptor.t t\02species.t t\03simpleseq.t > > > t\04swiss.t t\05seqfeature.t t\06comment.t t\07dblink.t t\08genbank.t > > > t\09fuzzy2.t t\10ensembl.t t\11locuslink.t t\12ontology.t t\13remove.t > > > t\14query.t t\15cluster.t > > > t\01dbadaptor.....ok 1/19Subroutine new redefined at > > > [...] > > > Subroutine debug redefined at C:/Perl/site/lib/Bio\Root\Root.pm line > > > 356. > > > > So obviously it is there, right? So why doesn't perl see it a minute > > later? > > > > > [...] > > > I'll end with that. At this moment, I can't see it working with the > > > current > > > setup. I was using perl 5.8 with the old setup but I upgraded mysql > > > at some > > > point when working with gbrowse (I can't remember what the old version > > > was); > > > I'll try upgrading to the newest ActiveState version to see what > > > happens. > > > Could it be the MySQL version? > > > > I don't think it has anything to do with the MySQL version, or the DBD > > driver for that matter. Instead, it looks like on issue with dynamic > > loading of perl modules on your particular platform. > > > > -hilmar > > > > > > > > Christopher Fields > > > Postdoctoral Researcher - Switzer Lab > > > Dept. of Biochemistry > > > University of Illinois Urbana-Champaign > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > -- > > ------------------------------------------------------------- > > Hilmar Lapp email: lapp at gnf.org > > GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 > > ------------------------------------------------------------- > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > -- > ---------------------------------------------------------- > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : > ---------------------------------------------------------- > > -- ---------------------------------------------------------- : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : ---------------------------------------------------------- From hlapp at gmx.net Wed Jan 11 18:12:45 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri Jan 13 02:52:28 2006 Subject: [Bioperl-l] error running load_seqdatabase.pl In-Reply-To: <000001c616f5$a92e24d0$15327e82@pyrimidine> References: <000001c612f7$5f1a77b0$15327e82@pyrimidine> <000001c616f5$a92e24d0$15327e82@pyrimidine> Message-ID: Interesting. That posting didn't receive much attention did it. So he states: The script failed on throw() in loading Bio/Root/Root.pm on Windows. The problem lines are those "throw $class (...". After I put comma after $class as "throw $class, (...", the BioSQL tests and load scripts are succeeded Can anyone of those who wrote the Root exception and warning code comment? Maybe Steve? -hilmar On 1/11/06, Chris Fields wrote: > Hilmar, > > As an update on what's going on: > > I've run into a few problems with load_seqdatabase.pl and bioperl-db on > cygwin which I'll try to hash through this week; I'll post if I can't figure > it out soon. It's not as buggy as trying to run it using the latest > ActivePerl on WinXP, but it still has issues. > > I'm also looking through the ActiveState documentation for the latest > version of perl they have (5.8.7), which I am running. AFAIK, they enable > dynamic loading when building. I'll send them an email directly to see what > they say. 
> [...]

--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------

From Steve_Chervitz at affymetrix.com Fri Jan 13 05:25:34 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Fri Jan 13 05:34:15 2006
Subject: [Bioperl-l] error running load_seqdatabase.pl
In-Reply-To:
Message-ID:

looks like the trouble is when Bio::Root::Root::throw() tries to call
Error::throw(). Perhaps there is some windows-specific problem with
Error.pm? Can't say I've seen this before since I don't use perl on
windows.

Some things to try, in this order:

* Verify that Error.pm is installed for perl on your system.
* Try running t/Exception.t and
  the examples/root/exceptions[1-4].pl scripts and see if they
  produce the expected behavior.
* Try changing the 'throw $class ...' statements in Root.pm to
  'Error::throw $class ...'
* If Error.pm seems to be installed but isn't working right,
  either uninstall it or get in the habit of putting this line
  in your main scripts:

    INIT { $DONT_USE_ERROR=1; }

Steve

On Wed, 11 Jan 2006, Hilmar Lapp wrote:

> Date: Wed, 11 Jan 2006 15:12:45 -0800
> From: Hilmar Lapp
> To: Chris Fields ,
>     Steve Chervitz
> Cc: bioperl-l@portal.open-bio.org
> Subject: Re: [Bioperl-l] error running load_seqdatabase.pl
>
> Interesting. That posting didn't receive much attention did it. So he states:
>
>
> The script failed on throw() in loading Bio/Root/Root.pm on Windows.
> The problem lines are those "throw $class (...".
After I put comma
> after $class as "throw $class, (...", the BioSQL tests and load scripts
> are succeeded
>
>
> Can anyone of those who wrote the Root exception and warning code
> comment? Maybe Steve?
>
> -hilmar
>
> [...]

From jason at portal.open-bio.org Fri Jan 13 08:02:38 2006
From: jason at portal.open-bio.org (Jason Stajich)
Date: Fri Jan 13 08:05:48 2006
Subject: [Bioperl-l] Re: problem in /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm
In-Reply-To: <1137156850.7510.106.camel@sb289.gbf-braunschweig.de>
References: <1137149266.7510.102.camel@sb289.gbf-braunschweig.de> <1137156850.7510.106.camel@sb289.gbf-braunschweig.de>
Message-ID: <746D3A29-366C-4F83-A5BB-61EF8BC15D5C@bioperl.org>

NCBI reserves the right to make the HTML or Text unparseable from the CGI, I guess they've now done that.

See these posts:
http://bioperl.org/pipermail/bioperl-l/2005-September/019760.html
http://bioperl.org/pipermail/bioperl-l/2005-September/019724.html

-jason

On Jan 13, 2006, at 7:54 AM, Guido Dieterich wrote:

> Hi, Jason
>
>
> RemoteBlast.pm
> I printed out the NCBI report that was requested!
>
> Guido
>
>
> On Friday, 13.01.2006, at 07:44 -0500, Jason Stajich wrote:
>
>> This is from RemoteBlast or from blast run on the command line?
>>
>> On Jan 13, 2006, at 5:47 AM, Guido Dieterich wrote:
>>
>>> Hi Jason, hi all,
>>>
>>> it seems so that ncbi changed again its blast output format:
>>> as example:
>>> they added a/more Feature line(s)
>>>>>>>
>>> Features in this part of subject sequence:
>>> oxidoreductase, pyridine nucleotide-disulfide family
>>>>>>>
>>>
>>>
>>> Bio/SearchIO/blast.pm
>>>
>>> will cause a problem. Error message is:
>>>
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: no data for midline Features flanking this part of subject sequence:
>>> STACK: Error::throw
>>> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.6/Bio/Root/Root.pm:328
>>> STACK: Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1166
>>> STACK: main::remoteBLAST /home/gdi/Perl-skripte/longestORF.pl:95
>>> STACK: /home/gdi/Perl-skripte/longestORF.pl:34
>>> -----------------------------------------------------------
>>>
>>>
>>>
>>> longer example of the blast ncbi report:
>>>
>>>> gb|AE016830.1| Enterococcus faecalis V583, complete genome
>>> Length=3218031
>>>
>>> Features in this part of subject sequence:
>>> oxidoreductase, pyridine nucleotide-disulfide family
>>>
>>> Score = 79.8 bits (40), Expect = 3e-11
>>> Identities = 82/96 (85%), Gaps = 0/96 (0%)
>>> Strand=Plus/Plus
>>>
>>> Query  1129     ATGAAACATATGGTTAACTTGTACTACTTCTTCGGTATCCGTAGTGGTTACTACATGTGG  1188
>>>                 ||||||||||| ||||||||| | |||||||| | ||| |||  ||||||||||||||
>>> Sbjct  3136253  ATGAAACATATCGTTAACTTGAAATACTTCTTTGATATTCGTTCTGGTTACTACATGTTC  3136312
>>>
>>> Query  1189     CAATATATTATGCATGAATTCTTCCACATTAAAGAT  1224
>>>                 ||||| |||||||| ||| ||||||| |||||||||
>>> Sbjct  3136313  CAATACATTATGCACGAAATCTTCCATATTAAAGAT  3136348
>>>
>>>
>>> Features in this part of subject sequence:oxidoreductase, pyridine nucleotide-disulfide family
>>>
>>> Score = 77.8 bits (39), Expect = 1e-10
>>> Identities = 90/107 (84%), Gaps = 0/107 (0%)
>>> Strand=Plus/Plus
>>>
>>> Query  469      AAAGCGATGTTAACATTCGTTGTTTGTGGATCTGGATTTACTGGTATCGAAATGGTTGGG  528
>>>                 ||||| ||||||||||||||||| ||||| ||||| |||||||| ||||||||||| ||
>>> Sbjct  3135587  AAAGCAATGTTAACATTCGTTGTCTGTGGTTCTGGTTTTACTGGGATCGAAATGGTCGGC  3135646
>>>
>>> Query  529      GAACTTTTAGAATGGAAAGATCGTCTTGCTAAAGATAACAAAATTGA  575
>>>                 ||| |  | || |||||||||||| | || ||||||  |||||||||
>>> Sbjct  3135647  GAATTAATCGACTGGAAAGATCGTTTAGCGAAAGATGCCAAAATTGA  3135693
>>>
>>>
>>
>> --
>> Jason Stajich
>> jason@bioperl.org
>> http://jason.open-bio.org/
>>

--
Jason Stajich
jason@bioperl.org
http://jason.open-bio.org/

From jason at portal.open-bio.org Fri Jan 13 07:44:43 2006
From: jason at portal.open-bio.org (Jason Stajich)
Date: Fri Jan 13 09:21:06 2006
Subject: [Bioperl-l] Re: problem in /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm
In-Reply-To: <1137149266.7510.102.camel@sb289.gbf-braunschweig.de>
References: <1137149266.7510.102.camel@sb289.gbf-braunschweig.de>
Message-ID:

This is from RemoteBlast or from blast run on the command line?

On Jan 13, 2006, at 5:47 AM, Guido Dieterich wrote:

> Hi Jason, hi all,
>
> it seems so that ncbi changed again its blast output format:
> as example:
> they added a/more Feature line(s)
>>>>>
> Features in this part of subject sequence:
> oxidoreductase, pyridine nucleotide-disulfide family
>>>>>
>
>
> Bio/SearchIO/blast.pm
>
> will cause a problem. Error message is:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: no data for midline Features flanking this part of subject sequence:
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.6/Bio/Root/Root.pm:328
> STACK: Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1166
> STACK: main::remoteBLAST /home/gdi/Perl-skripte/longestORF.pl:95
> STACK: /home/gdi/Perl-skripte/longestORF.pl:34
> -----------------------------------------------------------
>
>
>
> longer example of the blast ncbi report:
>
>> gb|AE016830.1| Enterococcus faecalis V583, complete genome
> Length=3218031
>
> Features in this part of subject sequence:
> oxidoreductase, pyridine nucleotide-disulfide family
>
> Score = 79.8 bits (40), Expect = 3e-11
> Identities = 82/96 (85%), Gaps = 0/96 (0%)
> Strand=Plus/Plus
>
> Query  1129     ATGAAACATATGGTTAACTTGTACTACTTCTTCGGTATCCGTAGTGGTTACTACATGTGG  1188
>                 ||||||||||| ||||||||| | |||||||| | ||| |||  ||||||||||||||
> Sbjct  3136253  ATGAAACATATCGTTAACTTGAAATACTTCTTTGATATTCGTTCTGGTTACTACATGTTC  3136312
>
> Query  1189     CAATATATTATGCATGAATTCTTCCACATTAAAGAT  1224
>                 ||||| |||||||| ||| ||||||| |||||||||
> Sbjct  3136313  CAATACATTATGCACGAAATCTTCCATATTAAAGAT  3136348
>
>
> Features in this part of subject sequence:oxidoreductase, pyridine nucleotide-disulfide family
>
> Score = 77.8 bits (39), Expect = 1e-10
> Identities = 90/107 (84%), Gaps = 0/107 (0%)
> Strand=Plus/Plus
>
> Query  469      AAAGCGATGTTAACATTCGTTGTTTGTGGATCTGGATTTACTGGTATCGAAATGGTTGGG  528
>                 ||||| ||||||||||||||||| ||||| ||||| |||||||| ||||||||||| ||
> Sbjct  3135587  AAAGCAATGTTAACATTCGTTGTCTGTGGTTCTGGTTTTACTGGGATCGAAATGGTCGGC  3135646
>
> Query  529      GAACTTTTAGAATGGAAAGATCGTCTTGCTAAAGATAACAAAATTGA  575
>                 ||| |  | || |||||||||||| | || ||||||  |||||||||
> Sbjct  3135647  GAATTAATCGACTGGAAAGATCGTTTAGCGAAAGATGCCAAAATTGA  3135693
>
>

--
Jason Stajich
jason@bioperl.org
http://jason.open-bio.org/

From hlapp at
gmx.net Fri Jan 13 11:41:23 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri Jan 13 11:38:08 2006
Subject: [Bioperl-l] error running load_seqdatabase.pl
In-Reply-To: <000301c6185b$5b61bb60$15327e82@pyrimidine>
References: <000301c6185b$5b61bb60$15327e82@pyrimidine>
Message-ID: <79bce52e7530892a6a6819897d18a7a0@gmx.net>

On Jan 13, 2006, at 8:06 AM, Chris Fields wrote:

> [...]
> I think we really should probably give credit to Baohua Wang for
> noting the change in throw.

Yes, absolutely, smart guy. You get credit for persistence and digging it up again 9 months later :-)

> If it pans out, this may be what is responsible for error
> messages popping up every once in a while with bioperl scripts. There is
> one thing of note: Steve mentions that Error.pm should be present:

'Could', not 'should'. The toolkit needs to work in the absence of Error.pm too (and does on most platforms; e.g., I don't have it installed). It may turn out though that the missing comma is silent on most (all?) non-Windows platforms or if Error.pm is installed, and therefore wasn't noticed by most people.

	-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From cjfields at uiuc.edu Fri Jan 13 11:06:43 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri Jan 13 11:50:06 2006
Subject: [Bioperl-l] error running load_seqdatabase.pl
In-Reply-To:
Message-ID: <000301c6185b$5b61bb60$15327e82@pyrimidine>

Sorry, I should have clarified; NP_249092.gpt is a file carrying all protein sequences with significant BLASTP score hits to NP_249092 in GenPept format (which also includes the sequence NP_249092). Only a number of these had problems, all of which seem to be Uniprot.
I had problems using my script to download the sequences b/c of NCBI's limit for batch sequence extraction, so I used the Batch Entrez interface to download them (i.e. they are directly from the protein database at NCBI). NP_252217.gpt is the same as above (a file with sig. hits to NP_252217) but had fewer hits, so batch extraction through Bio::DB::GenPept worked (they were then passed as Bio::SeqIO objects and saved in GenBank format). As reported before, there were no errors with that file. The other issue, with taxonomy, was fixed when I loaded the database using load_ncbi_taxonomy.pl. I dropped the old database, reinstalled the schema, but forgot to add in the taxonomic info. I think we really should probably give credit to Baohua Wang for noting the change in throw. If it pans out, this may be what is responsible for error messages popping up every once in a while with bioperl scripts. There is one thing of note: Steve mentions that Error.pm should be present: > -----Original Message----- > From: Steve Chervitz [mailto:Steve_Chervitz@affymetrix.com] > Sent: Friday, January 13, 2006 4:26 AM > To: Hilmar Lapp > Cc: Chris Fields; Steve Chervitz; bioperl-l@portal.open-bio.org > Subject: Re: [Bioperl-l] error running load_seqdatabase.pl > > looks like the trouble is when Bio::Root::Root::throw() tries to call > Error::throw(). Perhaps there is some windows-specific problem with > Error.pm? Can't say I've seen this before since I don't use perl on > windows. > > Some things to try, in this order: > > * Verify that Error.pm is installed for perl on your system. > * Try running t/Exception.t and > the examples/root/exceptions[1-4].pl scripts and see if they > produce the expected behavior. > * Try changing the 'throw $class ...' statements in Root.pm to > 'Error::throw $class ...' 
> * If Error.pm seems to be installed but isn't working right, either > uninstall it or get in the habit of putting this line in your main > scripts: INIT { $DONT_USE_ERROR=1; } > > Steve The requirement didn't pop up when creating the PPM distro. It also isn't included in ActivePerl but is available. I've installed it and will go through the above to see if it changes anything using unmodified Root.pm. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: drycafe@gmail.com [mailto:drycafe@gmail.com] On Behalf Of Hilmar > Lapp > Sent: Thursday, January 12, 2006 10:28 PM > To: Chris Fields > Cc: Steve Chervitz; bioperl-l@portal.open-bio.org > Subject: Re: [Bioperl-l] error running load_seqdatabase.pl > > On 1/12/06, Chris Fields wrote: > > Looks like the below modification Baohua Wang made to Root.pm works. I > did > > run into another weird issue, but I think it is a sequence formatting > > problem. I try loading in a file with protein sequences in GenPept > format > > (pulled from BLASTP output using Bio::DB::GenPept and saved in a file > using > > SeqIO) after changing Root.pm: > > ______________________________________________________________________ > > > > C:\Perl\Scripts>load_seqdatabase.pl -dbname biosql -dbuser root -dbpass > > ****** -format genbank -safe NP_252217.gpt > > Loading NP_252217.gpt ... > > > > C:\Perl\Scripts> > > ______________________________________________________________________ > > > > Good! > > Great! So we'll have to test that the effect of adding that comma > isn't negative on Unix platforms but I suspect it's in fact required > by syntax and maybe on Windows perl is less lenient? Odd at any rate. > > > > > The strangeness comes in when using Genpept seqs NOT passed through > SeqIO > > (pulled directly from NCBI, saved in a similar file). 
Most sequences > will > > load, but a number of them will not: > > > > ______________________________________________________________________ > > C:\Perl\Scripts>load_seqdatabase.pl -dbname biosql -dbuser root -dbpass > > ****** -format genbank -safe NP_249092.gpt > > Loading NP_249092.gpt ... > > > > -------------------- WARNING --------------------- > > MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values > were > > ("","HAMAPMF_00220","0") FKs () > > Column 'dbname' cannot be null > > --------------------------------------------------- > > Could not store Q59712: > > Are you sure you pulled this from NCBI using NP_249092 as the > accession? I'm asking because NP_249092 is a perfectly sane looking > RefSeq record and in fact does not contain the string HAMAPMF, whereas > Q59712 in reality is a Uniprot record moulded into GenPept format; > some of the db_xrefs come out odd and in fact for the one above > (HAMAPMF_00220) there is no dbname, most likely because dbname and > accession are concatenated like for the following InterPro db_xref. > > So I don't think this is worrisome unless you insist you used the > NP_249092 entry ... > > I would generally advise against taking Uniprot/Swissprot entries from > their GenPept reincarnation. The formats are incompatible in some > aspects (e.g., Swissprot, like EMBL, has first-level db_xrefs, whereas > GenBank format doesn't; instead it puts db_xrefs into the feature > table). > > > [...] > > at C:\Perl\Scripts\load_seqdatabase.pl line 633 > > Could not store AAU82296: > > ------------- EXCEPTION ------------- > > MSG: create: object (Bio::Species) failed to insert or to be found by > unique > > key > > "uncultured archaeon GZfos13E1" is not something Bioperl will parse > correctly into the appropriate Bio::Species structure (not that I > would even know what that would have to look like ;). 
> > However, if you preload your Biosql instance with the NCBI taxonomy > database then this is not a problem because the species will be looked > up correctly by its NCBI taxon ID (which the genbank SeqIO parser > extracts from the feature table if it's there - and it is in this > case). > > > [...] > > I'll check them out to try and derive what the differences are. I will > also > > pass the above file through SeqIO to see what happens. > > Note that everything you pull down through Bio::DB::GenPept does get > parsed by Bio::SeqIO::genbank - if there is any difference it must be > because the input files aren't identical. > > > I think it could be some of the GenPept formatted stuff is clogging up > the works since I saved > > everything in Genbank format through SeqIO. > > Ah - meaning you got the file by calling $seqio->write_seq($seq) ? > That could cause it's own problems (even though theoretically it > shouldn't and therefore if it does it counts as a bug). > > > For now, though, bioperl-db on > > Windows works! Any idea why the 'throw' change works? > > No, no idea - but great that you found out. > > -hilmar > > > > > Christopher Fields > > Postdoctoral Researcher - Switzer Lab > > Dept. of Biochemistry > > University of Illinois Urbana-Champaign > > -----Original Message----- > > From: drycafe@gmail.com [mailto:drycafe@gmail.com] On Behalf Of Hilmar > Lapp > > Sent: Wednesday, January 11, 2006 5:13 PM > > To: Chris Fields; Steve Chervitz > > Cc: bioperl-l@portal.open-bio.org > > Subject: Re: [Bioperl-l] error running load_seqdatabase.pl > > > > Interesting. That posting didn't receive much attention did it. So he > > states: > > > > > > The script failed on throw() in loading Bio/Root/Root.pm on Windows. > > The problem lines are those "throw $class (...". After I put comma > > after $class as "throw $class, (...", the BioSQL tests and load scripts > > are succeeded > > > > > > Can anyone of those who wrote the Root exception and warning code > > comment? 
Maybe Steve? > > > > -hilmar > > > > On 1/11/06, Chris Fields wrote: > > > Hilmar, > > > > > > As an update on what's going on: > > > > > > I've run into a few problems with load_seqdatabase.pl and bioperl-db > on > > > cygwin which I'll try to hash through this week; I'll post if I can't > > figure > > > it out soon. It's not as buggy as trying to run it using the latest > > > ActivePerl on WinXP, but it still has issues. > > > > > > I'm also looking through the ActiveState documentation for the latest > > > version of perl they have (5.8.7), which I am running. AFAIK, they > enable > > > dynamic loading when building. I'll send them an email directly to > see > > what > > > they say. There may be some Win32-specific way of configuring a > script > > for > > > dynamic loading of perl modules which isn't needed in other > environments. > > > > > > There was also this previous email on bioperl-l: > > > > > > http://portal.open-bio.org/pipermail/bioperl-l/2005-May/018937.html > > > > > > Baohua Wang seemed to narrow it down somewhat, but I'm not sure if > > changing > > > the modules is a solution until I figure out why he made the changes. > > They > > > seem mainly geared towards getting load_seqdatabase to work with > MsSQL, > > but > > > if he got it to work on Windows, then he may be onto something. The > > > modified Bio* modules can be found at: > > > > > > ftp://ftp.tc.cornell.edu/Outgoing/bwang/BioSQL-On-Windows > > > > > > I'll check them out to see if they work out and see what specific > > > modifications he made (they're not detailed). > > > > > > Christopher Fields > > > Postdoctoral Researcher - Switzer Lab > > > Dept. 
of Biochemistry > > > University of Illinois Urbana-Champaign > > > -----Original Message----- > > > From: bioperl-l-bounces@portal.open-bio.org > > > [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Chris > Fields > > > Sent: Friday, January 06, 2006 1:28 PM > > > To: 'Hilmar Lapp' > > > Cc: bioperl-l@portal.open-bio.org > > > Subject: RE: [Bioperl-l] error running load_seqdatabase.pl > > > > > > I'll try installing bioperl-db using Cygwin. I know that I can > connect to > > > the native Windows mysql database from inside cygwin, so perhaps this > will > > > do as a short term workaround. I'll also try using a different native > > win32 > > > Perl version (maybe 5.6) and look into the dynamic loading issue. I > know > > > that the AS Perl has given errors like this before and not had > problems (I > > > think it was also cranky with older versions bioperl), but this one is > > > pretty serious. > > > > > > Christopher Fields > > > Postdoctoral Researcher - Switzer Lab > > > Dept. of Biochemistry > > > University of Illinois Urbana-Champaign > > > -----Original Message----- > > > From: Hilmar Lapp [mailto:hlapp@gmx.net] > > > Sent: Friday, January 06, 2006 12:02 PM > > > To: Chris Fields > > > Cc: bioperl-l@portal.open-bio.org > > > Subject: Re: [Bioperl-l] error running load_seqdatabase.pl > > > > > > > > > On Jan 6, 2006, at 9:20 AM, Chris Fields wrote: > > > > > > > Hilmar, > > > > > > > > Did this ever get resolved? I tried to reinstall a biosql database > > > > using > > > > bioperl-db and got the same problems. I'll list out everything I > ran > > > > into > > > > and what I pan on trying, as it's been a long time since I've tried > > > > this. > > > > > > > > Currently, I'm using ActiveState Perl 5.8.7.813 on WinXP and MySQL > > > > 4.1.14. > > > > Using nmake and installing worked fine. 
Loading the biosql schema > and > > > > loading taxonomy info also worked fine, although I had to manually > > > > untar the > > > > taxonomy archive so load_ncbi_taxonomy.pl could find the files > (stupid > > > > windows). However, this is what happens when using > > > > load_seqdatabase.pl: > > > > > > > > C:\Perl\Scripts>load_seqdatabase.pl -dbname dihydroorotase -dbuser > root > > > > NP_249092.gpt > > > > Loading NP_249092.gpt ... > > > > Undefined subroutine &Bio::Root::Root::debug called at > > > > C:/Perl/site/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 1537, > > > > > > > > line 65. > > > > > > > > If I removed all args except the sequence file, it gives the same > > > > response, > > > > which means it happens before the connection is made to the > database: > > > > > > > > > > This happens indeed before a connection is made because it happens at > > > the point it tries to dynamically load the BioSQL driver for the > > > adaptor: > > > > > > $self->debug("attempting to load driver for adaptor class > > > $class\n"); > > > > > > The BioSQL driver is loaded before the DBD driver is loaded. > > > > > > The module in which this happens (i.e., the persistence adaptor) has > > > been loaded dynamically as well. > > > > > > Bio::Root::Root is in the 'use' statements, and the debug() method > > > clearly exists. I'm at a loss as to why perl complains on certain > > > Windows platforms. If somebody can tell me what, if anything, can be > > > done to make this work on those platforms too I'll be glad to > implement > > > it. > > > > > > > [...] 
> > > > Here's the error messages from that first test (warning it's very > > > > messy): > > > > > > > > C:\Perl\bin\perl.exe "-MExtUtils::Command::MM" "-e" "test_harness(0, > > > > 'bl > > > > ib\lib', 'blib\arch')" t\01dbadaptor.t t\02species.t t\03simpleseq.t > > > > t\04swiss.t t\05seqfeature.t t\06comment.t t\07dblink.t > t\08genbank.t > > > > t\09fuzzy2.t t\10ensembl.t t\11locuslink.t t\12ontology.t > t\13remove.t > > > > t\14query.t t\15cluster.t > > > > t\01dbadaptor.....ok 1/19Subroutine new redefined at > > > > [...] > > > > Subroutine debug redefined at C:/Perl/site/lib/Bio\Root\Root.pm line > > > > 356. > > > > > > So obviously it is there, right? So why doesn't perl see it a minute > > > later? > > > > > > > [...] > > > > I'll end with that. At this moment, I can't see it working with the > > > > current > > > > setup. I was using perl 5.8 with the old setup but I upgraded mysql > > > > at some > > > > point when working with gbrowse (I can't remember what the old > version > > > > was); > > > > I'll try upgrading to the newest ActiveState version to see what > > > > happens. > > > > Could it be the MySQL version? > > > > > > I don't think it has anything to do with the MySQL version, or the DBD > > > driver for that matter. Instead, it looks like on issue with dynamic > > > loading of perl modules on your particular platform. > > > > > > -hilmar > > > > > > > > > > > Christopher Fields > > > > Postdoctoral Researcher - Switzer Lab > > > > Dept. of Biochemistry > > > > University of Illinois Urbana-Champaign > > > > > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l@portal.open-bio.org > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > > -- > > > ------------------------------------------------------------- > > > Hilmar Lapp email: lapp at gnf.org > > > GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757
> > > -------------------------------------------------------------
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l@portal.open-bio.org
> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> > --
> > ----------------------------------------------------------
> > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> > ----------------------------------------------------------
>
> --
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------

From cjfields at uiuc.edu Fri Jan 13 12:53:55 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri Jan 13 12:50:22 2006
Subject: [Bioperl-l] error running load_seqdatabase.pl
In-Reply-To:
Message-ID: <000501c6186a$5294eed0$15327e82@pyrimidine>

So here's what I found:

* Running t\Exception.t with or without Error.pm installed didn't reveal any errors.

    C:\Documents and Settings\Administrator\My Documents\CVS\bioperl-live>perl -I -w t\Exception.t
    1..7
    ok 1
    ok 2
    Setting test data (Eeny meeny miney moe.)
    ok 3
    Executing method bar() in TestObject
    Throwing a Bio::TestException
    ok 4
    ok 5
    ok 6
    ok 7

* Running the example scripts (exceptions[1-4].pl) with or w/o Error.pm showed no difference (I checked with diff).

* Changing "throw $class" to "Error::throw $class" in Root.pm didn't do anything, which is strange (I did this with and w/o Error.pm installed). I thought at this point that ActiveState may have Error.pm as part of their core modules, but it isn't included anywhere in the Perl directory tree or under PERL5LIB. It also isn't listed as CORE in their modules list (http://ppm.activestate.com/BuildStatus/5.8-E.html); the core modules are usually under '/lib' instead of '/site/lib'. So why would "Error::throw" even work? I also tried 'perl -e "require Error"' and didn't get errors, so it has to be around somewhere.

* Even stranger, when changing "throw $class" to "Error::throw $class" in Root.pm, load_seqdatabase.pl works fine, just like when "throw $class" is changed to "throw $class,". Oi!!

* Changing load_seqdatabase.pl to include the line "INIT { $DONT_USE_ERROR=1; }" also didn't do anything; only changes to Root.pm made a difference.

Lesson: Windows is flaky. I think that much of this behavior is just ActivePerl-specific, which may be why it hasn't been seen elsewhere. I don't know much about ActivePerl and exception handling, so I may delve into it a bit more to see if there is something else there. I also dropped ActiveState an email asking about Error.pm and their core distribution.

So, the question is, should Root.pm be changed in bioperl-live? Obviously this would need to be well tested out before committing any changes. I could try it out on Mac OS X.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Steve Chervitz
> Sent: Friday, January 13, 2006 4:26 AM
> To: Hilmar Lapp
> Cc: Chris Fields; bioperl-l@portal.open-bio.org; Steve Chervitz
> Subject: Re: [Bioperl-l] error running load_seqdatabase.pl
>
> looks like the trouble is when Bio::Root::Root::throw() tries to call
> Error::throw(). Perhaps there is some windows-specific problem with
> Error.pm? Can't say I've seen this before since I don't use perl on
> windows.
>
> Some things to try, in this order:
>
> * Verify that Error.pm is installed for perl on your system.
> * Try running t/Exception.t and
>   the examples/root/exceptions[1-4].pl scripts and see if they
>   produce the expected behavior.
> * Try changing the 'throw $class ...' statements in Root.pm to
>   'Error::throw $class ...'
> * If Error.pm seems to be installed but isn't working right, either
>   uninstall it or get in the habit of putting this line in your main
>   scripts: INIT { $DONT_USE_ERROR=1; }
>
> Steve
>
> On Wed, 11 Jan 2006, Hilmar Lapp wrote:
> > [...]

From hlapp at gmx.net Fri Jan 13 14:52:11 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri Jan 13 15:17:01 2006
Subject: [Bioperl-l] error running load_seqdatabase.pl
In-Reply-To: <000501c6186a$5294eed0$15327e82@pyrimidine>
References: <000501c6186a$5294eed0$15327e82@pyrimidine>
Message-ID:

On 1/13/06, Chris Fields wrote:
> [...]
> * Running the example scripts (exceptions[1-4].pl) with or w/o Error.pm
> showed no difference (I checked with diff).

Given the below you probably failed to uninstall Error.pm, or not installing it in the first place doesn't matter because it's there already.
> * Changing "throw $class" to "Error::throw $class" in Root.pm didn't do
> anything, which is strange (I did this with and w/o Error.pm installed). I
> thought at this point, that Activestate may have Error.pm as part of their
> core modules, but it isn't included anywhere in the Perl directory tree or
> under PERL5LIB. It also isn't listed as CORE in their modules list
> (http://ppm.activestate.com/BuildStatus/5.8-E.html); the core modules are
> usually under '/lib' instead of '/site/lib'. So why would "Error::throw"
> even work? I also tried 'perl -e "require Error"' and didn't get errors, so
> it has to be around somewhere.

Right. I usually do

$ perl -MYet::Another::Module

to convince myself that Yet::Another::Module really is not accessible to the interpreter. And if I do that with Error on my OSX box I do receive an error about perl not finding the Error module anywhere.

> * Even stranger, when changing "throw $class" to "Error::throw $class" in
> Root.pm, load_seqdatabase.pl works fine, just like when "throw $class" is
> changed to "throw $class,". Oi!!

Now, I can't imagine that using Error::throw $class would not die immediately if Error.pm is not installed. You can check that quickly by mistyping the module name (like Errror::throw). So unless there's some deep magic going on, using Error::throw instead of just throw() is not an option, I'm afraid.

I guess the solution needs to be adding the comma. I can't imagine why this would break on non-Windows systems, but obviously some testing is in order.

	-hilmar

> * Changing load_seqdatabase.pl to include the line "INIT {
> $DONT_USE_ERROR=1; }" also didn't do anything; only changes to Root.pm made
> a difference.
>
> Lesson: Windows is flaky. I think that much of this behavior is just
> ActivePerl-specific, which may be why it hasn't been seen elsewhere. I
> don't know much about ActivePerl and exception handling, so I may delve into
> it a bit more to see if there is something else there.
I also dropped > Activestate an email asking about Error.pm and their core distribution. > > So, the question is, should Root.pm be changed in bioperl-live? Obviously > this would need to be well tested out before committing any changes. I > could try it out on Mac OS X. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > > -----Original Message----- > > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l- > > bounces@portal.open-bio.org] On Behalf Of Steve Chervitz > > Sent: Friday, January 13, 2006 4:26 AM > > To: Hilmar Lapp > > Cc: Chris Fields; bioperl-l@portal.open-bio.org; Steve Chervitz > > Subject: Re: [Bioperl-l] error running load_seqdatabase.pl > > > > looks like the trouble is when Bio::Root::Root::throw() tries to call > > Error::throw(). Perhaps there is some windows-specific problem with > > Error.pm? Can't say I've seen this before since I don't use perl on > > windows. > > > > Some things to try, in this order: > > > > * Verify that Error.pm is installed for perl on your system. > > * Try running t/Exception.t and > > the examples/root/exceptions[1-4].pl scripts and see if they > > produce the expected behavior. > > * Try changing the 'throw $class ...' statements in Root.pm to > > 'Error::throw $class ...' > > * If Error.pm seems to be installed but isn't working right, either > > uninstall it or get in the habit of putting this line in your main > > scripts: INIT { $DONT_USE_ERROR=1; } > > > > Steve > > > > On Wed, 11 Jan 2006, Hilmar Lapp wrote: > > > > > Date: Wed, 11 Jan 2006 15:12:45 -0800 > > > From: Hilmar Lapp > > > To: Chris Fields , > > > Steve Chervitz > > > Cc: bioperl-l@portal.open-bio.org > > > Subject: Re: [Bioperl-l] error running load_seqdatabase.pl > > > > > > Interesting. That posting didn't receive much attention did it. So he > > states: > > > > > > > > > The script failed on throw() in loading Bio/Root/Root.pm on Windows. 
> > > The problem lines are those "throw $class (...". After I put comma > > > after $class as "throw $class, (...", the BioSQL tests and load scripts > > > are succeeded > > > > > > > > > Can anyone of those who wrote the Root exception and warning code > > > comment? Maybe Steve? > > > > > > -hilmar > > > > > > On 1/11/06, Chris Fields wrote: > > > > Hilmar, > > > > > > > > As an update on what's going on: > > > > > > > > I've run into a few problems with load_seqdatabase.pl and bioperl-db > > on > > > > cygwin which I'll try to hash through this week; I'll post if I can't > > figure > > > > it out soon. It's not as buggy as trying to run it using the latest > > > > ActivePerl on WinXP, but it still has issues. > > > > > > > > I'm also looking through the ActiveState documentation for the latest > > > > version of perl they have (5.8.7), which I am running. AFAIK, they > > enable > > > > dynamic loading when building. I'll send them an email directly to > > see what > > > > they say. There may be some Win32-specific way of configuring a > > script for > > > > dynamic loading of perl modules which isn't needed in other > > environments. > > > > > > > > There was also this previous email on bioperl-l: > > > > > > > > http://portal.open-bio.org/pipermail/bioperl-l/2005-May/018937.html > > > > > > > > Baohua Wang seemed to narrow it down somewhat, but I'm not sure if > > changing > > > > the modules is a solution until I figure out why he made the changes. > > They > > > > seem mainly geared towards getting load_seqdatabase to work with > > MsSQL, but > > > > if he got it to work on Windows, then he may be onto something. The > > > > modified Bio* modules can be found at: > > > > > > > > ftp://ftp.tc.cornell.edu/Outgoing/bwang/BioSQL-On-Windows > > > > > > > > I'll check them out to see if they work out and see what specific > > > > modifications he made (they're not detailed). 
> > > > > > > > Christopher Fields > > > > Postdoctoral Researcher - Switzer Lab > > > > Dept. of Biochemistry > > > > University of Illinois Urbana-Champaign > > > > -----Original Message----- > > > > From: bioperl-l-bounces@portal.open-bio.org > > > > [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Chris > > Fields > > > > Sent: Friday, January 06, 2006 1:28 PM > > > > To: 'Hilmar Lapp' > > > > Cc: bioperl-l@portal.open-bio.org > > > > Subject: RE: [Bioperl-l] error running load_seqdatabase.pl > > > > > > > > I'll try installing bioperl-db using Cygwin. I know that I can > > connect to > > > > the native Windows mysql database from inside cygwin, so perhaps this > > will > > > > do as a short term workaround. I'll also try using a different native > > win32 > > > > Perl version (maybe 5.6) and look into the dynamic loading issue. I > > know > > > > that the AS Perl has given errors like this before and not had > > problems (I > > > > think it was also cranky with older versions bioperl), but this one is > > > > pretty serious. > > > > > > > > Christopher Fields > > > > Postdoctoral Researcher - Switzer Lab > > > > Dept. of Biochemistry > > > > University of Illinois Urbana-Champaign > > > > -----Original Message----- > > > > From: Hilmar Lapp [mailto:hlapp@gmx.net] > > > > Sent: Friday, January 06, 2006 12:02 PM > > > > To: Chris Fields > > > > Cc: bioperl-l@portal.open-bio.org > > > > Subject: Re: [Bioperl-l] error running load_seqdatabase.pl > > > > > > > > > > > > On Jan 6, 2006, at 9:20 AM, Chris Fields wrote: > > > > > > > > > Hilmar, > > > > > > > > > > Did this ever get resolved? I tried to reinstall a biosql database > > > > > using > > > > > bioperl-db and got the same problems. I'll list out everything I > > ran > > > > > into > > > > > and what I pan on trying, as it's been a long time since I've tried > > > > > this. > > > > > > > > > > Currently, I'm using ActiveState Perl 5.8.7.813 on WinXP and MySQL > > > > > 4.1.14. 
> > > > > Using nmake and installing worked fine. Loading the biosql schema > > and > > > > > loading taxonomy info also worked fine, although I had to manually > > > > > untar the > > > > > taxonomy archive so load_ncbi_taxonomy.pl could find the files > > (stupid > > > > > windows). However, this is what happens when using > > > > > load_seqdatabase.pl: > > > > > > > > > > C:\Perl\Scripts>load_seqdatabase.pl -dbname dihydroorotase -dbuser > > root > > > > > NP_249092.gpt > > > > > Loading NP_249092.gpt ... > > > > > Undefined subroutine &Bio::Root::Root::debug called at > > > > > C:/Perl/site/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 1537, > > > > > > > > > > line 65. > > > > > > > > > > If I removed all args except the sequence file, it gives the same > > > > > response, > > > > > which means it happens before the connection is made to the > > database: > > > > > > > > > > > > > This happens indeed before a connection is made because it happens at > > > > the point it tries to dynamically load the BioSQL driver for the > > > > adaptor: > > > > > > > > $self->debug("attempting to load driver for adaptor class > > > > $class\n"); > > > > > > > > The BioSQL driver is loaded before the DBD driver is loaded. > > > > > > > > The module in which this happens (i.e., the persistence adaptor) has > > > > been loaded dynamically as well. > > > > > > > > Bio::Root::Root is in the 'use' statements, and the debug() method > > > > clearly exists. I'm at a loss as to why perl complains on certain > > > > Windows platforms. If somebody can tell me what, if anything, can be > > > > done to make this work on those platforms too I'll be glad to > > implement > > > > it. > > > > > > > > > [...] 
> > > > > Here's the error messages from that first test (warning it's very > > > > > messy): > > > > > > > > > > C:\Perl\bin\perl.exe "-MExtUtils::Command::MM" "-e" "test_harness(0, > > > > > 'bl > > > > > ib\lib', 'blib\arch')" t\01dbadaptor.t t\02species.t t\03simpleseq.t > > > > > t\04swiss.t t\05seqfeature.t t\06comment.t t\07dblink.t > > t\08genbank.t > > > > > t\09fuzzy2.t t\10ensembl.t t\11locuslink.t t\12ontology.t > > t\13remove.t > > > > > t\14query.t t\15cluster.t > > > > > t\01dbadaptor.....ok 1/19Subroutine new redefined at > > > > > [...] > > > > > Subroutine debug redefined at C:/Perl/site/lib/Bio\Root\Root.pm line > > > > > 356. > > > > > > > > So obviously it is there, right? So why doesn't perl see it a minute > > > > later? > > > > > > > > > [...] > > > > > I'll end with that. At this moment, I can't see it working with the > > > > > current > > > > > setup. I was using perl 5.8 with the old setup but I upgraded mysql > > > > > at some > > > > > point when working with gbrowse (I can't remember what the old > > version > > > > > was); > > > > > I'll try upgrading to the newest ActiveState version to see what > > > > > happens. > > > > > Could it be the MySQL version? > > > > > > > > I don't think it has anything to do with the MySQL version, or the DBD > > > > driver for that matter. Instead, it looks like on issue with dynamic > > > > loading of perl modules on your particular platform. > > > > > > > > -hilmar > > > > > > > > > > > > > > Christopher Fields > > > > > Postdoctoral Researcher - Switzer Lab > > > > > Dept. 
of Biochemistry > > > > > University of Illinois Urbana-Champaign > > > > > > > > > > _______________________________________________ > > > > > Bioperl-l mailing list > > > > > Bioperl-l@portal.open-bio.org > > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > > > > > -- > > > > ------------------------------------------------------------- > > > > Hilmar Lapp email: lapp at gnf.org > > > > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > > > > ------------------------------------------------------------- > > > > > > > > > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l@portal.open-bio.org > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > > > > > > > > -- > > > ---------------------------------------------------------- > > > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : > > > ---------------------------------------------------------- > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- ---------------------------------------------------------- : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : ---------------------------------------------------------- From cjfields at uiuc.edu Fri Jan 13 15:31:32 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri Jan 13 17:09:35 2006 Subject: [Bioperl-l] error running load_seqdatabase.pl In-Reply-To: Message-ID: <000001c61880$57814d10$15327e82@pyrimidine> Sorry about that. I retried 'perl -e "require Error;' and various incarnations of it. 

Without Error.pm:

C:\Perl\test\bioperl-db>perl -e "require Error;"
Can't locate Error.pm in @INC (@INC contains: C:\Perl C:/Perl/lib C:/Perl/site/lib .) at -e line 1.

This is the interesting bit; I then installed Error.pm. I tried out the
following:

C:\Perl\test\bioperl-db>perl -e "require Error; Error::throw;"

C:\Perl\test\bioperl-db>perl -e "require Error; Error::throw();"
Can't call method "new" on an undefined value at C:/Perl/site/lib/Error.pm line 148.

It ignored the first run (without parentheses). Then I tried this:

C:\Perl\test\bioperl-db>perl -e "require Error; Errror::throw"

and got no errors. Maybe it doesn't recognize Error::throw (or
Errror:throw) as a subroutine for some reason unless it has parentheses.
This makes me think something else is going on.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: drycafe@gmail.com [mailto:drycafe@gmail.com] On Behalf Of Hilmar Lapp
> Sent: Friday, January 13, 2006 1:52 PM
> To: Chris Fields
> Cc: Steve Chervitz; bioperl-l@portal.open-bio.org
> Subject: Re: [Bioperl-l] error running load_seqdatabase.pl
>
> On 1/13/06, Chris Fields wrote:
> > [...]
> > * Running the example scripts (exceptions[1-4].pl) with or w/o Error.pm
> > showed no difference (I checked with diff).
>
> Given the below you probably failed to uninstall Error.pm, or not
> installing it in the first place doesn't matter because it's there
> already.
>
> > * Changing "throw $class" to "Error::throw $class" in Root.pm didn't do
> > anything, which is strange (I did this with and w/o Error.pm installed). I
> > thought at this point, that Activestate may have Error.pm as part of their
> > core modules, but it isn't included anywhere in the Perl directory tree or
> > under PERL5LIB. It also isn't listed as CORE in their modules list
> > (http://ppm.activestate.com/BuildStatus/5.8-E.html); the core modules are
> > usually under '/lib' instead of '/site/lib'. So why would "Error::throw"
> > even work? I also tried 'perl -e "require Error" and didn't get errors, so
> > it has to be around somewhere.
>
> Right. I usually do
>
> $ perl -MYet::Another::Module
>
> to convince myself that Yes::Another::Module really is not accessible
> to the interpreter. And if I do that with Error on my OSX box I do
> receive an error about perl not finding the Error module anywhere.
>
> > * Even stranger, when changing "throw $class" to "Error::throw $class" in
> > Root.pm, load_seqdatabase.pl works fine, just like when "throw $class" is
> > changed to "throw $class,". Oi!!
>
> Now, I can't imagine that using Error::throw $class would
> not die immediately if Error.pm is not installed. You can check that
> quickly by mistyping the module name (like Errror::throw). So unless
> there's some deep magic going on then using Error::throw instead of
> just throw() is not an option I'm afraid.
>
> I guess the solution needs to be adding the comma. I can't imagine why
> this would break on non-Windows systems, but obviously some testing is
> in order.
>
> -hilmar
>
> > * Changing load_seqdatabase.pl to include the line "INIT {
> > $DONT_USE_ERROR=1; }" also didn't do anything; only changes to Root.pm made
> > a difference.
> >
> > Lesson: Windows is flaky. I think that much of this behavior is just
> > ActivePerl-specific, which may be why it hasn't been seen elsewhere. I
> > don't know much about ActivePerl and exception handling, so I may delve into
> > it a bit more to see if there is something else there. I also dropped
> > Activestate an email asking about Error.pm and their core distribution.
> >
> > So, the question is, should Root.pm be changed in bioperl-live? Obviously
> > this would need to be well tested out before committing any changes. I
> > could try it out on Mac OS X.
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> > [...]
>
> --
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------

From anst at kvl.dk Sat Jan 14 11:50:02 2006
From: anst at kvl.dk (Anders Stegmann)
Date: Sat Jan 14 12:05:57 2006
Subject: [Bioperl-l] BIO::SearchIO HOWTO mistake?
Message-ID: <43C939CA0200009B00000429@gwia.kvl.dk>

Hi!

According to the Bio::SearchIO HOWTO this,

$hsp->seq_inds('hit', 'conserved');

should fetch ONLY the conserved residues in an alignment (not those
identical).

When I use:

sub subject_seq_alignment_conserved_residues {

    my ($hsp_obj) = @_;
    my %subject_conserved_hash = ();

    my @subject_string = split //, $$hsp_obj->hit_string;

    foreach ($$hsp_obj->seq_inds('hit', 'conserved')) {

        $subject_conserved_hash{$_} = $subject_string[$_ -1];

    }

    return %subject_conserved_hash;

}

I get all residues in the alignment, including those that are identical!

What's wrong?

If the Bio::SearchIO HOWTO is wrong about this, is there an easy way to
fetch only the conserved residues?

Anders.

From u4075723 at anu.edu.au Sat Jan 14 02:37:27 2006
From: u4075723 at anu.edu.au (Nagesh Chakka)
Date: Sat Jan 14 17:13:28 2006
Subject: [Bioperl-l] Problem with Webblast.pm
Message-ID: <200601141837.27204.u4075723@anu.edu.au>

Hi,
I am having a problem using the remote blast module of bioperl. I have
installed the latest version of Bioperl (1.5.1), and when I run
run_remote_blast.pl I get the following error saying that it cannot locate
the Webblast.pm module.

Can't locate Bio/Tools/Blast/Run/Webblast.pm in @INC (@INC
contains: . .. /home/nagesh/progs/lib/perl5/i686-linux
/home/nagesh/progs/lib/perl5 /home/nagesh/progs/lib/perl5//i686-linux
/home/nagesh/progs/lib/perl5/ /usr/local/lib/perl5/5.8.6/i686-linux
/usr/local/lib/perl5/5.8.6 /usr/local/lib/perl5/site_perl/5.8.6/i686-linux
/usr/local/lib/perl5/site_perl/5.8.6 /usr/local/lib/perl5/site_perl)
at /home/nagesh/progs/lib/perl5/Bio/Tools/Blast.pm line 1303, line 1.

When I looked in the directory (Bio/Tools/Blast/Run/), I could not find
this module. The following webpage says that it is available with the core
package.
http://annocpan.org/~BIRNEY/bioperl-0.04.3/Bio/Tools/Blast/Run/Webblast.pm
Can anyone please advise what might have gone wrong and how I can get it
going? The testing of the bioperl installation was ok.
Hoping for some answer.
Thanks
Nagesh

From chen_li3 at yahoo.com Sat Jan 14 23:57:33 2006
From: chen_li3 at yahoo.com (chen li)
Date: Sun Jan 15 00:00:57 2006
Subject: [Bioperl-l] parser for primer3 output
Message-ID: <20060115045733.96498.qmail@web36810.mail.mud.yahoo.com>

Hi all,

After batch-design of PCR primers with
Bio::Tools::Run::Primer3 I get the results in a file
called "temp.out". I want to pull out each pair of
primers and put them into an Excel-format file. I just
want to know whether such a module/parser is already
available or whether I need to write some code to parse
the output.

Thanks,

Li

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

From jason.stajich at duke.edu Sun Jan 15 11:08:36 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun Jan 15 11:04:58 2006
Subject: [Bioperl-l] parser for primer3 output
In-Reply-To: <20060115045733.96498.qmail@web36810.mail.mud.yahoo.com>
References: <20060115045733.96498.qmail@web36810.mail.mud.yahoo.com>
Message-ID: <52DFE7E1-9826-46E2-9B97-62F735264FFC@duke.edu>

Bio::Tools::Primer3 ?

You get one of these objects back from the run() method in
Bio::Tools::Run::Primer3 without having to re-open the result file.

-jason

On Jan 14, 2006, at 11:57 PM, chen li wrote:

> Hi all,
>
> After batch-design of PCR primers with
> Bio::Tools::Run::Primer3 I get the results in a file
> called "temp.out". I want to pull out each pair of
> primers and put them into an Excel-format file. I just
> want to know whether such a module/parser is already
> available or whether I need to write some code to parse
> the output.
>
> Thanks,
>
> Li

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From jason.stajich at duke.edu Sun Jan 15 10:59:36 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun Jan 15 11:51:53 2006
Subject: [Bioperl-l] BIO::SearchIO HOWTO mistake?
In-Reply-To: <43C939CA0200009B00000429@gwia.kvl.dk>
References: <43C939CA0200009B00000429@gwia.kvl.dk>
Message-ID: <8710E96A-DE7C-45DD-A8A8-79568D4D65F2@duke.edu>

If you read the documentation for the seq_inds method you'll see the
following options. I think you want conserved-not-identical.

 Title     : seq_inds
 Purpose   : Get a list of residue positions (indices) for all identical
           : or conserved residues in the query or sbjct sequence.
 Example   : @s_ind = $hsp->seq_inds('query', 'identical');
           : @h_ind = $hsp->seq_inds('hit', 'conserved');
           : @h_ind = $hsp->seq_inds('hit', 'conserved-not-identical');
           : @h_ind = $hsp->seq_inds('hit', 'conserved', 1);
 Returns   : List of integers
           : May include ranges if collapse is true.
 Argument  : seq_type  = 'query' or 'hit' or 'sbjct' (default = query)
           :             ('sbjct' is synonymous with 'hit')
           : class     = 'identical' or 'conserved' or 'nomatch' or 'gap'
           :             (default = identical)
           :             (can be shortened to 'id' or 'cons')
           :             or 'conserved-not-identical'
           : collapse  = boolean, if true, consecutive positions are merged
           :             using a range notation, e.g., "1 2 3 4 5 7 9 10 11"
           :             collapses to "1-5 7 9-11". This is useful for
           :             consolidating long lists. Default = no collapse.
 Throws    : n/a.
 Comments  :

On Jan 14, 2006, at 11:50 AM, Anders Stegmann wrote:

> Hi!
>
> According to the Bio::SearchIO HOWTO this,
>
> $hsp->seq_inds('hit', 'conserved');
>
> should fetch ONLY the conserved residues in an alignment (not those
> identical).
>
> When I use:
>
> sub subject_seq_alignment_conserved_residues {
>
>     my ($hsp_obj) = @_;
>     my %subject_conserved_hash = ();
>
>     my @subject_string = split //, $$hsp_obj->hit_string;
>
>     foreach ($$hsp_obj->seq_inds('hit', 'conserved')) {
>
>         $subject_conserved_hash{$_} = $subject_string[$_ -1];
>
>     }
>
>     return %subject_conserved_hash;
>
> }
>
> I get all residues in the alignment, including those that are
> identical!
>
> What's wrong?
>
> If the Bio::SearchIO HOWTO is wrong about this, is there an easy
> way to fetch only the conserved residues?
>
> Anders.

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From bmoore at genetics.utah.edu Sun Jan 15 15:36:15 2006
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Sun Jan 15 15:32:04 2006
Subject: [Bioperl-l] Problem with Webblast.pm
Message-ID:

Nagesh,

Where did you get run_remote_blast.pl from? I'm not sure, but I think
that is an older script and I don't think Webblast.pm is part of the
bioperl distribution anymore. I don't find it on my computer either.
Try using bp_remote_blast.pl, which you should find in the scripts
directory of your bioperl installation. It uses RemoteBlast.pm, which is
the current package for doing Blast over the web.

Barry

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Nagesh Chakka
> Sent: Saturday, January 14, 2006 12:37 AM
> To: bioperl-l@bioperl.org
> Cc: Nagesh Chakka; babu.kannappan@anu.edu.au
> Subject: [Bioperl-l] Problem with Webblast.pm
>
> Hi,
> I am having a problem using the remote blast module of bioperl. I have
> installed the latest version of Bioperl (1.5.1), and when I run
> run_remote_blast.pl I get the following error saying that it cannot
> locate the Webblast.pm module.
>
> Can't locate Bio/Tools/Blast/Run/Webblast.pm in @INC (@INC
> contains: . .. /home/nagesh/progs/lib/perl5/i686-linux
> /home/nagesh/progs/lib/perl5 /home/nagesh/progs/lib/perl5//i686-linux
> /home/nagesh/progs/lib/perl5/ /usr/local/lib/perl5/5.8.6/i686-linux
> /usr/local/lib/perl5/5.8.6 /usr/local/lib/perl5/site_perl/5.8.6/i686-linux
> /usr/local/lib/perl5/site_perl/5.8.6 /usr/local/lib/perl5/site_perl)
> at /home/nagesh/progs/lib/perl5/Bio/Tools/Blast.pm line 1303, line 1.
>
> When I looked in the directory (Bio/Tools/Blast/Run/), I could not find
> this module. The following webpage says that it is available with the
> core package.
> http://annocpan.org/~BIRNEY/bioperl-0.04.3/Bio/Tools/Blast/Run/Webblast.pm
> Can anyone please advise what might have gone wrong and how I can get it
> going? The testing of the bioperl installation was ok.
> Hoping for some answer.
> Thanks
> Nagesh

From nagesh.chakka at anu.edu.au Mon Jan 16 16:56:45 2006
From: nagesh.chakka at anu.edu.au (Nagesh Chakka)
Date: Mon Jan 16 17:13:58 2006
Subject: [Bioperl-l] Trouble using RemoteBlast.pm
Message-ID: <200601170856.45627.nagesh.chakka@anu.edu.au>

Hi All,
I was trying to set up a system to perform a remote blast on a regular
basis. I thought this could best be achieved by using a BioPerl module and
came across RemoteBlast.pm. I modified the sample script
"bp_remote_blast.pl", which takes a file containing a single FASTA sequence
as input. I also wanted the blast report to be saved in a file for later
use, and modified the code as follows.
I am using the latest version of Bioperl (1.5) on a Fedora platform.
#######################################################################
print "$Bio::Root::Version::VERSION\n";
use Bio::Tools::Run::RemoteBlast;
use strict;
my $prog = 'blastp';
my $db   = 'swissprot';
my $e_val= '1e-10';

my @params = ( '-prog'       => $prog,
               '-data'       => $db,
               '-expect'     => $e_val,
               '-readmethod' => 'SearchIO' );

my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

#change a parameter
$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens [ORGN]';

#remove a parameter
delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'};

my $v = 1;
#$v is just to turn on and off the messages

my $r = $factory->submit_blast('blastInput.txt');

print STDERR "waiting..." if( $v > 0 );
while ( my @rids = $factory->each_rid )
{
    foreach my $rid ( @rids )
    {
        my $rc = $factory->retrieve_blast($rid);
        if( !ref($rc) )
        {
            if( $rc < 0 )
            {
                $factory->remove_rid($rid);
            }
            print STDERR "." if ( $v > 0 );
            sleep 5;
        }
        else
        {
            print "RID $rid\n";
            $factory->save_output('temp.out');
            $factory->remove_rid($rid);
        }
    }
}

#################################################################################

This script prints the RID and terminates immediately. Obviously the
output file created is empty, as the program did not wait to get the
BLAST results for the RID.
Is there something I am doing wrong, and what can I do to make the program
wait until the results are ready to be written to the output file? I could
not get much information from the documentation and have no prior
experience with Bioperl.
Thanks very much for your attention.
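The control flow this script depends on can be sketched without
contacting NCBI. The sketch below assumes, as the loop above does, that
retrieve_blast returns a plain number while a job is unfinished (negative
on error) and a reference once the report is ready. MockFactory is a
hypothetical stand-in written only for this illustration and is not part
of BioPerl; the poll_until_done helper and its $max_polls cap are likewise
assumptions, not something the original script contains.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::Tools::Run::RemoteBlast (NOT the real
# module): it answers "not ready" (0) for the first two polls of each
# RID, then returns a reference, mimicking a finished job.
package MockFactory;

sub new {
    my ($class, @rids) = @_;
    my %polls = map { $_ => 0 } @rids;
    return bless { polls => \%polls, saved => [] }, $class;
}

sub each_rid {
    my ($self) = @_;
    my @rids = sort keys %{ $self->{polls} };
    return @rids;
}

sub remove_rid {
    my ($self, $rid) = @_;
    delete $self->{polls}{$rid};
    return;
}

sub retrieve_blast {
    my ($self, $rid) = @_;
    return -1 unless exists $self->{polls}{$rid};    # unknown RID: error
    return 0 if ++$self->{polls}{$rid} < 3;          # still waiting
    return { rid => $rid };                          # "report" is ready
}

sub save_output {
    my ($self, $file) = @_;
    push @{ $self->{saved} }, $file;
    return;
}

package main;

# Poll every outstanding RID until it finishes, fails, or $max_polls
# rounds have passed. Returns the RIDs whose output was saved.
sub poll_until_done {
    my ($factory, $outfile, $max_polls, $delay) = @_;
    my @done;
    my $round = 0;
    while ( my @rids = $factory->each_rid ) {
        last if ++$round > $max_polls;
        foreach my $rid (@rids) {
            my $rc = $factory->retrieve_blast($rid);
            if ( !ref $rc ) {
                $factory->remove_rid($rid) if $rc < 0;    # give up on errors
                sleep $delay if $delay;                   # wait before retrying
            }
            else {
                $factory->save_output($outfile);
                $factory->remove_rid($rid);
                push @done, $rid;
            }
        }
    }
    return @done;
}

my $factory  = MockFactory->new('RID-A');
my @finished = poll_until_done( $factory, 'temp.out', 10, 0 );
print "finished: @finished\n";
```

Capping the number of polling rounds keeps a script that runs on a
regular basis from hanging if a job never completes.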
Regards Nageshbi From hubert.prielinger at gmx.at Mon Jan 16 16:44:09 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Mon Jan 16 17:37:38 2006 Subject: [Bioperl-l] parse Blast Output and Composition Based Statistics parameter In-Reply-To: <43CC05E1.5070503@gmx.at> References: <43C6ECDC.7050308@gmx.at> <2CF48095-DF0E-4BB5-AAB8-3B8DBC813E76@duke.edu> <43CC05E1.5070503@gmx.at> Message-ID: <43CC13A9.3010209@gmx.at> Hubert Prielinger wrote: > Jason Stajich wrote: > >> (please don't try and post to bioperl-announce, it is not for >> questions.) >> >> On Jan 12, 2006, at 6:57 PM, Hubert Prielinger wrote: >> >>> Hello, >>> I want to know, if there is a possibility to get from a Blast >>> Outputfile the whole Sequence of a protein not only the best local >>> alignment... >>> for example: >>> >> No. The parser can only return to you what is in the report file... >> use Bio::DB::GenPept to retrieve the sequence via the web or >> (recommended) use a locally indexed sequence database like >> Bio::DB::Fasta >> >>> >ref|XP_480077.1| hypothetical protein [Oryza sativa (japonica >>> cultivar-group)] >>> dbj|BAD33542.1| hypothetical protein [Oryza sativa (japonica >>> cultivar-group)] >>> Length=95 >>> >>> Score = 24.1 bits (47), Expect = 493 >>> Identities = 6/7 (85%), Positives = 7/7 (100%), Gaps = 0/7 (0%) >>> >>> Query 2 KKRRRWW 8 >>> K+RRRWW >>> Sbjct 87 KRRRRWW 93 >>> >>> and now, if I parse the file, I want to get the whole Sequence of >>> this hypothetical protein....is that possible with hsp for example, >>> or any other way.... >>> >>> my second question is: >>> I do my blast search with bioperl and the remoteblast >>> module.....each parameter is working very well, except the >>> composition based statistics parameter.... >>> it looks like that: >>> >>> my $factory = $Bio::Tools::Run::RemoteBlast::HEADER >>> {'COMPOSITION_BASED_STATISTICS'} = 'yes'; >>> >> uh no that is not how you would do it. 
>> You can make it the default for any factories you use in the script >> by doing this >> >>> $Bio::Tools::Run::RemoteBlast::HEADER >>> {'COMPOSITION_BASED_STATISTICS'} = 'yes'; >> >> >> then >> $factory = Bio::Tools::Run::RemoteBlast->new(); >> >> >> =OR= >> Once you have a factory object you can set the parameter explicitly: >> $factory->submit_parameter('COMPOSITION_BASED_STATISTICS', 'yes'); >> >>> it should work like that, but it doesn't.... >>> >>> Thanks for your help in advance...... >>> >>> regards >>> Hubert >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> > Hi Jason, I have tried everything that you suggested, but the Composition Based Statistic parameter isn't still working, every other parameter works using e.g $Bio::Tools::Run::RemoteBlast::HEADER{'DESCRIPTIONS'} = '1000'; thanks in advance Hubert From hubert.prielinger at gmx.at Mon Jan 16 16:54:03 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Mon Jan 16 17:47:31 2006 Subject: [Bioperl-l] Trouble using RemoteBlast.pm In-Reply-To: <200601170856.45627.nagesh.chakka@anu.edu.au> References: <200601170856.45627.nagesh.chakka@anu.edu.au> Message-ID: <43CC15FB.4000008@gmx.at> Nagesh Chakka wrote: >Hi All, >I was trying to setup a system to perform a remote blast on regular basis. I >thought this could be best achieved by using BioPerl module and came across >RemoteBlast.pm >I had modified the sample script "bp_remote_blast.pl" which takes a file >containing single FASTA sequence as an input. 
Also I wanted the blast report >to be saved in a file for latter use and >modified the code as follows >I am using the latest version of Bioperl (1.5) on a Fedora platform. >####################################################################### >print "$Bio::Root::Version::VERSION\n"; >use Bio::Tools::Run::RemoteBlast; >use strict; >my $prog = 'blastp'; >my $db = 'swissprot'; >my $e_val= '1e-10'; > >my @params = ( '-prog' => $prog, > '-data' => $db, > '-expect' => $e_val, > '-readmethod' => 'SearchIO' ); > >my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > >#change a paramter >$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens >[ORGN]'; > >#remove a parameter >delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'}; > >my $v = 1; >#$v is just to turn on and off the messages > >my $r = $factory->submit_blast('blastInput.txt'); > >print STDERR "waiting..." if( $v > 0 ); >while ( my @rids = $factory->each_rid ) >{ > foreach my $rid ( @rids ) > { > my $rc = $factory->retrieve_blast($rid); > if( !ref($rc) ) > { > if( $rc < 0 ) > { > $factory->remove_rid($rid); > } > print STDERR "." if ( $v > 0 ); > sleep 5; > } > else > { > print "RID $rid\n"; > $factory->save_output('temp.out'); > $factory->remove_rid($rid); > } > } >} > >################################################################################# > >This script prints the RID and terminates immediately. Obviously the >output file created is empty as the program did not wait for getting the >blast results from the RID. >Is there something I am doing wrong and what can I do for the program to wait >until the results are ready to be printed to the output file. I could not get >much information from the documentation and have no prior experience with >Bioperl. >Thanks very much for your attention. 
>Regards
>Nageshbi
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l@portal.open-bio.org
>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

Hi Nagesh,
try this; it should work. I had the same problem:

.......................
.......................

        else
        {
            print "RID $rid\n";
            $factory->save_output('temp.out');

            # print the saved report back out to check it is not empty
            my $checkinput = $factory->file;
            open(my $fh, '<', $checkinput) or die $!;
            while(<$fh>){
                print;
            }
            close $fh;

            $factory->remove_rid($rid);
        }
    }
}

regards
Hubert

PS: Are you using the composition based statistics parameter with your
blast search? If yes, is it working?

From hubert.prielinger at gmx.at Mon Jan 16 16:57:07 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon Jan 16 17:50:34 2006
Subject: [Bioperl-l] parse Blast Output and Composition Based Statistics parameter
In-Reply-To:
References:
Message-ID: <43CC16B3.70908@gmx.at>

Hi Brian,
yes, I have tried pasting the sequence manually at the NCBI homepage and
it works fine.

regards
Hubert

Brian Osborne wrote:

>Hubert,
>
>If all the other parameters are passed correctly then I suspect this is not
>a BioPerl problem. Did you try manually pasting these URLs into the browser
>to confirm that NCBI is processing the parameters correctly?
>
>Brian O.
>
>
>On 1/16/06 4:44 PM, "Hubert Prielinger" wrote:
>
>>Hubert Prielinger wrote:
>>
>>>Jason Stajich wrote:
>>>
>>>>(please don't try and post to bioperl-announce, it is not for
>>>>questions.)
>>>>
>>>>On Jan 12, 2006, at 6:57 PM, Hubert Prielinger wrote:
>>>>
>>>>>Hello,
>>>>>I want to know, if there is a possibility to get from a Blast
>>>>>Outputfile the whole Sequence of a protein not only the best local
>>>>>alignment...
>>>>>for example:
>>>>>
>>>>No. The parser can only return to you what is in the report file...
>>>>> >>>>>regards >>>>>Hubert >>>>>_______________________________________________ >>>>>Bioperl-l mailing list >>>>>Bioperl-l@portal.open-bio.org >>>>>http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>> >>>>-- >>>>Jason Stajich >>>>Duke University >>>>http://www.duke.edu/~jes12 >>>> >>>> >>>>_______________________________________________ >>>>Bioperl-l mailing list >>>>Bioperl-l@portal.open-bio.org >>>>http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >>>> >>Hi Jason, >>I have tried everything that you suggested, but the Composition Based >>Statistic parameter isn't still working, every >>other parameter works using e.g >> >>$Bio::Tools::Run::RemoteBlast::HEADER{'DESCRIPTIONS'} = '1000'; >> >>thanks in advance >>Hubert >> >> >> >>_______________________________________________ >>Bioperl-l mailing list >>Bioperl-l@portal.open-bio.org >>http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > > > From osborne1 at optonline.net Mon Jan 16 17:49:54 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Mon Jan 16 18:01:54 2006 Subject: [Bioperl-l] parse Blast Output and Composition Based Statistics parameter In-Reply-To: <43CC13A9.3010209@gmx.at> Message-ID: Hubert, If all the other parameters are passed correctly then I suspect this is not a BioPerl problem. Did you try manually pasting these URLs into the browser to confirm that NCBI is processing the parameters correctly? Brian O. On 1/16/06 4:44 PM, "Hubert Prielinger" wrote: > Hubert Prielinger wrote: > >> Jason Stajich wrote: >> >>> (please don't try and post to bioperl-announce, it is not for >>> questions.) >>> >>> On Jan 12, 2006, at 6:57 PM, Hubert Prielinger wrote: >>> >>>> Hello, >>>> I want to know, if there is a possibility to get from a Blast >>>> Outputfile the whole Sequence of a protein not only the best local >>>> alignment... >>>> for example: >>>> >>> No. The parser can only return to you what is in the report file... 
>>>> >>>> regards >>>> Hubert >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l@portal.open-bio.org >>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> -- >>> Jason Stajich >>> Duke University >>> http://www.duke.edu/~jes12 >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> > Hi Jason, > I have tried everything that you suggested, but the Composition Based > Statistic parameter isn't still working, every > other parameter works using e.g > > $Bio::Tools::Run::RemoteBlast::HEADER{'DESCRIPTIONS'} = '1000'; > > thanks in advance > Hubert > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at duke.edu Mon Jan 16 20:11:40 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon Jan 16 20:07:58 2006 Subject: Fwd: [Bioperl-l] parse Blast Output and Composition Based Statistics parameter References: <43CC05E1.5070503@gmx.at> Message-ID: <9EC4AC10-CDDB-49AC-B9F9-5A7621F53F0F@duke.edu> sorry - i don't really have the time to support this module - lots of people on the list use it so they can hopefully help. Begin forwarded message: > From: Hubert Prielinger > Date: January 16, 2006 3:45:21 PM EST > To: Jason Stajich > Subject: Re: [Bioperl-l] parse Blast Output and Composition Based > Statistics parameter > > Jason Stajich wrote: > >> (please don't try and post to bioperl-announce, it is not for >> questions.) >> >> On Jan 12, 2006, at 6:57 PM, Hubert Prielinger wrote: >> >>> Hello, >>> I want to know, if there is a possibility to get from a Blast >>> Outputfile the whole Sequence of a protein not only the best >>> local alignment... >>> for example: >>> >> No. The parser can only return to you what is in the report file... 
>>>
>>> regards
>>> Hubert
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l@portal.open-bio.org
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
> Hi Jason,
> I have tried everything that you suggested, but the Composition
> Based Statistic parameter isn't still working, every
> other parameter works using e.g
>
> $Bio::Tools::Run::RemoteBlast::HEADER{'DESCRIPTIONS'} = '1000';
>
> thanks in advance
> Hubert
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From christoph.gille at charite.de Wed Jan 11 06:22:25 2006
From: christoph.gille at charite.de (Dr. Christoph Gille)
Date: Tue Jan 17 01:36:21 2006
Subject: [Bioperl-l] internet proxy
In-Reply-To: <43C46B52.7060600@infotech.monash.edu.au>
References: <47893.192.168.220.203.1136927017.squirrel@webmail.charite.de> <43C46B52.7060600@infotech.monash.edu.au>
Message-ID: <37130.192.168.220.203.1136978545.squirrel@webmail.charite.de>

Hi Torsten,

Sorry, it does not work yet.
I have not been working with Perl long enough to sort this out.

In Sopma.pm there is:

my $request = POST 'http://npsa-pbil.ibcp.fr/cgi-bin/secpred_sopma.pl',
    Content_Type => 'form-data',
    Content      => [title     => "",
                     notice    => $self->seq->seq,
                     ali_width => 70,
                     states    => $self->states,
                     threshold => $self->similarity_threshold ,
                     width     => $self->window_width,
                    ];

Is POST a static method or an instance method?
If I call
    $sopma->env_proxy;
does the POST method know about it?
Does this method get a "self" reference to Sopma.pm?
I would have expected that I need to set a static field in the module
that provides the POST method.
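On the question of whether POST knows about the proxy: the call syntax
in Sopma.pm matches the POST function exported by HTTP::Request::Common,
which only builds an HTTP::Request object; the proxy setting belongs to
the LWP::UserAgent-style object that later sends the request. The sketch
below illustrates that split with plain LWP, assuming libwww-perl is
installed. Both URLs are placeholders, and whether the Sopma object
itself plays the user-agent role (as its env_proxy method suggests) is an
assumption that should be checked against the module source.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);
use LWP::UserAgent;

# Placeholder URLs for illustration only (assumptions, not real services).
my $proxy_url  = 'http://proxy.example.org:3128';
my $target_url = 'http://www.example.org/cgi-bin/analysis.pl';

# POST is a plain function, not a method: it builds an HTTP::Request
# object and knows nothing about any user agent or proxy.
my $request = POST $target_url,
    Content_Type => 'form-data',
    Content      => [ title => '', notice => 'MKNRLGTWWVAILCML', ali_width => 70 ];

# The proxy is a property of the agent that will send the request.
# $ua->env_proxy would fill it in from the *_proxy environment variables;
# here it is set explicitly so the example does not depend on the
# environment it runs in.
my $ua = LWP::UserAgent->new;
$ua->proxy( http => $proxy_url );

printf "request is a %s using method %s\n", ref($request), $request->method;
printf "agent will route http via %s\n", $ua->proxy('http');

# Sending would then be: my $response = $ua->request($request);
# (not executed here, since the URLs above are placeholders)
```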
Thanks for your help
Christoph

From bmoore at genetics.utah.edu Tue Jan 17 13:34:08 2006
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Tue Jan 17 13:41:09 2006
Subject: [Bioperl-l] Trouble using RemoteBlast.pm
Message-ID:

Nagesh-

Did you get this figured out? Your script works as is on my system.
You say temp.out is empty? What does your input sequence
(blastInput.txt) look like?

Barry

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Hubert Prielinger
> Sent: Monday, January 16, 2006 2:54 PM
> To: Nagesh Chakka; bioperl-l@portal.open-bio.org
> Subject: Re: [Bioperl-l] Trouble using RemoteBlast.pm
>
> Nagesh Chakka wrote:
>
> >Hi All,
> >I was trying to setup a system to perform a remote blast on regular basis. I
> >thought this could be best achieved by using BioPerl module and came across
> >RemoteBlast.pm
> >I had modified the sample script "bp_remote_blast.pl" which takes a file
> >containing single FASTA sequence as an input. Also I wanted the blast report
> >to be saved in a file for latter use and
> >modified the code as follows
> >I am using the latest version of Bioperl (1.5) on a Fedora platform.
> >Regards
> >Nageshbi
> >_______________________________________________
> >Bioperl-l mailing list
> >Bioperl-l@portal.open-bio.org
> >http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> hi nagesh,
> try this, should work, I had the same problem:
>
> .......................
> .......................
>
>         else
>         {
>             print "RID $rid\n";
>             $factory->save_output('temp.out');
>
>             my $checkinput = $factory->file;
>             open(my $fh,"<$checkinput") or die $!;
>             while(<$fh>){
>                 print;
>             }
>             close $fh;
>
>             $factory->remove_rid($rid);
>         }
>     }
> }
>
> regards
> Hubert
>
> PS: are you using the composition based statistics parameter with your
> blast search?
> if yes, is it working?
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

From hubert.prielinger at gmx.at Tue Jan 17 14:37:20 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Tue Jan 17 15:31:30 2006
Subject: Fwd: [Bioperl-l] parse Blast Output and Composition Based Statistics parameter
In-Reply-To: <9EC4AC10-CDDB-49AC-B9F9-5A7621F53F0F@duke.edu>
References: <43CC05E1.5070503@gmx.at> <9EC4AC10-CDDB-49AC-B9F9-5A7621F53F0F@duke.edu>
Message-ID: <43CD4770.1020206@gmx.at>

Hi Jason,
I wrote to the NCBI helpdesk to ask whether they could help me with
further information. This is their response:

    Hello,

    I'm sorry, this has recently changed. Instead of "Yes", try using
    either '0' '1' or '2', where:

    '0' = No Composition-based statistics
    '1' = Conditional compositional score matrix adjustment
          (apply only to 'biased' sequences)
    '2' = Universal compositional score matrix adjustment
          (apply to all).

    This works with the URLAPI; I've not tested with the perl module.

    Best regards,
    Wayne

regards
Hubert

Jason Stajich wrote:

> sorry - i don't really have the time to support this module - lots of
> people on the list use it so they can hopefully help.
>>> You can make it the default for any factories you use in the script
>>> by doing this
>>>
>>>> $Bio::Tools::Run::RemoteBlast::HEADER
>>>> {'COMPOSITION_BASED_STATISTICS'} = 'yes';
>>>
>>> then
>>> $factory = Bio::Tools::Run::RemoteBlast->new();
>>>
>>> =OR=
>>> Once you have a factory object you can set the parameter explicitly:
>>> $factory->submit_parameter('COMPOSITION_BASED_STATISTICS', 'yes');
>>>
>>>> it should work like that, but it doesn't....
>>>>
>>>> Thanks for your help in advance......
>>>>
>>>> regards
>>>> Hubert
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l@portal.open-bio.org
>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> --
>>> Jason Stajich
>>> Duke University
>>> http://www.duke.edu/~jes12
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l@portal.open-bio.org
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>
>> Hi Jason,
>> I have tried everything that you suggested, but the Composition Based
>> Statistic parameter isn't still working, every
>> other parameter works using e.g
>>
>> $Bio::Tools::Run::RemoteBlast::HEADER{'DESCRIPTIONS'} = '1000';
>>
>> thanks in advance
>> Hubert
>>
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>

From nagesh.chakka at anu.edu.au Tue Jan 17 15:57:14 2006
From: nagesh.chakka at anu.edu.au (Nagesh Chakka)
Date: Tue Jan 17 16:14:48 2006
Subject: [Bioperl-l] Trouble using RemoteBlast.pm
In-Reply-To:
References:
Message-ID: <200601180757.14592.nagesh.chakka@anu.edu.au>

Hi Barry,
With Hubert's help I modified the script further, but I still have the
same problem. After submitting the BLAST query, the script does not wait
until the results are ready for retrieval; the submission is immediately
followed by retrieving and saving the output.
Since the results cannot be ready this fast (within about a second), the
output file is blank. I am able to retrieve the results online using the
RID that the script prints. So my main problem is making the program wait
after submitting the query.
My input file has a single FASTA sequence, which I have pasted below.
It's interesting to note that the script works on your system. Is it
creating an output file with the BLAST report?
Thanks very much for your attention.
Regards
Nagesh

blastInput.txt

>MusDpl
MKNRLGTWWVAILCMLLASHLSTVKARGIKHRFKWNRKVLPSSGGQITEARVAENRPGAFIKQGRKLDIDFGAEGNRYYA
ANYWQFPDGIYYEGCSEANVTKEMLVTSCVNATQAANQAEFSREKQDSKLHQRVLWRLIKEICSAKHCDFWLERGAAL
RVAVDQPAMVCLLGFVWFIVK

On Wednesday 18 January 2006 05:34, Barry Moore wrote:
> Nagesh-
>
> Did you get this figured out? Your script works as is on my system.
> You say temp.out is empty? What does you input sequence
> (blastInput.txt) look like?
>
> Barry
>
> > -----Original Message-----
> > From: bioperl-l-bounces@portal.open-bio.org
> > [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Hubert Prielinger
> > Sent: Monday, January 16, 2006 2:54 PM
> > To: Nagesh Chakka; bioperl-l@portal.open-bio.org
> > Subject: Re: [Bioperl-l] Trouble using RemoteBlast.pm
> >
> > Nagesh Chakka wrote:
> > >Hi All,
> > >I was trying to setup a system to perform a remote blast on regular basis. I
> > >thought this could be best achieved by using BioPerl module and came across
> > >RemoteBlast.pm
> > >I had modified the sample script "bp_remote_blast.pl" which takes a file
> > >containing single FASTA sequence as an input. Also I wanted the blast report
> > >to be saved in a file for latter use and
> > >modified the code as follows
> > >I am using the latest version of Bioperl (1.5) on a Fedora platform.
> > >Regards > > >Nageshbi > > >_______________________________________________ > > >Bioperl-l mailing list > > >Bioperl-l@portal.open-bio.org > > >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > hi nagesh, > > try this, should work, I had the same problem: > > > > ....................... > > ....................... > > > > else > > { > > print "RID $rid\n"; > > $factory->save_output('temp.out'); > > > > my $checkinput = $factory->file; > > open(my $fh,"<$checkinput") or die $!; > > while(<$fh>){ > > print; > > } > > close $fh; > > > > > > $factory->remove_rid($rid); > > } > > } > > } > > > > regards > > Hubert > > > > PS: are you using the composition based statistics parameter with your > > blast search? > > if yes, is it working? > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l From hubert.prielinger at gmx.at Tue Jan 17 16:27:07 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Tue Jan 17 17:20:34 2006 Subject: Fwd: [Bioperl-l] parse Blast Output and Composition Based Statistics parameter In-Reply-To: <9EC4AC10-CDDB-49AC-B9F9-5A7621F53F0F@duke.edu> References: <43CC05E1.5070503@gmx.at> <9EC4AC10-CDDB-49AC-B9F9-5A7621F53F0F@duke.edu> Message-ID: <43CD612B.5010608@gmx.at> Hi Jason, It works the following way, I have just tried it: $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1'; regards Hubert Jason Stajich wrote: > sorry - i don't really have the time to support this module - lots of > people on the list use it so they can hopefully help. > > Begin forwarded message: > >> *From: *Hubert Prielinger > > >> *Date: *January 16, 2006 3:45:21 PM EST >> *To: *Jason Stajich > > >> *Subject: **Re: [Bioperl-l] parse Blast Output and Composition Based >> Statistics parameter* >> >> Jason Stajich wrote: >> >>> (please don't try and post to bioperl-announce, it is not for >>> questions.) 
>>> >>> On Jan 12, 2006, at 6:57 PM, Hubert Prielinger wrote: >>> >>>> Hello, >>>> I want to know, if there is a possibility to get from a Blast >>>> Outputfile the whole Sequence of a protein not only the best local >>>> alignment... >>>> for example: >>>> >>> No. The parser can only return to you what is in the report file... >>> use Bio::DB::GenPept to retrieve the sequence via the web or >>> (recommended) use a locally indexed sequence database like >>> Bio::DB::Fasta >>> >>>> >ref|XP_480077.1| hypothetical protein [Oryza sativa (japonica >>>> cultivar-group)] >>>> dbj|BAD33542.1| hypothetical protein [Oryza sativa (japonica >>>> cultivar-group)] >>>> Length=95 >>>> >>>> Score = 24.1 bits (47), Expect = 493 >>>> Identities = 6/7 (85%), Positives = 7/7 (100%), Gaps = 0/7 (0%) >>>> >>>> Query 2 KKRRRWW 8 >>>> K+RRRWW >>>> Sbjct 87 KRRRRWW 93 >>>> >>>> and now, if I parse the file, I want to get the whole Sequence of >>>> this hypothetical protein....is that possible with hsp for >>>> example, or any other way.... >>>> >>>> my second question is: >>>> I do my blast search with bioperl and the remoteblast >>>> module.....each parameter is working very well, except the >>>> composition based statistics parameter.... >>>> it looks like that: >>>> >>>> my $factory = $Bio::Tools::Run::RemoteBlast::HEADER >>>> {'COMPOSITION_BASED_STATISTICS'} = 'yes'; >>>> >>> uh no that is not how you would do it. >>> You can make it the default for any factories you use in the script >>> by doing this >>> >>>> $Bio::Tools::Run::RemoteBlast::HEADER >>>> {'COMPOSITION_BASED_STATISTICS'} = 'yes'; >>> >>> >>> then >>> $factory = Bio::Tools::Run::RemoteBlast->new(); >>> >>> >>> =OR= >>> Once you have a factory object you can set the parameter explicitly: >>> $factory->submit_parameter('COMPOSITION_BASED_STATISTICS', 'yes'); >>> >>>> it should work like that, but it doesn't.... >>>> >>>> Thanks for your help in advance...... 
>>>> >>>> regards >>>> Hubert >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l@portal.open-bio.org >>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> -- >>> Jason Stajich >>> Duke University >>> http://www.duke.edu/~jes12 >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> Hi Jason, >> I have tried everything that you suggested, but the Composition Based >> Statistic parameter isn't still working, every >> other parameter works using e.g >> >> $Bio::Tools::Run::RemoteBlast::HEADER{'DESCRIPTIONS'} = '1000'; >> >> thanks in advance >> Hubert >> > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > From bmoore at genetics.utah.edu Tue Jan 17 17:33:23 2006 From: bmoore at genetics.utah.edu (Barry Moore) Date: Tue Jan 17 17:28:03 2006 Subject: [Bioperl-l] parse Blast Output and Composition Based Statisticsparameter Message-ID: Hubert, What exactly isn't working for you with the composition based statistics. Are you getting different e-values from Bioperl vs. NCBI website. It seems to be working OK for me (at least on one quick test). Barry > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l- > bounces@portal.open-bio.org] On Behalf Of Jason Stajich > Sent: Monday, January 16, 2006 6:12 PM > To: bioperl-ml List > Cc: Hubert Prielinger > Subject: Fwd: [Bioperl-l] parse Blast Output and Composition Based > Statisticsparameter > > sorry - i don't really have the time to support this module - lots of > people on the list use it so they can hopefully help. 

From hubert.prielinger at gmx.at Tue Jan 17 17:09:46 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Tue Jan 17 18:03:27 2006
Subject: [Bioperl-l] parse Blast Output and Composition Based Statisticsparameter
In-Reply-To:
References:
Message-ID: <43CD6B2A.2080308@gmx.at>

Hello Barry,
Thanks, but I have already solved it, as I have responded to Jason. The parameter doesn't work with 'yes' or 'no' anymore, because after contacting
the NCBI helpdesk, they figured out that you have to use '0' or '1', because they have recently changed it, like this:

$Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';

For me, it didn't work with 'yes' or 'no'.

regards
Hubert

PS: original response mail from the NCBI helpdesk:

Hello,
I'm sorry, this has recently changed. Instead of "Yes", try using either '0', '1' or '2', where:
'0' = No Composition-based statistics
'1' = Conditional compositional score matrix adjustment (apply only to 'biased' sequences)
'2' = Universal compositional score matrix adjustment (apply to all).
This works with the URLAPI; I've not tested with the perl module.
Best regards,
Wayne

Barry Moore wrote:
>Hubert,
>
>What exactly isn't working for you with the composition based
>statistics. Are you getting different e-values from Bioperl vs. NCBI
>website. It seems to be working OK for me (at least on one quick test).
>
>Barry

From bmoore at genetics.utah.edu Tue Jan 17 18:03:55 2006
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Tue Jan 17 20:22:44 2006
Subject: [Bioperl-l] Trouble using RemoteBlast.pm
Message-ID:

Nagesh,

Attached is an input file, script and output. These work for me, and I think they are the same ones that you are using. Have a look and see if you can find any differences that might be causing your problem. Other than that I don't know what to tell you. If you are familiar with the perl debugger (and if you're not, now's probably a good time to become familiar with it), you should step through your script and be sure that all of your objects are getting defined when they are supposed to be. That can often help narrow down the problem.
Barry > -----Original Message----- > From: Nagesh Chakka [mailto:nagesh.chakka@anu.edu.au] > Sent: Tuesday, January 17, 2006 1:57 PM > To: Barry Moore > Cc: Hubert Prielinger; bioperl-l@bioperl.org > Subject: Re: [Bioperl-l] Trouble using RemoteBlast.pm > > Bi Barry, > With the help of Hubert, I further modified the script but still have the > same > problem. The problem is that from the point of submitting the blast query, > the script does not wait until the blast results are ready for retrieval > and > event of submission is immediately followed by retrieving and saving the > output. Since the results will not be ready (about a sec) this fast, the > output created is blank. I am able to retrieve the results online using > the > RID which I am making the script to print. > So my main problem is making the program to wait after submitting the > result. > My input file has a single fasta sequence which I have pasted below. > Its interesting to note that the script works on your system. Is it > creating > an output file with the blast report? > Thanks very much for your attention. > Regards > Nagesh > > blastInput.txt > >MusDpl > MKNRLGTWWVAILCMLLASHLSTVKARGIKHRFKWNRKVLPSSGGQITEARVAENRPGAFIKQGRKLDIDFG AE > GNRYYA > ANYWQFPDGIYYEGCSEANVTKEMLVTSCVNATQAANQAEFSREKQDSKLHQRVLWRLIKEICSAKHCDFWL ER > GAAL > RVAVDQPAMVCLLGFVWFIVK > > On Wednesday 18 January 2006 05:34, Barry Moore wrote: > > Nagesh- > > > > Did you get this figured out? Your script works as is on my system. > > You say temp.out is empty? What does you input sequence > > (blastInput.txt) look like? 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bp_test.pl
Type: application/octet-stream
Size: 1281 bytes
Desc: bp_test.pl
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20060117/04a14cd4/bp_test-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: temp.out
Type: application/octet-stream
Size: 2615 bytes
Desc: temp.out
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20060117/04a14cd4/temp-0001.obj
-------------- next part --------------
>MusDpl
MKNRLGTWWVAILCMLLASHLSTVKARGIKHRFKWNRKVLPSSGGQITEARVAENRPGAFIKQGRKLDIDFGAEGNRYYA
ANYWQFPDGIYYEGCSEANVTKEMLVTSCVNATQAANQAEFSREKQDSKLHQRVLWRLIKEICSAKHCDFWLERGAAL
RVAVDQPAMVCLLGFVWFIVK

From jan.aerts at bbsrc.ac.uk Tue Jan 17 04:54:29 2006
From: jan.aerts at bbsrc.ac.uk (jan aerts (RI))
Date: Tue Jan 17 20:22:53 2006
Subject: [Bioperl-l] concatenate two embl sequence files
Message-ID: <84DA9D8AC9B05F4B889E7C70238CB451030DAB19@rie2ksrv1.ri.bbsrc.ac.uk>

Hi all,

Does anyone know of an easy way to concatenate two sequences, including recalculation of features positions of the second one? E.g.
seq 1 = 100 bp
feature A: 5..15
seq 2 = 200 bp
feature B: 20..30
=> concatenated sequence 3 = 300 bp
feature A: 5..15
feature B: 120..130 <<<<<<<<<<<

Annotations (features without range) should be transferred as well.

Of course, it must be possible to create a blank sequence and work my way through all features, adding them to a new collection of features and stuff. But I was wondering if a simpler technique is possible.

Many thanks,
Jan Aerts
Bioinformatics Department
Roslin Institute
Roslin, Scotland, UK

---------The obligatory disclaimer--------
The information contained in this e-mail (including any attachments) is confidential and is intended for the use of the addressee only. The opinions expressed within this e-mail (including any attachments) are the opinions of the sender and do not necessarily constitute those of Roslin Institute (Edinburgh) ("the Institute") unless specifically stated by a sender who is duly authorised to do so on behalf of the Institute.

From jaymoore at plantkind.com Tue Jan 17 10:09:44 2006
From: jaymoore at plantkind.com (Jay Moore)
Date: Tue Jan 17 20:23:25 2006
Subject: [Bioperl-l] Context-sensitive alignment parameters
Message-ID: <200601171512.k0HFCj8V012227@portal.open-bio.org>

Not strictly bioperl, but if anyone has any ideas, I would appreciate the feedback.

I am doing some comparative work between partially-sequenced plant genomic DNA and the fully-sequenced Arabidopsis genome. When I align sequences from other plants to Arabidopsis, the introns are much less well-conserved than the exons, and this ought to be the case for animals and other organisms too. Does anyone make any allowance for this, by setting gap and gap-extension, or substitution matrix parameters in a context-sensitive way? Is there an alignment method that can take this kind of thing into account? Is it worth trying to take it into account anyway?

Just wondered if anyone has a take, or any information, on this.

Jay Moore
Warwick HRI
http://www.warwickhri.ac.uk

From cjfields at uiuc.edu Tue Jan 17 20:44:55 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue Jan 17 20:41:14 2006
Subject: [Bioperl-l] bioperl-db working (for the moment) on Win32
Message-ID: <676216A3-1A01-46C1-9873-C52DE6F01994@uiuc.edu>

Hilmar,

Just wanted to drop a line saying bioperl-db seems to be up and running on Windows (at least for the moment!).
All tests pass using ActivePerl and cygwin-perl. I am trying to sort out the issue with throw in Bio::Root::Root (specifically, why it doesn't work without the added comma; I'm trying the modifications to Root.pm on Mac OS X now) and am trying to also figure out why bioperl and bioperl-db give tons of warnings using ActivePerl (most just state that x subroutine was redefined in y.pm line z, so aren't serious). This is an ActivePerl or nmake issue and not a bioperl problem as there are no warnings using 'make test' in cygwin. I am in the midst of writing up the steps for installing bioperl and bioperl-db using MySQL as the relational DB with either ActivePerl or cygwin; I really don't have much experience with postgreSQL, oracle, MsSQL (B. Wang's added modules), etc., but I can't see any reason why they wouldn't work. Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bmoore at genetics.utah.edu Tue Jan 17 21:02:33 2006 From: bmoore at genetics.utah.edu (Barry Moore) Date: Tue Jan 17 20:57:12 2006 Subject: [Bioperl-l] bioperl-db working (for the moment) on Win32 Message-ID: This is very helpful Chris. Thank you. Barry > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l- > bounces@portal.open-bio.org] On Behalf Of Chris Fields > Sent: Tuesday, January 17, 2006 6:45 PM > To: bioperl-l@portal.open-bio.org > Subject: [Bioperl-l] bioperl-db working (for the moment) on Win32 > > Hilmar, > > Just wanted to drop a line saying bioperl-db seems to be up and > running on Windows (at least for the moment!). All tests pass using > ActivePerl and cygwin-perl. 

From hlapp at gmx.net Wed Jan 18 02:07:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed Jan 18 02:03:42 2006
Subject: [Bioperl-l] bioperl-db working (for the moment) on Win32
In-Reply-To:
References:
Message-ID:

Same here, thanks. We'll include your write-up in CVS. -hilmar

On Jan 17, 2006, at 6:02 PM, Barry Moore wrote:
> This is very helpful Chris. Thank you.
>
> Barry

--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------

From heikki at sanbi.ac.za Wed Jan 18 02:11:20 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed Jan 18 02:35:39 2006
Subject: [Bioperl-l] concatenate two embl sequence files
In-Reply-To: <84DA9D8AC9B05F4B889E7C70238CB451030DAB19@rie2ksrv1.ri.bbsrc.ac.uk>
References: <84DA9D8AC9B05F4B889E7C70238CB451030DAB19@rie2ksrv1.ri.bbsrc.ac.uk>
Message-ID: <200601180911.20454.heikki@sanbi.ac.za>

Jan,

It would be easy if someone had written a function to do it.
Even writing the function is not hard. I do not think there is any other way than to go through all features, though.

In my opinion this would be an excellent addition to Bio::Seq::Utilities. E.g.

cat($arrayrefofsequences, optional_seq_class_to_create)

returning a new seq, with species and other info based on the first seq in the array.

Could you write it and post it to bugzilla?

-Heikki

On Tuesday 17 January 2006 11:54, jan aerts (RI) wrote:
> Hi all,
>
> Does anyone know of an easy way to concatenate two sequences, including
> recalculation of features positions of the second one? E.g.
> seq 1 = 100 bp
> feature A: 5..15
> seq 2 = 200 bp
> feature B: 20..30
> => concatenated sequence 3 = 300 bp
> feature A: 5..15
> feature B: 120..130 <<<<<<<<<<<
>
> Annotations (features without range) should be transferred as well.
>
> Of course, it must be possible to create a blank sequence and work my
> way through all features, adding them to a new collection of features
> and stuff. But I was wondering if a simpler technique is possible.
>
> Many thanks,
> Jan Aerts
> Bioinformatics Department
> Roslin Institute
> Roslin, Scotland, UK
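
A minimal sketch of the coordinate arithmetic discussed in this thread, written in plain Perl rather than against the BioPerl object model. The function name cat_records and the hash layout (seq, features, name, start, end) are hypothetical, chosen only for illustration; they are not an existing BioPerl API. The idea is simply that each feature of a later sequence is shifted by the combined length of the sequences before it.

```perl
use strict;
use warnings;

# Hypothetical plain-hash records standing in for sequence objects.
# These are NOT BioPerl classes; they only illustrate the offset logic.
sub cat_records {
    my @records = @_;
    my %joined  = ( seq => '', features => [] );
    my $offset  = 0;    # combined length of all records processed so far

    for my $rec (@records) {
        for my $feat ( @{ $rec->{features} } ) {
            # Copy the feature so the input record is left untouched,
            # then shift its coordinates by the running offset.
            my %copy = (
                %$feat,
                start => $feat->{start} + $offset,
                end   => $feat->{end} + $offset,
            );
            push @{ $joined{features} }, \%copy;
        }
        $joined{seq} .= $rec->{seq};
        $offset += length $rec->{seq};
    }
    return \%joined;
}

# The example from the original question: 100 bp + 200 bp.
my $seq1 = { seq => 'A' x 100, features => [ { name => 'A', start => 5,  end => 15 } ] };
my $seq2 = { seq => 'C' x 200, features => [ { name => 'B', start => 20, end => 30 } ] };

my $seq3 = cat_records( $seq1, $seq2 );
printf "length %d\n", length $seq3->{seq};    # length 300
for my $feat ( @{ $seq3->{features} } ) {
    # feature A: 5..15, then feature B: 120..130
    printf "feature %s: %d..%d\n", $feat->{name}, $feat->{start}, $feat->{end};
}
```

Annotations without a range need no shifting and could be copied across unchanged; how to merge them when both inputs carry the same annotation key is a design decision this sketch leaves open.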

--
Heikki Lehvaslaiho, heikki at_sanbi _ac _za
Associate Professor, skype: heikki_lehvaslaiho
SANBI, South African National Bioinformatics Institute
University of Western Cape, South Africa
Phone: +27 21 959 2096 FAX: +27 21 959 2512

From cjfields at uiuc.edu Wed Jan 18 11:51:55 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed Jan 18 11:48:13 2006
Subject: [Bioperl-l] GMOD PPM repository not working
Message-ID: <000001c61c4f$7d835170$15327e82@pyrimidine>

Scott,

I am trying to find the newest bioperl developer release (1.5.1) from PPM for a quick write-up on installing bioperl-db on Windows. I tried using the GMOD repository:

ppm> rep add gmod http://www.gmod.org/ggb/ppm
Repositories:
[1] gmod
[ ] ActiveState Package Repository
[ ] ActiveState PPM2 Repository
[ ] Bioperl
[ ] Bribes
[ ] Kobes
[ ] local
ppm> search bioperl
Searching in Active Repositories
No matches for 'bioperl'; see 'help search'.
ppm> search *
Searching in Active Repositories
No matches for '*'; see 'help search'.
ppm>

Any idea what's going on? All other repositories work fine. I can download it and install locally w/o a problem. I am running the newest ActivePerl (5.8.7.815), WinXP.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept.
of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Wed Jan 18 12:16:33 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed Jan 18 12:12:52 2006 Subject: [Bioperl-l] bioperl-db working (for the moment) on Win32 In-Reply-To: Message-ID: <000101c61c52$f20e3070$15327e82@pyrimidine> Should I get a PPM for the CVS version of bioperl-db ready, or should we just go with 'nmake', 'nmake test', 'nmake install'? If I can get a PPM build to the same repository as the bioperl PPM (http://bioperl.org/DIST/), it will probably cut down on questions from new users. I'm using a PPM build for both bioperl-live and bioperl-db at the moment, which can be easily modified for the repository. Also, what version of bioperl should be used with bioperl-db (I'm adding it as a dependency)? Will bioperl-1.4 do, or do we need 1.5.1 (available at the GMOD repository, http://www.gmod.org/ggb/ppm/)? Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l- > bounces@portal.open-bio.org] On Behalf Of Hilmar Lapp > Sent: Wednesday, January 18, 2006 1:07 AM > To: Barry Moore > Cc: Chris Fields; bioperl-l@portal.open-bio.org > Subject: Re: [Bioperl-l] bioperl-db working (for the moment) on Win32 > > Same here, thanks. We'll include your write-up in CVS. -hilmar > > On Jan 17, 2006, at 6:02 PM, Barry Moore wrote: > > > This is very helpful Chris. Thank you. 
> > > > Barry > > > >> -----Original Message----- > >> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l- > >> bounces@portal.open-bio.org] On Behalf Of Chris Fields > >> Sent: Tuesday, January 17, 2006 6:45 PM > >> To: bioperl-l@portal.open-bio.org > >> Subject: [Bioperl-l] bioperl-db working (for the moment) on Win32 > >> > >> Hilmar, > >> > >> Just wanted to drop a line saying bioperl-db seems to be up and > >> running on Windows (at least for the moment!). All tests pass using > >> ActivePerl and cygwin-perl. I am trying to sort out the issue with > >> throw in Bio::Root::Root (specifically, why it doesn't work without > >> the added comma; I'm trying the modifications to Root.pm on Mac OS X > >> now) and am trying to also figure out why bioperl and bioperl-db give > >> tons of warnings using ActivePerl (most just state that x subroutine > >> was redefined in y.pm line z, so aren't serious). This is an > >> ActivePerl or nmake issue and not a bioperl problem as there are no > >> warnings using 'make test' in cygwin. I am in the midst of writing > >> up the steps for installing bioperl and bioperl-db using MySQL as the > >> relational DB with either ActivePerl or cygwin; I really don't have > >> much experience with postgreSQL, oracle, MsSQL (B. Wang's added > >> modules), etc., but I can't see any reason why they wouldn't work. > >> > >> Christopher Fields > >> Postdoctoral Researcher > >> Lab of Dr. 
Robert Switzer > >> Dept of Biochemistry > >> University of Illinois Urbana-Champaign > >> > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l@portal.open-bio.org > >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Wed Jan 18 12:30:36 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed Jan 18 12:26:53 2006 Subject: [Bioperl-l] bioperl-db working (for the moment) on Win32 In-Reply-To: <000101c61c52$f20e3070$15327e82@pyrimidine> References: <000101c61c52$f20e3070$15327e82@pyrimidine> Message-ID: On Jan 18, 2006, at 9:16 AM, Chris Fields wrote: > Also, what version of bioperl should be used with bioperl-db (I'm > adding it > as a dependency)? Will bioperl-1.4 do, or do we need 1.5.1 > (available at > the GMOD repository, http://www.gmod.org/ggb/ppm/)? > The recommendation is 1.5.1. v1.4 will largely work too, except that if you work with ontologies you might run into problems because there were fixes to the Ontology modules in Bioperl post-1.4. -hilmar -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 ------------------------------------------------------------- From dnm_a at swbell.net Wed Jan 18 12:57:40 2006 From: dnm_a at swbell.net (David Messina) Date: Wed Jan 18 13:00:39 2006 Subject: [Bioperl-l] Context-sensitive alignment parameters In-Reply-To: <200601171512.k0HFCj8V012227@portal.open-bio.org> References: <200601171512.k0HFCj8V012227@portal.open-bio.org> Message-ID: Hi Jay, > When I am aligning sequences from other plants to Arabidopsis, the > introns are much less well-conserved than the exons, and this ought > to be the case for animals and other organisms too. > Does anyone use make any allowance for this, by setting gap and gap- > extension, or substitution matrix parameters in a context-sensitive > way? Is there an alignment method that can take this kind of thing > into account? Is it worth trying to take it into account anyway? I would do a local alignment (with e.g. Blast) first to find the segments of the genome that match. Then, I would realign each of the matching segments using a global alignment algorithm (e.g. needle from the EMBOSS package) to force the best alignment within each matching region. It's worth it if you're interested in looking at the overall conservation between the genomes or something like that. If however you're just interested in the exons, then it's easier to do the alignments with cDNA representations of the sequences from the other plants and align those to the Arabidopsis genomic sequence (using Blast). Hope this helps, Dave -- Dave Messina Informatics Analyst WashU Genome Sequencing Center dmessina@watson.wustl.edu 314-286-1825
From jason.stajich at duke.edu Wed Jan 18 13:05:49 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Wed Jan 18 13:38:21 2006 Subject: [Bioperl-l] Context-sensitive alignment parameters In-Reply-To: <200601171512.k0HFCj8V012227@portal.open-bio.org> References: <200601171512.k0HFCj8V012227@portal.open-bio.org> Message-ID: <3082CF90-556E-4560-B501-8C7C2A8C8663@duke.edu> WABA kind of does this with three different match states. -jason On Jan 17, 2006, at 10:09 AM, Jay Moore wrote: > Not strictly bioperl, but if anyone has any ideas, I would > appreciate the feedback. > > I am doing some comparative work between partially-sequenced plant > genomic DNA, and fully-sequenced Arabidopsis genome.
> > When I am aligning sequences from other plants to Arabidopsis, the > introns are much less well-conserved than the exons, and this ought > to be the case > for animals and other organisms too. Does anyone use make any > allowance for this, by setting gap and gap-extension, or > substitution matrix parameters > in a context-sensitive way? Is there an alignment method that can > take this kind of thing into account? Is it worth trying to take > it into account > anyway? > > Just wondered if anyone has a take, or any information, on this. > > Jay Moore > Warwick HRI http://www.warwickhri.ac.uk > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From kaboroev at sfu.ca Wed Jan 18 12:55:09 2006 From: kaboroev at sfu.ca (Keith Boroevich) Date: Wed Jan 18 13:56:11 2006 Subject: [Bioperl-l] Trouble using RemoteBlast.pm In-Reply-To: References: Message-ID: <1137606909.18560.17.camel@gotenks.zfighters> I'm not sure if this is related, but in the last 3 days my remote BLAST scripts have stop working. I have not modified the code in any way. The retrieve_blast() returns successful, and next_result() does return a "Bio::Search::Result::BlastResult=HASH(0x15ad8d0)" object but takes a long time to do so. However, next_hit returns undef. I'm not really sure how to approach this problem. Prior to 3 days ago the scripts worked perfectly returning a list of hits, their accession and significance. Keith On Tue, 2006-01-17 at 11:34 -0700, Barry Moore wrote: > Nagesh- > > Did you get this figured out? Your script works as is on my system. > You say temp.out is empty? What does you input sequence > (blastInput.txt) look like? 
> > Barry > > > -----Original Message----- > > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l- > > bounces@portal.open-bio.org] On Behalf Of Hubert Prielinger > > Sent: Monday, January 16, 2006 2:54 PM > > To: Nagesh Chakka; bioperl-l@portal.open-bio.org > > Subject: Re: [Bioperl-l] Trouble using RemoteBlast.pm > > > > Nagesh Chakka wrote: > > > > >Hi All, > > >I was trying to setup a system to perform a remote blast on regular > > basis. I > > >thought this could be best achieved by using BioPerl module and came > > across > > >RemoteBlast.pm > > >I had modified the sample script "bp_remote_blast.pl" which takes a > file > > >containing single FASTA sequence as an input. Also I wanted the blast > > report > > >to be saved in a file for latter use and > > >modified the code as follows > > >I am using the latest version of Bioperl (1.5) on a Fedora platform. > > > >####################################################################### > > >print "$Bio::Root::Version::VERSION\n"; > > >use Bio::Tools::Run::RemoteBlast; > > >use strict; > > >my $prog = 'blastp'; > > >my $db = 'swissprot'; > > >my $e_val= '1e-10'; > > > > > >my @params = ( '-prog' => $prog, > > > '-data' => $db, > > > '-expect' => $e_val, > > > '-readmethod' => 'SearchIO' ); > > > > > >my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > > > > >#change a paramter > > >$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens > > >[ORGN]'; > > > > > >#remove a parameter > > >delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'}; > > > > > >my $v = 1; > > >#$v is just to turn on and off the messages > > > > > >my $r = $factory->submit_blast('blastInput.txt'); > > > > > >print STDERR "waiting..." if( $v > 0 ); > > >while ( my @rids = $factory->each_rid ) > > >{ > > > foreach my $rid ( @rids ) > > > { > > > my $rc = $factory->retrieve_blast($rid); > > > if( !ref($rc) ) > > > { > > > if( $rc < 0 ) > > > { > > > $factory->remove_rid($rid); > > > } > > > print STDERR "." 
if ( $v > 0 ); > > > sleep 5; > > > } > > > else > > > { > > > print "RID $rid\n"; > > > $factory->save_output('temp.out'); > > > $factory->remove_rid($rid); > > > } > > > } > > >} > > > > > > >####################################################################### > ## > > ######## > > > > > >This script prints the RID and terminates immediately. Obviously the > > >output file created is empty as the program did not wait for getting > the > > >blast results from the RID. > > >Is there something I am doing wrong and what can I do for the program > to > > wait > > >until the results are ready to be printed to the output file. I could > not > > get > > >much information from the documentation and have no prior experience > with > > >Bioperl. > > >Thanks very much for your attention. > > >Regards > > >Nageshbi > > >_______________________________________________ > > >Bioperl-l mailing list > > >Bioperl-l@portal.open-bio.org > > >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > > > > > hi nagesh, > > try this, should work, I had the same problem: > > > > ....................... > > ....................... > > > > else > > { > > print "RID $rid\n"; > > $factory->save_output('temp.out'); > > > > my $checkinput = $factory->file; > > open(my $fh,"<$checkinput") or die $!; > > while(<$fh>){ > > print; > > } > > close $fh; > > > > > > $factory->remove_rid($rid); > > } > > } > > } > > > > regards > > Hubert > > > > PS: are you using the composition based statistics parameter with your > > blast search? > > if yes, is it working? 
> > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Wed Jan 18 16:17:49 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed Jan 18 16:14:03 2006 Subject: [Bioperl-l] Trouble using RemoteBlast.pm In-Reply-To: <1137606909.18560.17.camel@gotenks.zfighters> Message-ID: <001401c61c74$a274b760$15327e82@pyrimidine> I have had the same problem using a script I wrote. It worked until ~4 days ago. Luckily, I had saved a copy of some of my old searches in a temp folder so I can compare them. I noticed that if I just save the output using: $factory->save_output('temp.out'); it works (just like Barry's script), but if I have the following in a loop (like in RemoteBlast POD), it craps out: while ( my @rids = $factory->each_rid ) { foreach my $rid ( @rids ) { my $rc = $factory->retrieve_blast($rid); # if RID is not present if( !ref($rc) ) { # remove if RID is bad (error) if( $rc < 0 ) { $factory->remove_rid($rid); } print STDERR "." if ( $v > 0 ); sleep 5; } else { # RID is returned my $result = $rc->next_result(); # save the output my $filename = $result->query_name()."\.blastp"; $factory->save_output($filename); # remove RID from list $factory->remove_rid($rid); ... When I change the following: my $filename = $result->query_name()."\.blastp"; to my $filename = "temp.blastp"; and comment out the 'my $result = $rc->next_result()' line, it works again, so possibly SearchIO? The only difference I noticed is that older output has this: _______________________________________________________________________ BLASTP 2.2.12 [Aug-07-2005] Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Sch?ffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. 
Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.

RID: 1131470802-26518-118666159798.BLASTQ3

Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF excluding environmental samples
3,023,944 sequences; 1,040,428,944 total letters
Query= NP_249094 transcriptional regulator PyrR [Pseudomonas aeruginosa PAO1].
(170 letters)
....
_______________________________________________________________________

And new output has this:
_______________________________________________________________________
BLASTP 2.2.13 [Nov-27-2005]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

RID: 1137614458-7828-16730336973.BLASTQ4

Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF excluding environmental samples
3,228,386 sequences; 1,108,137,318 total letters
Query= NP_249094 pyrimidine regulatory protein PyrR [Pseudomonas aeruginosa PAO1].
Length=170
....
_______________________________________________________________________

There is a change in the line for the length. Is this enough to break
SearchIO::Blast?

I think Jason is right; maybe NCBI has messed with text output and it's now
breaking the BLAST parser:

http://portal.open-bio.org/pipermail/bioperl-l/2005-November/020067.html

I may try switching over to XML output to see what happens.

Christopher Fields Postdoctoral Researcher - Switzer Lab Dept.
of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l- > bounces@portal.open-bio.org] On Behalf Of Keith Boroevich > Sent: Wednesday, January 18, 2006 11:55 AM > To: kaboroev@sfu.ca > Cc: bioperl-l@portal.open-bio.org > Subject: RE: [Bioperl-l] Trouble using RemoteBlast.pm > > I'm not sure if this is related, but in the last 3 days my remote BLAST > scripts have stop working. I have not modified the code in any way. > The retrieve_blast() returns successful, and next_result() does return a > "Bio::Search::Result::BlastResult=HASH(0x15ad8d0)" object but takes a > long time to do so. However, next_hit returns undef. I'm not really > sure how to approach this problem. Prior to 3 days ago the scripts > worked perfectly returning a list of hits, their accession and > significance. > > Keith > > > On Tue, 2006-01-17 at 11:34 -0700, Barry Moore wrote: > > Nagesh- > > > > Did you get this figured out? Your script works as is on my system. > > You say temp.out is empty? What does you input sequence > > (blastInput.txt) look like? > > > > Barry > > > > > -----Original Message----- > > > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l- > > > bounces@portal.open-bio.org] On Behalf Of Hubert Prielinger > > > Sent: Monday, January 16, 2006 2:54 PM > > > To: Nagesh Chakka; bioperl-l@portal.open-bio.org > > > Subject: Re: [Bioperl-l] Trouble using RemoteBlast.pm > > > > > > Nagesh Chakka wrote: > > > > > > >Hi All, > > > >I was trying to setup a system to perform a remote blast on regular > > > basis. I > > > >thought this could be best achieved by using BioPerl module and came > > > across > > > >RemoteBlast.pm > > > >I had modified the sample script "bp_remote_blast.pl" which takes a > > file > > > >containing single FASTA sequence as an input. 
Also I wanted the blast > > > report > > > >to be saved in a file for latter use and > > > >modified the code as follows > > > >I am using the latest version of Bioperl (1.5) on a Fedora platform. > > > > > >####################################################################### > > > >print "$Bio::Root::Version::VERSION\n"; > > > >use Bio::Tools::Run::RemoteBlast; > > > >use strict; > > > >my $prog = 'blastp'; > > > >my $db = 'swissprot'; > > > >my $e_val= '1e-10'; > > > > > > > >my @params = ( '-prog' => $prog, > > > > '-data' => $db, > > > > '-expect' => $e_val, > > > > '-readmethod' => 'SearchIO' ); > > > > > > > >my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > > > > > > >#change a paramter > > > >$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens > > > >[ORGN]'; > > > > > > > >#remove a parameter > > > >delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'}; > > > > > > > >my $v = 1; > > > >#$v is just to turn on and off the messages > > > > > > > >my $r = $factory->submit_blast('blastInput.txt'); > > > > > > > >print STDERR "waiting..." if( $v > 0 ); > > > >while ( my @rids = $factory->each_rid ) > > > >{ > > > > foreach my $rid ( @rids ) > > > > { > > > > my $rc = $factory->retrieve_blast($rid); > > > > if( !ref($rc) ) > > > > { > > > > if( $rc < 0 ) > > > > { > > > > $factory->remove_rid($rid); > > > > } > > > > print STDERR "." if ( $v > 0 ); > > > > sleep 5; > > > > } > > > > else > > > > { > > > > print "RID $rid\n"; > > > > $factory->save_output('temp.out'); > > > > $factory->remove_rid($rid); > > > > } > > > > } > > > >} > > > > > > > > > >####################################################################### > > ## > > > ######## > > > > > > > >This script prints the RID and terminates immediately. Obviously the > > > >output file created is empty as the program did not wait for getting > > the > > > >blast results from the RID. 
> > > >Is there something I am doing wrong and what can I do for the program > > to > > > wait > > > >until the results are ready to be printed to the output file. I could > > not > > > get > > > >much information from the documentation and have no prior experience > > with > > > >Bioperl. > > > >Thanks very much for your attention. > > > >Regards > > > >Nageshbi > > > >_______________________________________________ > > > >Bioperl-l mailing list > > > >Bioperl-l@portal.open-bio.org > > > >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > > > > > > > > > > > hi nagesh, > > > try this, should work, I had the same problem: > > > > > > ....................... > > > ....................... > > > > > > else > > > { > > > print "RID $rid\n"; > > > $factory->save_output('temp.out'); > > > > > > my $checkinput = $factory->file; > > > open(my $fh,"<$checkinput") or die $!; > > > while(<$fh>){ > > > print; > > > } > > > close $fh; > > > > > > > > > $factory->remove_rid($rid); > > > } > > > } > > > } > > > > > > regards > > > Hubert > > > > > > PS: are you using the composition based statistics parameter with your > > > blast search? > > > if yes, is it working? 
> > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at duke.edu Wed Jan 18 16:30:02 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Wed Jan 18 17:01:09 2006 Subject: [Bioperl-l] Trouble using RemoteBlast.pm In-Reply-To: <001401c61c74$a274b760$15327e82@pyrimidine> References: <001401c61c74$a274b760$15327e82@pyrimidine> Message-ID: You may need to start requesting XML instead of plain text - NCBI may have finally done what they warned about (http://bioperl.org/ pipermail/bioperl-l/2005-September/019687.html). You can see information here about getting XML. http://bioperl.open-bio.org/news/2005/11/06/getting-blastxml-using- remoteblast/ http://bioperl.open-bio.org/wiki/Module:Bio::Tools::Run::RemoteBlast http://bioperl.open-bio.org/wiki/NCBI_Blast_email We'll officially announce the new news and wiki site more at the end of the month when we switch to permanent URL but I suspect this question needs a pointer. Feel free to add this question and answer to the FAQ as well http://bioperl.open-bio.org/wiki/FAQ -jason On Jan 18, 2006, at 4:17 PM, Chris Fields wrote: > I have had the same problem using a script I wrote. It worked > until ~4 days > ago. Luckily, I had saved a copy of some of my old searches in a temp > folder so I can compare them. 
> > I noticed that if I just save the output using: > > $factory->save_output('temp.out'); > > it works (just like Barry's script), but if I have the following in > a loop > (like in RemoteBlast POD), it craps out: > > while ( my @rids = $factory->each_rid ) { > foreach my $rid ( @rids ) { > my $rc = $factory->retrieve_blast($rid); > # if RID is not present > if( !ref($rc) ) { > # remove if RID is bad (error) > if( $rc < 0 ) { > $factory->remove_rid($rid); > } > print STDERR "." if ( $v > 0 ); > sleep 5; > } else { # RID is returned > my $result = $rc->next_result(); > # save the output > my $filename = $result->query_name()."\.blastp"; > $factory->save_output($filename); > # remove RID from list > $factory->remove_rid($rid); > ... > > > > When I change the following: > > my $filename = $result->query_name()."\.blastp"; > > to > > my $filename = "temp.blastp"; > > and comment out the 'my $result = $rc->next_result()' line, it > works again, > so possibly SearchIO? > > The only difference I noticed is that older output has this: > ______________________________________________________________________ > _ > > BLASTP 2.2.12 [Aug-07-2005] > Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. > Sch?ffer, > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman > (1997), "Gapped BLAST and PSI-BLAST: a new generation of > protein database search programs", Nucleic Acids Res. 25:3389-3402. > > RID: 1131470802-26518-118666159798.BLASTQ3 > > > Database: All non-redundant GenBank CDS > translations+PDB+SwissProt+PIR+PRF excluding environmental samples > 3,023,944 sequences; 1,040,428,944 total letters > Query= NP_249094 transcriptional regulator PyrR [Pseudomonas > aeruginosa > PAO1]. > (170 letters) > .... 
> > ______________________________________________________________________ > _ > > And new output has this: > ______________________________________________________________________ > _ > BLASTP 2.2.13 [Nov-27-2005] > Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Sch? > ?ffer, > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman > (1997), "Gapped BLAST and PSI-BLAST: a new generation of > protein database search programs", Nucleic Acids Res. 25:3389-3402. > > RID: 1137614458-7828-16730336973.BLASTQ4 > > > Database: All non-redundant GenBank CDS > translations+PDB+SwissProt+PIR+PRF excluding environmental samples > 3,228,386 sequences; 1,108,137,318 total letters > Query= NP_249094 pyrimidine regulatory protein PyrR [Pseudomonas > aeruginosa > PAO1]. > Length=170 > .... > ______________________________________________________________________ > _ > > > There is a change in the line for the length. Is this enough to break > SearchIO::Blast? > > I think Jason is right; maybe NCBI has messed with text output and > it's now > breaking the BLAST parser: > > http://portal.open-bio.org/pipermail/bioperl-l/2005-November/ > 020067.html > > I may try switching over to XML output to see what happens. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > >> -----Original Message----- >> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l- >> bounces@portal.open-bio.org] On Behalf Of Keith Boroevich >> Sent: Wednesday, January 18, 2006 11:55 AM >> To: kaboroev@sfu.ca >> Cc: bioperl-l@portal.open-bio.org >> Subject: RE: [Bioperl-l] Trouble using RemoteBlast.pm >> >> I'm not sure if this is related, but in the last 3 days my remote >> BLAST >> scripts have stop working. I have not modified the code in any way. 
>> The retrieve_blast() returns successful, and next_result() does >> return a >> "Bio::Search::Result::BlastResult=HASH(0x15ad8d0)" object but takes a >> long time to do so. However, next_hit returns undef. I'm not really >> sure how to approach this problem. Prior to 3 days ago the scripts >> worked perfectly returning a list of hits, their accession and >> significance. >> >> Keith >> >> >> On Tue, 2006-01-17 at 11:34 -0700, Barry Moore wrote: >>> Nagesh- >>> >>> Did you get this figured out? Your script works as is on my system. >>> You say temp.out is empty? What does you input sequence >>> (blastInput.txt) look like? >>> >>> Barry >>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l- >>>> bounces@portal.open-bio.org] On Behalf Of Hubert Prielinger >>>> Sent: Monday, January 16, 2006 2:54 PM >>>> To: Nagesh Chakka; bioperl-l@portal.open-bio.org >>>> Subject: Re: [Bioperl-l] Trouble using RemoteBlast.pm >>>> >>>> Nagesh Chakka wrote: >>>> >>>>> Hi All, >>>>> I was trying to setup a system to perform a remote blast on >>>>> regular >>>> basis. I >>>>> thought this could be best achieved by using BioPerl module and >>>>> came >>>> across >>>>> RemoteBlast.pm >>>>> I had modified the sample script "bp_remote_blast.pl" which >>>>> takes a >>> file >>>>> containing single FASTA sequence as an input. Also I wanted the >>>>> blast >>>> report >>>>> to be saved in a file for latter use and >>>>> modified the code as follows >>>>> I am using the latest version of Bioperl (1.5) on a Fedora >>>>> platform. 
>>>> >>>> ################################################################### >>>> #### >>>>> print "$Bio::Root::Version::VERSION\n"; >>>>> use Bio::Tools::Run::RemoteBlast; >>>>> use strict; >>>>> my $prog = 'blastp'; >>>>> my $db = 'swissprot'; >>>>> my $e_val= '1e-10'; >>>>> >>>>> my @params = ( '-prog' => $prog, >>>>> '-data' => $db, >>>>> '-expect' => $e_val, >>>>> '-readmethod' => 'SearchIO' ); >>>>> >>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>> >>>>> #change a paramter >>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo >>>>> sapiens >>>>> [ORGN]'; >>>>> >>>>> #remove a parameter >>>>> delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'}; >>>>> >>>>> my $v = 1; >>>>> #$v is just to turn on and off the messages >>>>> >>>>> my $r = $factory->submit_blast('blastInput.txt'); >>>>> >>>>> print STDERR "waiting..." if( $v > 0 ); >>>>> while ( my @rids = $factory->each_rid ) >>>>> { >>>>> foreach my $rid ( @rids ) >>>>> { >>>>> my $rc = $factory->retrieve_blast($rid); >>>>> if( !ref($rc) ) >>>>> { >>>>> if( $rc < 0 ) >>>>> { >>>>> $factory->remove_rid($rid); >>>>> } >>>>> print STDERR "." if ( $v > 0 ); >>>>> sleep 5; >>>>> } >>>>> else >>>>> { >>>>> print "RID $rid\n"; >>>>> $factory->save_output('temp.out'); >>>>> $factory->remove_rid($rid); >>>>> } >>>>> } >>>>> } >>>>> >>>> >>>> ################################################################### >>>> #### >>> ## >>>> ######## >>>>> >>>>> This script prints the RID and terminates immediately. >>>>> Obviously the >>>>> output file created is empty as the program did not wait for >>>>> getting >>> the >>>>> blast results from the RID. >>>>> Is there something I am doing wrong and what can I do for the >>>>> program >>> to >>>> wait >>>>> until the results are ready to be printed to the output file. I >>>>> could >>> not >>>> get >>>>> much information from the documentation and have no prior >>>>> experience >>> with >>>>> Bioperl. >>>>> Thanks very much for your attention. 
>>>>> Regards >>>>> Nageshbi >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l@portal.open-bio.org >>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>>> >>>> hi nagesh, >>>> try this, should work, I had the same problem: >>>> >>>> ....................... >>>> ....................... >>>> >>>> else >>>> { >>>> print "RID $rid\n"; >>>> $factory->save_output('temp.out'); >>>> >>>> my $checkinput = $factory->file; >>>> open(my $fh,"<$checkinput") or die $!; >>>> while(<$fh>){ >>>> print; >>>> } >>>> close $fh; >>>> >>>> >>>> $factory->remove_rid($rid); >>>> } >>>> } >>>> } >>>> >>>> regards >>>> Hubert >>>> >>>> PS: are you using the composition based statistics parameter >>>> with your >>>> blast search? >>>> if yes, is it working? >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l@portal.open-bio.org >>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l@portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From nagesh.chakka at anu.edu.au Wed Jan 18 20:37:28 2006 From: nagesh.chakka at anu.edu.au (Nagesh) Date: Wed Jan 18 20:34:08 2006 Subject: [Bioperl-l] Trouble using RemoteBlast.pm In-Reply-To: References: Message-ID: <1137634648.5305.36.camel@vogon> Thanks very much to all specially to Barry and Hubert for their time in answering my query. Some updates into my problem. 
I have performed some diagnostic tests and describe my observations below.

First of all, the problem in the code was that it was not waiting for the results to be ready before writing them to the output file. So I wanted to check whether the condition "if( !ref($rc) )" is ever satisfied, and I printed out the $rc value, which was something like "Bio::SearchIO::blast=HASH(0x9010370)". When I looked at the Bioperl documentation for RemoteBlast.pm, the value for $rc in "$rc = $factory->retrieve_blast($rid);" should be either 0 or 1. I am not able to understand whether what I am getting is right.

Secondly, I manually forced the script to wait between submit_blast, retrieve_blast and save_output by using sleep with values ranging from 30 to 600. None of them were successful in saving the output.

When sleep (600) is between submit_blast and retrieve_blast, the following is printed to standard output (shown below is part of the output), with the output file still empty.

Request ID: 1137626804-16566-100302560340.BLASTQ4
Status: Searching
Submitted at: Wed Jan 18 18:26:44 2006
Current time: Wed Jan 18 18:36:46 2006
Time since submission: 00:10:01


This page will be automatically updated in 10 seconds until search is done
When sleep (600) is between retrieve_blast and save_output, the following is printed with nothing written to output file.

Request ID: 1137632221-28820-85178967709.BLASTQ1
Status: Searching
Submitted at: Wed Jan 18 19:57:01 2006
Current time: Wed Jan 18 19:57:03 2006
Time since submission: 00:00:01


This page will be automatically updated in 10 seconds until search is done
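
The output above is NCBI's "still searching" status page rather than a BLAST report. One way a script could tell the two apart is by content. The following is only a minimal sketch: it assumes the pending page contains the "Status ... Searching" row and the "automatically updated" notice as printed above, and `looks_like_status_page` and `slurp_file` are hypothetical helpers, not part of Bio::Tools::Run::RemoteBlast.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Tools::Run::RemoteBlast.
# Returns 1 if $text looks like the NCBI "still searching" status page
# quoted above, 0 otherwise. Both patterns are assumptions taken from
# that page and may need adjusting if NCBI changes its wording.
sub looks_like_status_page {
    my ($text) = @_;
    return 0 unless defined $text;
    (my $plain = $text) =~ s/<[^>]*>//g;    # drop HTML tags, if any
    return 1 if $plain =~ /Status\W*Searching/i;
    return 1 if $plain =~ /automatically updated/i;
    return 0;
}

# Hypothetical helper: read a whole file into one string.
sub slurp_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    local $/;                # slurp mode: read everything at once
    my $text = <$fh>;
    close $fh;
    return defined $text ? $text : '';
}
```

In the loop used in this thread, the text could come from slurp_file($factory->file) after retrieve_blast; when looks_like_status_page returns true, the script would sleep and retry instead of calling save_output.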
Please note the difference in time since submission.

Lastly, I printed out the request ID and manually paused the script by using <STDIN> between submit_blast and retrieve_blast. The idea was to check the status of the job online through the NCBI website. When the results were ready, I let the script proceed and was able to save the desired results to the file. I am puzzled by this observation, as I do not understand why manually formatting the results online helps in getting the results. I am basically a molecular biologist trying hard to solve this computational stuff, so there might be some issues that are trivial to you computer wizards :)

Barry suggested that I use the perl debugger, which I will try.

Thanks for your attention.

Below is the code which was being tested.

########################################################################

use strict;
use warnings;
use Bio::Tools::Run::RemoteBlast;

print "$Bio::Root::Version::VERSION\n";
my $prog = 'blastp';
my $db = 'swissprot';
my $e_val= '1e-10';

my @params = ( '-prog' => $prog,
               '-data' => $db,
               '-expect' => $e_val,
               '-readmethod' => 'SearchIO' );

my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

#change a parameter
$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens [ORGN]';

#remove a parameter
delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'};

my $v = 1;
#$v is just to turn on and off the messages

my $r = $factory->submit_blast('blastInput.txt');

print STDERR "waiting..." if( $v > 0 );
while ( my @rids = $factory->each_rid )
{
    foreach my $rid ( @rids )
    {
        print "RID $rid\n";

        #<STDIN>;
        #sleep 600;
        my $rc = $factory->retrieve_blast($rid);

        print "RC $rc\n";
        if( !ref($rc) )
        {
            if( $rc < 0 )
            {
                $factory->remove_rid($rid);
            }
            print STDERR "." if ( $v > 0 );
            sleep 5;
        }
        else
        {
            sleep 600;
            $factory->save_output('temp.out');
            my $checkinput = $factory->file;
            open(my $fh,"<$checkinput") or die $!;
            while(<$fh>)
            {
                print;
            }
            close $fh;
            $factory->remove_rid($rid);
        }
    }
}

########################################################################

On Tue, 2006-01-17 at 16:03 -0700, Barry Moore wrote:
> Nagesh,
>
> Attached is an input file, script and output. These work for me, and I
> think they are the same that you are using. Have a look and see if you
> can find any differences that might be causing your problem. Other than
> that I don't know what to tell you. If you are familiar with the perl
> debugger (and if you're not, now's probably a good time to become
> familiar with it) you should step through your script and be sure that
> all of your objects are getting defined when they are supposed to be.
> That can often help narrow down the problem.
>
> Barry
>
> > -----Original Message-----
> > From: Nagesh Chakka [mailto:nagesh.chakka@anu.edu.au]
> > Sent: Tuesday, January 17, 2006 1:57 PM
> > To: Barry Moore
> > Cc: Hubert Prielinger; bioperl-l@bioperl.org
> > Subject: Re: [Bioperl-l] Trouble using RemoteBlast.pm
> >
> > Hi Barry,
> > With the help of Hubert, I further modified the script but still have
> > the same problem. The problem is that from the point of submitting the
> > blast query, the script does not wait until the blast results are ready
> > for retrieval and event of submission is immediately followed by
> > retrieving and saving the output. Since the results will not be ready
> > (about a sec) this fast, the output created is blank. I am able to
> > retrieve the results online using the RID which I am making the script
> > to print.
> > So my main problem is making the program to wait after submitting the
> > result.
> > My input file has a single fasta sequence which I have pasted below.
> > It's interesting to note that the script works on your system.
Is it > > creating > > an output file with the blast report? > > Thanks very much for your attention. > > Regards > > Nagesh > > > > blastInput.txt > > >MusDpl > > > MKNRLGTWWVAILCMLLASHLSTVKARGIKHRFKWNRKVLPSSGGQITEARVAENRPGAFIKQGRKLDIDFG > AE > > GNRYYA > > > ANYWQFPDGIYYEGCSEANVTKEMLVTSCVNATQAANQAEFSREKQDSKLHQRVLWRLIKEICSAKHCDFWL > ER > > GAAL > > RVAVDQPAMVCLLGFVWFIVK > > > > On Wednesday 18 January 2006 05:34, Barry Moore wrote: > > > Nagesh- > > > > > > Did you get this figured out? Your script works as is on my system. > > > You say temp.out is empty? What does you input sequence > > > (blastInput.txt) look like? > > > > > > Barry > > > > > > > -----Original Message----- > > > > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l- > > > > bounces@portal.open-bio.org] On Behalf Of Hubert Prielinger > > > > Sent: Monday, January 16, 2006 2:54 PM > > > > To: Nagesh Chakka; bioperl-l@portal.open-bio.org > > > > Subject: Re: [Bioperl-l] Trouble using RemoteBlast.pm > > > > > > > > Nagesh Chakka wrote: > > > > >Hi All, > > > > >I was trying to setup a system to perform a remote blast on > regular > > > > > > > > basis. I > > > > > > > > >thought this could be best achieved by using BioPerl module and > came > > > > > > > > across > > > > > > > > >RemoteBlast.pm > > > > >I had modified the sample script "bp_remote_blast.pl" which takes > a > > > > > > file > > > > > > > >containing single FASTA sequence as an input. Also I wanted the > blast > > > > > > > > report > > > > > > > > >to be saved in a file for latter use and > > > > >modified the code as follows > > > > >I am using the latest version of Bioperl (1.5) on a Fedora > platform. 
> > > > > > > > >####################################################################### > > > > > > > > >print "$Bio::Root::Version::VERSION\n"; > > > > >use Bio::Tools::Run::RemoteBlast; > > > > >use strict; > > > > >my $prog = 'blastp'; > > > > >my $db = 'swissprot'; > > > > >my $e_val= '1e-10'; > > > > > > > > > >my @params = ( '-prog' => $prog, > > > > > '-data' => $db, > > > > > '-expect' => $e_val, > > > > > '-readmethod' => 'SearchIO' ); > > > > > > > > > >my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > > > > > > > > >#change a paramter > > > > >$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo > sapiens > > > > >[ORGN]'; > > > > > > > > > >#remove a parameter > > > > >delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'}; > > > > > > > > > >my $v = 1; > > > > >#$v is just to turn on and off the messages > > > > > > > > > >my $r = $factory->submit_blast('blastInput.txt'); > > > > > > > > > >print STDERR "waiting..." if( $v > 0 ); > > > > >while ( my @rids = $factory->each_rid ) > > > > >{ > > > > > foreach my $rid ( @rids ) > > > > > { > > > > > my $rc = $factory->retrieve_blast($rid); > > > > > if( !ref($rc) ) > > > > > { > > > > > if( $rc < 0 ) > > > > > { > > > > > $factory->remove_rid($rid); > > > > > } > > > > > print STDERR "." if ( $v > 0 ); > > > > > sleep 5; > > > > > } > > > > > else > > > > > { > > > > > print "RID $rid\n"; > > > > > $factory->save_output('temp.out'); > > > > > $factory->remove_rid($rid); > > > > > } > > > > > } > > > > >} > > > > > > > > >####################################################################### > > > > > > ## > > > > > > > ######## > > > > > > > > >This script prints the RID and terminates immediately. Obviously > the > > > > >output file created is empty as the program did not wait for > getting > > > > > > the > > > > > > > >blast results from the RID. 
> > > > >Is there something I am doing wrong and what can I do for the > program > > > > > > to > > > > > > > wait > > > > > > > > >until the results are ready to be printed to the output file. I > could > > > > > > not > > > > > > > get > > > > > > > > >much information from the documentation and have no prior > experience > > > > > > with > > > > > > > >Bioperl. > > > > >Thanks very much for your attention. > > > > >Regards > > > > >Nageshbi > > > > >_______________________________________________ > > > > >Bioperl-l mailing list > > > > >Bioperl-l@portal.open-bio.org > > > > >http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > hi nagesh, > > > > try this, should work, I had the same problem: > > > > > > > > ....................... > > > > ....................... > > > > > > > > else > > > > { > > > > print "RID $rid\n"; > > > > $factory->save_output('temp.out'); > > > > > > > > my $checkinput = $factory->file; > > > > open(my $fh,"<$checkinput") or die $!; > > > > while(<$fh>){ > > > > print; > > > > } > > > > close $fh; > > > > > > > > > > > > $factory->remove_rid($rid); > > > > } > > > > } > > > > } > > > > > > > > regards > > > > Hubert > > > > > > > > PS: are you using the composition based statistics parameter with > your > > > > blast search? > > > > if yes, is it working? > > > > > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l@portal.open-bio.org > > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Wed Jan 18 23:04:28 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed Jan 18 23:00:59 2006 Subject: [Bioperl-l] XML output from RemoteBlast Message-ID: Is there any known way to save XML-formatted BLAST queries from RemoteBlast? Changing the FORMAT_TYPE in the retrieval header to anything other than 'Text' gives a blank output file. Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From barry.moore at genetics.utah.edu Thu Jan 19 00:15:06 2006
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu Jan 19 00:11:02 2006
Subject: [Bioperl-l] Trouble using RemoteBlast.pm
In-Reply-To: <1137634648.5305.36.camel@vogon>
References: <1137634648.5305.36.camel@vogon>
Message-ID: <88489B2C-0C4B-46B2-ACB6-247990E30AB6@genetics.utah.edu>

Nagesh,

That does sound odd. What version of bioperl are you using? I'm guessing 1.4? If the answer is anything but 1.5 something, then I suggest you upgrade before going any further. You will also want to follow the current thread about parsing XML-formatted BLAST reports. I don't think this is your problem right now, but eventually you'll have a problem if you aren't parsing XML format as discussed in that post. I've added some more detail below; if you are having the problem with 1.5, try some debugging.

Here's what's going on (or should be going on) in your script, and some suggestions for using the debugger.

# This next line hits the NCBI server and, if it gets a BLAST report in
# return, parses it and returns a Bio::SearchIO::blast object. If there
# was no report you get 0, and if there was an error you get -1.
my $rc = $factory->retrieve_blast($rid);

print "RC $rc\n";

# This if statement checks whether the server has NOT returned a report
# yet. If it did, $rc should be an object and ref $rc will return
# 'Bio::SearchIO::blast'. If $rc is not an object (i.e. you got no
# report), ref $rc returns an empty string, which is false.
if( !ref($rc) )
{
    # If you got here then you got no report from the NCBI server yet,
    # so the next if checks whether you got -1, meaning there was an
    # error. On error, delete this RID because it's no good.
    if( $rc < 0 )
    {
        $factory->remove_rid($rid);
    }
    # Print a dot on the screen in lieu of music to keep the user
    # entertained while they wait.
    print STDERR "." if ( $v > 0 );
    # Take a nap so you don't piss off NCBI sys admin!
    sleep 5;
}
# Getting here means that $rc was an object, so we've got a report.
# Go ahead and save it.
else
{
    sleep 600;
    # Obviously writing your output file.
    $factory->save_output('temp.out');
    my $checkinput = $factory->file;
    open(my $fh,"<$checkinput") or die $!;
    while(<$fh>)
    {
        print;
    }
    close $fh;
    $factory->remove_rid($rid);
}

Run your script in the debugger like this:

perl -d your_script.pl

Step forward one line at a time by typing 'n'.
When you get just past my $rc = $factory->retrieve_blast($rid); type 'x $rc'
You should get 0, -1 or 'Bio::SearchIO::blast'
Keep stepping forward with 'n'.
If you get 0 you should loop back to retrieve_blast after a sleep.
If you get -1 you should end your script - you got an error (What was it?)
If you get a Bio::SearchIO::blast object then you should be writing a temp.out

Barry

On Jan 18, 2006, at 6:37 PM, Nagesh wrote:

> Thanks very much to all specially to Barry and Hubert for their time in
> answering my query. Some updates into my problem.
>
> I have performed some diagnostics tests and writing below my
> observations.
>
> First of all, the problem in the code was that it was not waiting for
> the results to be ready for writing it to the output file. So I wanted
> to check whether the condition "if( !ref($rc) )" is ever satisfied and I
> printed out the $rc value which was something like
> "Bio::SearchIO::blast=HASH(0x9010370)". When I had looked at the Bioperl
> documentation for RemoteBlast.pm, the value for $rc in
> "$rc = $factory->retrieve_blast($rid);" should either return 0 or 1.
> I am not able to understand whether what I am getting is right.
>
> Secondly, I had manually forced the script to wait between submit_blast,
> retrieve_blast and save_output by using sleep with values ranging from
> 30 to 600. None of them were successful in saving the output.
> > When sleep (600) is between submit_blast and retrieve_blast, the > following is printed onto std output (shown below is part of the > output) > with output file still empty. > >

> Request ID: 1137626804-16566-100302560340.BLASTQ4
> Status: Searching
> Submitted at: Wed Jan 18 18:26:44 2006
> Current time: Wed Jan 18 18:36:46 2006
> Time since submission: 00:10:01


> This page will be automatically updated in 10 seconds until search is done
> > When sleep (600) is between retrieve_blast and save_output, the > following is printed with nothing written to output file. > >

> Request ID: 1137632221-28820-85178967709.BLASTQ1
> Status: Searching
> Submitted at: Wed Jan 18 19:57:01 2006
> Current time: Wed Jan 18 19:57:03 2006
> Time since submission: 00:00:01


> This page will be automatically updated in 10 seconds until search is done
> > Please note the difference in time since submission. > > Lastly, I had printed out the request ID and manually paused the > script > by using between submit_blast and retrieve_blast. The idea was > to check the status of the job online through the NCBI website. > When the > results where ready, I made the script to proceed further and was able > to save the desired results to the file. I am puzzled with this > observation as I am not understanding why manually formating the > results > online helps in getting the results. > I am basically a molecular biologist and trying hard to solve this > computational stuff, so there might be some trivial issues > according to > you computer wiz :) > > Barry suggested me to use perl debugger which I will try to use. > > Thanks for your attention. > > Below is the code which was being tested. > > ###################################################################### > ## > > use strict; > use warnings; > use Bio::Tools::Run::RemoteBlast; > > print "$Bio::Root::Version::VERSION\n"; > my $prog = 'blastp'; > my $db = 'swissprot'; > my $e_val= '1e-10'; > > my @params = ( '-prog' => $prog, > '-data' => $db, > '-expect' => $e_val, > '-readmethod' => 'SearchIO' ); > > my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > > #change a paramter > $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens > [ORGN]'; > > #remove a parameter > delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'}; > > my $v = 1; > #$v is just to turn on and off the messages > > my $r = $factory->submit_blast('blastInput.txt'); > > print STDERR "waiting..." if( $v > 0 ); > while ( my @rids = $factory->each_rid ) > { > foreach my $rid ( @rids ) > { > > print "RID $rid\n"; > > #; > #sleep 600; > my $rc = $factory->retrieve_blast($rid); > > print "RC $rc\n"; > if( !ref($rc) ) > { > if( $rc < 0 ) > { > $factory->remove_rid($rid); > } > print STDERR "." 
if ( $v > 0 ); > sleep 5; > } > else > { > sleep 600; > $factory->save_output('temp.out'); > my $checkinput = $factory->file; > open(my $fh,"<$checkinput") or die $!; > while(<$fh>) > { > print; > } > close $fh; > $factory->remove_rid($rid); > } > } > } > > ###################################################################### > ## > > > On Tue, 2006-01-17 at 16:03 -0700, Barry Moore wrote: >> Nagesh, >> >> Attached is an input file, script and output. These work for me, >> and I >> think they are the same that you are using. Have a look and see >> if you >> can find any differences that might be causing you problem. Other >> than >> that I don't know what to tell you. If you are familiar with the >> perl >> debugger you (and if you're not, now's probably a good time to become >> familiar with it) you should step through you script and be sure that >> all of you're objects are getting defined when they are supposed >> to be. >> That can often help narrow down the problem. >> >> Barry >> >>> -----Original Message----- >>> From: Nagesh Chakka [mailto:nagesh.chakka@anu.edu.au] >>> Sent: Tuesday, January 17, 2006 1:57 PM >>> To: Barry Moore >>> Cc: Hubert Prielinger; bioperl-l@bioperl.org >>> Subject: Re: [Bioperl-l] Trouble using RemoteBlast.pm >>> >>> Bi Barry, >>> With the help of Hubert, I further modified the script but still >>> have >> the >>> same >>> problem. The problem is that from the point of submitting the blast >> query, >>> the script does not wait until the blast results are ready for >> retrieval >>> and >>> event of submission is immediately followed by retrieving and saving >> the >>> output. Since the results will not be ready (about a sec) this fast, >> the >>> output created is blank. I am able to retrieve the results online >> using >>> the >>> RID which I am making the script to print. >>> So my main problem is making the program to wait after >>> submitting the >>> result. 
>>> My input file has a single fasta sequence which I have pasted below. >>> Its interesting to note that the script works on your system. Is it >>> creating >>> an output file with the blast report? >>> Thanks very much for your attention. >>> Regards >>> Nagesh >>> >>> blastInput.txt >>>> MusDpl >>> >> MKNRLGTWWVAILCMLLASHLSTVKARGIKHRFKWNRKVLPSSGGQITEARVAENRPGAFIKQGRKLDI >> DFG >> AE >>> GNRYYA >>> >> ANYWQFPDGIYYEGCSEANVTKEMLVTSCVNATQAANQAEFSREKQDSKLHQRVLWRLIKEICSAKHCD >> FWL >> ER >>> GAAL >>> RVAVDQPAMVCLLGFVWFIVK >>> >>> On Wednesday 18 January 2006 05:34, Barry Moore wrote: >>>> Nagesh- >>>> >>>> Did you get this figured out? Your script works as is on my >>>> system. >>>> You say temp.out is empty? What does you input sequence >>>> (blastInput.txt) look like? >>>> >>>> Barry >>>> >>>>> -----Original Message----- >>>>> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l- >>>>> bounces@portal.open-bio.org] On Behalf Of Hubert Prielinger >>>>> Sent: Monday, January 16, 2006 2:54 PM >>>>> To: Nagesh Chakka; bioperl-l@portal.open-bio.org >>>>> Subject: Re: [Bioperl-l] Trouble using RemoteBlast.pm >>>>> >>>>> Nagesh Chakka wrote: >>>>>> Hi All, >>>>>> I was trying to setup a system to perform a remote blast on >> regular >>>>> >>>>> basis. I >>>>> >>>>>> thought this could be best achieved by using BioPerl module and >> came >>>>> >>>>> across >>>>> >>>>>> RemoteBlast.pm >>>>>> I had modified the sample script "bp_remote_blast.pl" which takes >> a >>>> >>>> file >>>> >>>>>> containing single FASTA sequence as an input. Also I wanted the >> blast >>>>> >>>>> report >>>>> >>>>>> to be saved in a file for latter use and >>>>>> modified the code as follows >>>>>> I am using the latest version of Bioperl (1.5) on a Fedora >> platform. 
>>>>> >>>> >>> #################################################################### >>> ### >>>>> >>>>>> print "$Bio::Root::Version::VERSION\n"; >>>>>> use Bio::Tools::Run::RemoteBlast; >>>>>> use strict; >>>>>> my $prog = 'blastp'; >>>>>> my $db = 'swissprot'; >>>>>> my $e_val= '1e-10'; >>>>>> >>>>>> my @params = ( '-prog' => $prog, >>>>>> '-data' => $db, >>>>>> '-expect' => $e_val, >>>>>> '-readmethod' => 'SearchIO' ); >>>>>> >>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>> >>>>>> #change a paramter >>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo >> sapiens >>>>>> [ORGN]'; >>>>>> >>>>>> #remove a parameter >>>>>> delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'}; >>>>>> >>>>>> my $v = 1; >>>>>> #$v is just to turn on and off the messages >>>>>> >>>>>> my $r = $factory->submit_blast('blastInput.txt'); >>>>>> >>>>>> print STDERR "waiting..." if( $v > 0 ); >>>>>> while ( my @rids = $factory->each_rid ) >>>>>> { >>>>>> foreach my $rid ( @rids ) >>>>>> { >>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>> if( !ref($rc) ) >>>>>> { >>>>>> if( $rc < 0 ) >>>>>> { >>>>>> $factory->remove_rid($rid); >>>>>> } >>>>>> print STDERR "." if ( $v > 0 ); >>>>>> sleep 5; >>>>>> } >>>>>> else >>>>>> { >>>>>> print "RID $rid\n"; >>>>>> $factory->save_output('temp.out'); >>>>>> $factory->remove_rid($rid); >>>>>> } >>>>>> } >>>>>> } >>>>> >>>> >>> #################################################################### >>> ### >>>> >>>> ## >>>> >>>>> ######## >>>>> >>>>>> This script prints the RID and terminates immediately. Obviously >> the >>>>>> output file created is empty as the program did not wait for >> getting >>>> >>>> the >>>> >>>>>> blast results from the RID. >>>>>> Is there something I am doing wrong and what can I do for the >> program >>>> >>>> to >>>> >>>>> wait >>>>> >>>>>> until the results are ready to be printed to the output file. 
I >> could >>>> >>>> not >>>> >>>>> get >>>>> >>>>>> much information from the documentation and have no prior >> experience >>>> >>>> with >>>> >>>>>> Bioperl. >>>>>> Thanks very much for your attention. >>>>>> Regards >>>>>> Nageshbi >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l@portal.open-bio.org >>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> hi nagesh, >>>>> try this, should work, I had the same problem: >>>>> >>>>> ....................... >>>>> ....................... >>>>> >>>>> else >>>>> { >>>>> print "RID $rid\n"; >>>>> $factory->save_output('temp.out'); >>>>> >>>>> my $checkinput = $factory->file; >>>>> open(my $fh,"<$checkinput") or die $!; >>>>> while(<$fh>){ >>>>> print; >>>>> } >>>>> close $fh; >>>>> >>>>> >>>>> $factory->remove_rid($rid); >>>>> } >>>>> } >>>>> } >>>>> >>>>> regards >>>>> Hubert >>>>> >>>>> PS: are you using the composition based statistics parameter with >> your >>>>> blast search? >>>>> if yes, is it working? >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l@portal.open-bio.org >>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From heikki at sanbi.ac.za Thu Jan 19 01:18:17 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Thu Jan 19 01:30:29 2006 Subject: [Bioperl-l] Bio::Taxonomy::{Tree&Node} testing Message-ID: <200601190818.17885.heikki@sanbi.ac.za> Dan, I've committed a preliminary test file called TaxonTree.t to bioperl main main trunk. Could you check that and correct it where needed. I got quite confused about the Node methods. What they are supposed to do and return was not quite clear. I did fix one mistake where description was stored in the same place as descendants. 
I hope you are still interested in working on these modules. -Heikki -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From nagesh.chakka at anu.edu.au Thu Jan 19 02:22:17 2006 From: nagesh.chakka at anu.edu.au (Nagesh) Date: Thu Jan 19 02:18:42 2006 Subject: [Bioperl-l] RemoteBlast.pm problem resolved!!!!! In-Reply-To: <88489B2C-0C4B-46B2-ACB6-247990E30AB6@genetics.utah.edu> References: <1137634648.5305.36.camel@vogon> <88489B2C-0C4B-46B2-ACB6-247990E30AB6@genetics.utah.edu> Message-ID: <1137655337.5305.73.camel@vogon> Hi Barry, Thanks once again for an elaborate mail and explanation. I am using the latest version of BioPerl 1.5. I also tested this problem on 1.4 with no difference. The problem is with the "$rc = $factory->retrieve_blast ($rid);" where $rc was always getting an object as a return from retrieve_blast and is never entering into sleep 5 mode (the condition "if( !ref($rc) )" is never satisfied). I thought I will have a look at the RemoteBlast.pm code once before trying anything more. I looked at the method retrieve_blast which was the main culprit and then found a possible answer for my problem. 
I looked at the condition which decides whether 0, -1 or an object is returned, shown below.

Code from Bio/Tools/Run/RemoteBlast.pm version 1.5 line 569-560
#########################################################
my $size = -s $tempfile;
if( $size > 1000 ) {
#########################################################

So I made it print the file size and ran my perl script several times.

#########################################################
my $size = -s $tempfile;
print "Size of temporary file from RemoteBlast.pm $size\n";
if( $size > 1000 ) {
#########################################################

Each time I did so, I got a file size of 2014 to 2017, so it is no wonder that the condition ($size > 1000) is satisfied even when the results are not ready. So I modified the condition to the following

#########################################################
my $size = -s $tempfile;
if( $size > 2017 ) {
#########################################################

and there it goes: the code behaved itself and waited until the results were ready before proceeding to save the output. This may be the result of some change the NCBI admins made to the results status page, which would have increased the file size so that it satisfies the condition and returns an object, something that should happen only when the results are ready. I am not sure whether this is the right answer to the problem, but it definitely works. Any comments from people having a similar problem would be useful. I will see how long this solution works and knock on your doors again if I need further help.

Thanks for your help.
Regards
Nagesh

On Wed, 2006-01-18 at 22:15 -0700, Barry Moore wrote:
> Nagesh,
>
> That does sound odd. What version of bioperl are you using? I'm
> guessing 1.4? If the answer is anything but 1.5 something, then I
> suggest you should upgrade before going any further. You will also
> want to follow the current thread by about parsing XML formatted
> blast reports.
> I don't think this is your problem right now, but
> eventually you'll have a problem if you aren't parsing XML format as
> discussed in that post.  I've added some more detail below; if you are
> having the problem with 1.5, try some debugging.
>
> Here's what's going on (or should be going on) in your script, and
> some suggestions for using the debugger.
>
> #This next line hits the NCBI server, and if it gets a blast report
> #in return parses it, and returns a Bio::SearchIO::blast object.  If
> #there was no report you get 0, and if there was an error you get -1.
>
> my $rc = $factory->retrieve_blast($rid);
>
> print "RC $rc\n";
>
> #This if statement is checking to see if the server has NOT returned
> #a report yet.  If it did then $rc should be an object and ref $rc
> #will return 'Bio::SearchIO::blast'.  If $rc is not an object (i.e.
> #you got no report) then ref $rc returns an empty string.
> if( !ref($rc) )
> {
>     #If you got here then you got no report from NCBI server yet, and so
>     #the next if checks whether you got -1, meaning there was an error.
>     #On error delete this RID because it's no good.
>     if( $rc < 0 )
>     {
>         $factory->remove_rid($rid);
>     }
>     #Print a dot on the screen in lieu of music to keep the user
>     #entertained while they wait.
>     print STDERR "." if ( $v > 0 );
>     #Take a nap so you don't piss off NCBI sys admin!
>     sleep 5;
> }
> #Getting here means that $rc was an object, so we've got a report.
> #Go ahead and save it.
> else
> {
>     sleep 600;
>     #Obviously writing your output file.
>     $factory->save_output('temp.out');
>     my $checkinput = $factory->file;
>     open(my $fh,"<$checkinput") or die $!;
>     while(<$fh>)
>     {
>         print;
>     }
>     close $fh;
>     $factory->remove_rid($rid);
> }
>
>
> Run your script in the debugger like this:
>
> perl -d your_script.pl
>
> Step forward one line at a time by typing 'n'.
> When you get just past my $rc = $factory->retrieve_blast($rid); type
> 'x $rc'
> You should get 0, -1 or 'Bio::SearchIO::blast'
> Keep stepping forward with 'n'.
> If you get 0 you should loop back to retrieve_blast after a sleep.
> If you get -1 you should end your script - you got an error (What was
> it?)
> If you get a Bio::SearchIO::blast object then you should be writing
> a temp.out
>
> Barry
>
>
> On Jan 18, 2006, at 6:37 PM, Nagesh wrote:
>
> > Thanks very much to all, especially to Barry and Hubert, for their
> > time in answering my query.  Some updates on my problem.
> >
> > I have performed some diagnostic tests and am writing my
> > observations below.
> >
> > First of all, the problem in the code was that it was not waiting for
> > the results to be ready before writing them to the output file.  So I
> > wanted to check whether the condition "if( !ref($rc) )" is ever
> > satisfied, and I printed out the $rc value, which was something like
> > "Bio::SearchIO::blast=HASH(0x9010370)".  When I looked at the Bioperl
> > documentation for RemoteBlast.pm, the value for $rc in
> > "$rc = $factory->retrieve_blast($rid);" should be either 0 or 1.  I am
> > not able to understand whether what I am getting is right.
> >
> > Secondly, I manually forced the script to wait between submit_blast,
> > retrieve_blast and save_output by using sleep with values ranging from
> > 30 to 600.  None of them were successful in saving the output.
> >
> > When sleep (600) is between submit_blast and retrieve_blast, the
> > following is printed onto std output (shown below is part of the
> > output) with output file still empty.
> >
> > Request ID             1137626804-16566-100302560340.BLASTQ4
> > Status                 Searching
> > Submitted at           Wed Jan 18 18:26:44 2006
> > Current time           Wed Jan 18 18:36:46 2006
> > Time since submission  00:10:01
> >
> > This page will be automatically updated in 10 seconds
> > until search is done
> >
> > When sleep (600) is between retrieve_blast and save_output, the
> > following is printed with nothing written to output file.
> >
> > Request ID             1137632221-28820-85178967709.BLASTQ1
> > Status                 Searching
> > Submitted at           Wed Jan 18 19:57:01 2006
> > Current time           Wed Jan 18 19:57:03 2006
> > Time since submission  00:00:01
> >
> > This page will be automatically updated in 10 seconds
> > until search is done
> >
> > Please note the difference in time since submission.
> >
> > Lastly, I printed out the request ID and manually paused the
> > script by using <STDIN> between submit_blast and retrieve_blast.  The
> > idea was to check the status of the job online through the NCBI
> > website.  When the results were ready, I let the script proceed and was
> > able to save the desired results to the file.  I am puzzled by this
> > observation, as I do not understand why manually formatting the
> > results online helps in getting the results.
> > I am basically a molecular biologist trying hard to solve this
> > computational stuff, so there might be some issues that are trivial
> > to you computer wizards :)
> >
> > Barry suggested that I use the perl debugger, which I will try.
> >
> > Thanks for your attention.
> >
> > Below is the code which was being tested.
> >
> > ########################################################################
> >
> > use strict;
> > use warnings;
> > use Bio::Tools::Run::RemoteBlast;
> >
> > print "$Bio::Root::Version::VERSION\n";
> > my $prog = 'blastp';
> > my $db = 'swissprot';
> > my $e_val= '1e-10';
> >
> > my @params = ( '-prog' => $prog,
> >                '-data' => $db,
> >                '-expect' => $e_val,
> >                '-readmethod' => 'SearchIO' );
> >
> > my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
> >
> > #change a parameter
> > $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens [ORGN]';
> >
> > #remove a parameter
> > delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'};
> >
> > my $v = 1;
> > #$v is just to turn on and off the messages
> >
> > my $r = $factory->submit_blast('blastInput.txt');
> >
> > print STDERR "waiting..." if( $v > 0 );
> > while ( my @rids = $factory->each_rid )
> > {
> >     foreach my $rid ( @rids )
> >     {
> >         print "RID $rid\n";
> >
> >         #<STDIN>;
> >         #sleep 600;
> >         my $rc = $factory->retrieve_blast($rid);
> >
> >         print "RC $rc\n";
> >         if( !ref($rc) )
> >         {
> >             if( $rc < 0 )
> >             {
> >                 $factory->remove_rid($rid);
> >             }
> >             print STDERR "." if ( $v > 0 );
> >             sleep 5;
> >         }
> >         else
> >         {
> >             sleep 600;
> >             $factory->save_output('temp.out');
> >             my $checkinput = $factory->file;
> >             open(my $fh,"<$checkinput") or die $!;
> >             while(<$fh>)
> >             {
> >                 print;
> >             }
> >             close $fh;
> >             $factory->remove_rid($rid);
> >         }
> >     }
> > }
> >
> > ########################################################################
> >
> >
> > On Tue, 2006-01-17 at 16:03 -0700, Barry Moore wrote:
> >> Nagesh,
> >>
> >> Attached is an input file, script and output.  These work for me, and I
> >> think they are the same that you are using.  Have a look and see if you
> >> can find any differences that might be causing your problem.  Other than
> >> that I don't know what to tell you.  If you are familiar with the perl
> >> debugger (and if you're not, now's probably a good time to become
> >> familiar with it) you should step through your script and be sure that
> >> all of your objects are getting defined when they are supposed to be.
> >> That can often help narrow down the problem.
> >>
> >> Barry
> >>
> >>> -----Original Message-----
> >>> From: Nagesh Chakka [mailto:nagesh.chakka@anu.edu.au]
> >>> Sent: Tuesday, January 17, 2006 1:57 PM
> >>> To: Barry Moore
> >>> Cc: Hubert Prielinger; bioperl-l@bioperl.org
> >>> Subject: Re: [Bioperl-l] Trouble using RemoteBlast.pm
> >>>
> >>> Hi Barry,
> >>> With the help of Hubert, I further modified the script but still have
> >>> the same problem.
> >>> The problem is that from the point of submitting the blast query,
> >>> the script does not wait until the blast results are ready for
> >>> retrieval, and the submission is immediately followed by retrieving
> >>> and saving the output.  Since the results will not be ready this fast
> >>> (about a sec), the output created is blank.  I am able to retrieve the
> >>> results online using the RID which I am making the script print.
> >>> So my main problem is making the program wait after submitting the
> >>> query.
> >>> My input file has a single fasta sequence which I have pasted below.
> >>> It's interesting to note that the script works on your system.  Is it
> >>> creating an output file with the blast report?
> >>> Thanks very much for your attention.
> >>> Regards
> >>> Nagesh
> >>>
> >>> blastInput.txt
> >>> >MusDpl
> >>> MKNRLGTWWVAILCMLLASHLSTVKARGIKHRFKWNRKVLPSSGGQITEARVAENRPGAFIKQGRKLDIDFGAEGNRYYA
> >>> ANYWQFPDGIYYEGCSEANVTKEMLVTSCVNATQAANQAEFSREKQDSKLHQRVLWRLIKEICSAKHCDFWLERGAAL
> >>> RVAVDQPAMVCLLGFVWFIVK
> >>>
> >>> On Wednesday 18 January 2006 05:34, Barry Moore wrote:
> >>>> Nagesh-
> >>>>
> >>>> Did you get this figured out?  Your script works as is on my system.
> >>>> You say temp.out is empty?  What does your input sequence
> >>>> (blastInput.txt) look like?
> >>>>
> >>>> Barry
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: bioperl-l-bounces@portal.open-bio.org
> >>>>> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Hubert Prielinger
> >>>>> Sent: Monday, January 16, 2006 2:54 PM
> >>>>> To: Nagesh Chakka; bioperl-l@portal.open-bio.org
> >>>>> Subject: Re: [Bioperl-l] Trouble using RemoteBlast.pm
> >>>>>
> >>>>> Nagesh Chakka wrote:
> >>>>>> Hi All,
> >>>>>> I was trying to set up a system to perform a remote blast on a
> >>>>>> regular basis.
> >>>>>> I thought this could be best achieved by using a BioPerl module and
> >>>>>> came across RemoteBlast.pm.
> >>>>>> I modified the sample script "bp_remote_blast.pl", which takes a file
> >>>>>> containing a single FASTA sequence as input.  I also wanted the blast
> >>>>>> report to be saved in a file for later use, and modified the code as
> >>>>>> follows.
> >>>>>> I am using the latest version of Bioperl (1.5) on a Fedora platform.
> >>>>>>
> >>>>>> #######################################################################
> >>>>>> print "$Bio::Root::Version::VERSION\n";
> >>>>>> use Bio::Tools::Run::RemoteBlast;
> >>>>>> use strict;
> >>>>>> my $prog = 'blastp';
> >>>>>> my $db = 'swissprot';
> >>>>>> my $e_val= '1e-10';
> >>>>>>
> >>>>>> my @params = ( '-prog' => $prog,
> >>>>>>                '-data' => $db,
> >>>>>>                '-expect' => $e_val,
> >>>>>>                '-readmethod' => 'SearchIO' );
> >>>>>>
> >>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
> >>>>>>
> >>>>>> #change a parameter
> >>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Homo sapiens [ORGN]';
> >>>>>>
> >>>>>> #remove a parameter
> >>>>>> delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'};
> >>>>>>
> >>>>>> my $v = 1;
> >>>>>> #$v is just to turn on and off the messages
> >>>>>>
> >>>>>> my $r = $factory->submit_blast('blastInput.txt');
> >>>>>>
> >>>>>> print STDERR "waiting..." if( $v > 0 );
> >>>>>> while ( my @rids = $factory->each_rid )
> >>>>>> {
> >>>>>>     foreach my $rid ( @rids )
> >>>>>>     {
> >>>>>>         my $rc = $factory->retrieve_blast($rid);
> >>>>>>         if( !ref($rc) )
> >>>>>>         {
> >>>>>>             if( $rc < 0 )
> >>>>>>             {
> >>>>>>                 $factory->remove_rid($rid);
> >>>>>>             }
> >>>>>>             print STDERR "." if ( $v > 0 );
> >>>>>>             sleep 5;
> >>>>>>         }
> >>>>>>         else
> >>>>>>         {
> >>>>>>             print "RID $rid\n";
> >>>>>>             $factory->save_output('temp.out');
> >>>>>>             $factory->remove_rid($rid);
> >>>>>>         }
> >>>>>>     }
> >>>>>> }
> >>>>>> #######################################################################
> >>>>>>
> >>>>>> This script prints the RID and terminates immediately.  Obviously the
> >>>>>> output file created is empty, as the program did not wait to get the
> >>>>>> blast results for the RID.
> >>>>>> Is there something I am doing wrong, and what can I do to make the
> >>>>>> program wait until the results are ready to be printed to the output
> >>>>>> file?  I could not get much information from the documentation and
> >>>>>> have no prior experience with Bioperl.
> >>>>>> Thanks very much for your attention.
> >>>>>> Regards
> >>>>>> Nageshbi
> >>>>>> _______________________________________________
> >>>>>> Bioperl-l mailing list
> >>>>>> Bioperl-l@portal.open-bio.org
> >>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>
> >>>>> hi nagesh,
> >>>>> try this, should work, I had the same problem:
> >>>>>
> >>>>> .......................
> >>>>> .......................
> >>>>>
> >>>>>         else
> >>>>>         {
> >>>>>             print "RID $rid\n";
> >>>>>             $factory->save_output('temp.out');
> >>>>>
> >>>>>             my $checkinput = $factory->file;
> >>>>>             open(my $fh,"<$checkinput") or die $!;
> >>>>>             while(<$fh>){
> >>>>>                 print;
> >>>>>             }
> >>>>>             close $fh;
> >>>>>
> >>>>>             $factory->remove_rid($rid);
> >>>>>         }
> >>>>>     }
> >>>>> }
> >>>>>
> >>>>> regards
> >>>>> Hubert
> >>>>>
> >>>>> PS: are you using the composition based statistics parameter with
> >>>>> your blast search?
> >>>>> if yes, is it working?

From ajo11 at mole.bio.cam.ac.uk Thu Jan 19 07:02:47 2006
From: ajo11 at mole.bio.cam.ac.uk (Amanda O'Reilly)
Date: Thu Jan 19 07:14:41 2006
Subject: [Bioperl-l] SVG treefile problem
Message-ID: <43CF7FE7.2080309@mole.bio.cam.ac.uk>

I am trying to draw SVG format phylogenetic trees - the output tree is
distorted.

Using checked code & tree from here (code reproduced below also):
http://portal.open-bio.org/pipermail/bioperl-l/2004-April/015581.html

This gives a distorted tree if I leave out 'warn $tree'.
If I leave 'warn $tree' in the code, I get the following error.
Bio::Tree::Tree=HASH(0x606b68) at tree_play.pl line 17, line 1.

Have tried running with UNIX (BioPerl 1.5) & Linux installations & tried
viewing tree with different applications - output tree always looks wrong.

Thanks,
Amanda.
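
A note on the message quoted in this post, offered as a hedged reading rather than a diagnosis of the distorted tree: text of the form `Bio::Tree::Tree=HASH(0x...) at ... line N` is what Perl's `warn` prints when it is handed an object reference that has no string overloading, so it is probably a diagnostic and not a failure. The sketch below reproduces the effect in plain Perl; `My::Tree` is a hypothetical stand-in class, not Bio::Tree::Tree.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a tree object; any blessed hash reference
# without string overloading behaves the same way.
my $tree = bless { id => 'demo' }, 'My::Tree';

# Stringifying the reference gives "Class=HASH(0xADDRESS)".
my $as_text = "$tree";
print "stringified: $as_text\n";

# warn writes that same text plus " at SCRIPT line N." to STDERR and
# then returns; unlike die, it does not stop the program.
warn $tree;

print "still running after warn\n";
```

Under that reading, commenting out `warn $tree` only hides the message and would not be expected to change how the tree is drawn.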

#!/usr/local/bin/perl -w
use strict;
use lib '.';
use Bio::TreeIO;
use Data::Dumper;
use SVG::Graph;

my $infile  = "/scratch/ajo11/exp/aln/000ms/11.ph";
my $outfile = ">/scratch/ajo11/exp/aln/000ms/11.svg";
my $in  = new Bio::TreeIO(-file => $infile,  -format => 'newick');
my $out = new Bio::TreeIO(-file => $outfile, -format => 'svggraph');
while( my $tree = $in->next_tree ) {
    #warn $tree;
    my $svg_xml = $out->write_tree($tree);
}

From supramuk at yahoo.com Thu Jan 19 00:47:57 2006
From: supramuk at yahoo.com (supratim mukherjee)
Date: Thu Jan 19 08:40:12 2006
Subject: [Bioperl-l] Re: volunteers needed
Message-ID: <20060119054758.91641.qmail@web32406.mail.mud.yahoo.com>

Respected Sir/Madam,
I am a student from Bangalore University, India and have just finished my
master's in Biotechnology. I have applied for a PhD at a few universities
in the USA and am awaiting the admission decision. I would like to
contribute to bioperl.org. Please let me know if any of my contributions
would help.
Regards
Supratim

From cjfields at uiuc.edu Thu Jan 19 11:26:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu Jan 19 11:23:02 2006
Subject: [Bioperl-l] RemoteBlast.pm problem resolved!!!!!
In-Reply-To: <1137655337.5305.73.camel@vogon>
Message-ID: <001801c61d15$10fd9210$15327e82@pyrimidine>

This resolves the problem only if you use bioperl 1.5.1.  RemoteBlast.pm
was changed around fall 2005 and the $size variable was removed (as
reported here: http://bugzilla.bioperl.org/show_bug.cgi?id=1864).  The text
output will save if you use SearchIO.  However, parsing text output with
SearchIO seems to be broken at the moment, likely due to modifications in
the output that broke SearchIO::blast.  Jason addresses this in the last
few emails in this thread.
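
To illustrate why a size threshold is fragile, here is a minimal sketch that decides readiness from the page text instead. It assumes only the two phrases visible in the status page quoted earlier in this thread ("Status ... Searching" and "This page will be automatically updated"). `looks_like_status_page` is a hypothetical helper, not part of RemoteBlast.pm, and the sketch makes no claim about how 1.5.1 actually performs the check.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when the text looks like the NCBI
# "still searching" status page rather than a finished report.
# Both patterns come from the status page quoted in this thread; the
# first one tolerates whitespace or HTML tags between the two words.
sub looks_like_status_page {
    my ($text) = @_;
    return 1 if $text =~ /Status(?:\s|<[^>]*>)*Searching/i;
    return 1 if $text =~ /This page will be automatically updated/i;
    return 0;
}

my $status_page = "Request ID 1137626804-16566-100302560340.BLASTQ4\n"
                . "StatusSearching\n"
                . "This page will be automatically updated in 10 seconds\n";

# Placeholder text standing in for a finished report (not real output).
my $report = "Query= MusDpl\n(placeholder for a finished report)\n";

for my $sample ( $status_page, $report ) {
    my $size  = length $sample;
    my $state = looks_like_status_page($sample) ? 'not ready' : 'ready';
    print "$size characters: $state\n";
}
```

A check of this kind keeps working if the status page grows or shrinks, but it breaks in its own way if the wording of the page changes, so it trades one assumption for another.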
If you plan on parsing out data (like accessions or HSPs) from BLAST
output, then you may have to switch to XML, as text or HTML parsing can
break at any time.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-
> bounces@portal.open-bio.org] On Behalf Of Nagesh
> Sent: Thursday, January 19, 2006 1:22 AM
> To: Barry Moore; bioperl-l@bioperl.org
> Cc: ganesh.b.chakka@jpmorgan.com
> Subject: Re: [Bioperl-l] RemoteBlast.pm problem resolved!!!!!
>
> Hi Barry,
> Thanks once again for an elaborate mail and explanation. I am using the
> latest version of BioPerl 1.5. I also tested this problem on 1.4 with no
> difference. The problem is with the "$rc = $factory->retrieve_blast
> ($rid);" where $rc was always getting an object as a return from
> retrieve_blast and is never entering into sleep 5 mode (the condition
> "if( !ref($rc) )" is never satisfied).
>
> I thought I will have a look at the RemoteBlast.pm code once before
> trying anything more. I looked at the method retrieve_blast which was
> the main culprit and then found a possible answer for my problem.
From akarger at CGR.Harvard.edu Thu Jan 19 12:15:41 2006 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Thu Jan 19 12:26:46 2006 Subject: [Bioperl-l] Calculating a bunch of SNPs Message-ID: <339D68B133EAD311971E009027DC479703DB438B@montecarlo.cgr.harvard.edu> I have 96 files. The first is a reference sequence. The other 95 are sequences from different genotypes, with minor SNPs compared to the first one. I want to generate a list of all the SNPs for each sequence compared to the reference sequence. Output format doesn't really matter. I was told I could run EMBOSS diffseq on each of the 95 pairs, and parse the output to get my list. I'm wondering if there's a Bioperl tool that will do what diffseq does, though - presumably outputting Bio::Align objects of some kind, or is it Bio::Variation? - rather than parsing 95*N output files. Thanks, - Amir Karger Computational Biology Group Bauer Center for Genomics Research Harvard University 617-496-0626 From cjfields at uiuc.edu Thu Jan 19 12:53:01 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu Jan 19 12:49:48 2006 Subject: [Bioperl-l] XML output from RemoteBlast In-Reply-To: Message-ID: <001f01c61d21$31086d80$15327e82@pyrimidine> Jason, Nope. No go.
I thought Nagesh may have found the problem with the $size parameter (maybe the XML-formatted output was > 1000), but there is no $size variable now. RemoteBlast.pm was changed ~fall 2005 (by you, I believe) to fix bug 1864 (http://bugzilla.bioperl.org/show_bug.cgi?id=1864), so is post-1.5.1. I'm using a recent PPM build of bioperl-live. As reported before, it worked up until very recently (within the last week), but I was parsing text output and using '-readmethod'=>'SearchIO' or 'blast' in the parameters list. My script uses a local sequence file (FASTA) in a BLASTP search against 'nr'. When FORMAT_TYPE was set to 'Text' format using SearchIO for readmethod, everything works fine and I get saved output; switching to 'readmethod'=>'xml' and FORMAT_TYPE to XML, gives a blank file. The -verbose switch is on, so I can switch FORMAT_TYPE to any of the accepted parameter settings (HTML, Text, ASN.1, XML) and I see the corresponding output style sent to stdout along with the warnings from the NCBI queue. However, nothing besides text output will save, suggesting something with retrieve_blast() in RemoteBlast.pm. Strangely, the file name, derived from query_name, does not pick up the query name sent, but a chunk of the RID! BTW, it only does this with XML output; the query_name from text output is as expected. Changing $filename to temp.blastp (commented out below) doesn't do the trick; it's still an empty file. I have also tried an older version of this script on Mac OS X and had similar problems with XML output, but text output saves fine, so I don't think this is the OS. Here's the saved file names (using XML output) and their RID's (no point in sending the file contents, they were all blank). These were all using the same query sequence; I noticed that the file names were different each time and thought of the RID. 
1_20910.blastp
  ^^^^^
1137691949-20910-102543092805.BLASTQ4
           ^^^^^

1_25245.blastp
  ^^^^^
1137692051-25245-128580015999.BLASTQ1
           ^^^^^

1_21057.blastp
  ^^^^^
1137692263-21057-148127371984.BLASTQ4
           ^^^^^

Is the RID jamming up the works somehow?

Following is the script (sorry if it's a bit clunky):
_______________________________________________________________________________
#!perl

use strict;
use Bio::Tools::Run::RemoteBlast;

# $v is just to turn on and off the messages
my $v = 1;

# changing or modifying parameters for blast search
my $prog  = 'blastp';
my $db    = 'nr';
my $e_val = '0.1';

my @params = ( '-verbose'    => $v,
               '-prog'       => $prog,
               '-data'       => $db,
               '-expect'     => $e_val,
               '-readmethod' => 'xml' );

# remove filter
delete $Bio::Tools::Run::RemoteBlast::HEADER{'FILTER'};

# change cgi parameters for blast results
# DESCRIPTIONS and ALIGNMENTS need to be changed in both the HEADER
# and RETRIEVALHEADER hashes
$Bio::Tools::Run::RemoteBlast::RETRIEVALHEADER{'FORMAT_TYPE'} = 'XML';

# init new BLAST factory
my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

print "Starting blast search ...\n";

# submit blast query
my $r = $factory->submit_blast('m_smeg_pyrR.txt');

print STDERR "waiting..." if( $v > 0 );
while ( my @rids = $factory->each_rid ) {
    foreach my $rid ( @rids ) {
        my $rc = $factory->retrieve_blast($rid);
        # if RID is not present
        if( !ref($rc) ) {
            # remove if RID is bad (error)
            if( $rc < 0 ) {
                $factory->remove_rid($rid);
            }
            # otherwise, query is still in progress, continue loop, printing output
            # if requested
            print STDERR "." if ( $v > 0 );
            sleep 2;
        } else {
            # RID is returned
            # save the output
            print $rid;
            my $result = $rc->next_result();
            my $filename= $result->query_name.".blastp";
            #my $filename= "temp.blastp";
            $factory->save_output($filename);
            # remove RID from list
            $factory->remove_rid($rid);
        }
    }
}
_______________________________________________________________________________

I may switch to the blast client from NCBI for now, but I would like to keep RemoteBlast.pm going somehow unless it's completely unfeasible. I'm still a bit green when it comes to object-oriented programming (I am primarily a molecular biologist with programming experience) and I'm still trying to wrap my head around some bioperl objects and their methods (though I'm catching on slowly).

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich@duke.edu]
> Sent: Wednesday, January 18, 2006 10:23 PM
> To: Chris Fields
> Subject: Re: [Bioperl-l] XML output from RemoteBlast
>
> This doesn't work for you?
> http://bioperl.open-bio.org/news/2005/11/06/getting-blastxml-using-remoteblast/
> On Jan 18, 2006, at 11:04 PM, Chris Fields wrote:
>
> > Is there any known way to save XML-formatted BLAST queries from
> > RemoteBlast? Changing the FORMAT_TYPE in the retrieval header to
> > anything other than 'Text' gives a blank output file.
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr.
Robert Switzer > > Dept of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 From osborne1 at optonline.net Thu Jan 19 12:57:17 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Thu Jan 19 12:59:54 2006 Subject: [Bioperl-l] Re: volunteers needed In-Reply-To: <20060119054758.91641.qmail@web32406.mail.mud.yahoo.com> Message-ID: Supratim, In a week or so Jason Stajich will be releasing a new Bioperl documentation site, it will have a nice page detailing a number of different projects that need volunteers. While you wait think about what areas in bioinformatics _you_ want to work on, it's probably the case that you'll do the best work on those topics that interest you personally. Brian O. On 1/19/06 12:47 AM, "supratim mukherjee" wrote: > Respected Sir/Madam, > > I am a student from Bangalore University, India and > have just finished my masters in Biotechnology. I have > applied for PhD in a few universities in USA and am > awaiting the admission decision. > > I would like to contribute to bioperl.org. Please let > me know if any of my contributions would help. > > Regards > Supratim > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! 
Mail has the best spam protection around
> http://mail.yahoo.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

From andyn108 at gmail.com Thu Jan 19 13:28:36 2006
From: andyn108 at gmail.com (Andy Nunberg)
Date: Thu Jan 19 13:51:50 2006
Subject: [Bioperl-l] problem with Primer3: too many files open
Message-ID: <2ae1a0fe0601191028i718efe15yc76185b5a64cda99@mail.gmail.com>

Hi, I am using bioperl-1.4 running Primer3 to select a bunch of primers.
While running the script, I get an exception at the same point with
the following error:

------------- EXCEPTION -------------
MSG: Can't open RESULTS:Too many open files
STACK Bio::Tools::Run::Primer3::run /compbio/pkg/bio-perl/bioperl-run/Bio/Tools/Run/Primer3.pm:361
STACK (eval) find_primers_first_pass.pl:183
STACK main::_primer3 find_primers_first_pass.pl:182
STACK main::get_primer find_primers_first_pass.pl:141
STACK toplevel find_primers_first_pass.pl:86
--------------------------------------

Now if I take this sequence out of the list and run the script, it
runs just fine.

Here is the subroutine calling primer3:

sub _primer3{
    my($seq,$qual_region)=@_;
    my $primer3=Bio::Tools::Run::Primer3->new(-seq=>$seq,-verbose=>0,-flush=>1);
    my @qual = @{$seq->qual};
    #set the start of the search window for primer3
    my $primer3_start=1;
    if($seq->length > ($window+100)){
        $primer3_start=$qual_region->end-($window+100);
    }
    #set up primer3
    $primer3->add_targets('INCLUDED_REGION'=>"$primer3_start,$window");
    $primer3->add_targets('PRIMER_FIRST_BASE_INDEX'=>1,
                          'PRIMER_TASK'=>'pick_left_only');
    $primer3->add_targets('PRIMER_SEQUENCE_QUALITY'=>"@qual");
    $primer3->add_targets('PRIMER_MIN_QUALITY'=>$minqual,
                          'PRIMER_NUM_RETURN'=>1,
                          'PRIMER_MAX_POLY_X'=>3);
    $primer3->add_targets('PRIMER_GC_CLAMP'=>1) unless($no_gc_clamp);

    #run primer3
    my $prim3_results;
    eval {
        $prim3_results=$primer3->run;
    };
    die $seq->id." :$@" if ($@);

    #fetch result for the first primer
    my $hash_ref=$prim3_results->primer_results(0);
    return $hash_ref;
}

Any suggestions? Any thoughts on why I am getting the error to begin with?
Thanks

From avilella at ub.edu Thu Jan 19 13:31:01 2006
From: avilella at ub.edu (Albert Vilella)
Date: Thu Jan 19 13:54:52 2006
Subject: [Bioperl-l] Calculating a bunch of SNPs
In-Reply-To: <339D68B133EAD311971E009027DC479703DB438B@montecarlo.cgr.harvard.edu>
References: <339D68B133EAD311971E009027DC479703DB438B@montecarlo.cgr.harvard.edu>
Message-ID: <1137695462.9170.6.camel@localhost.localdomain>

On Thu, 19 Jan 2006 at 12:15 -0500, Amir Karger wrote:
> I have 96 files. The first is a reference sequence. The other 95 are
> sequences from different genotypes, with minor SNPs compared to the first
> one. I want to generate a list of all the SNPs for each sequence compared to
> the reference sequence. Output format doesn't really matter.

Dear Amir,

If the sequences are simply instances of genotypes/haplotypes, so that each position already correlates in all 96 sequences, then one possibility would be to simply create a Bio::Align object by adding each of them.

Once you have your alignment, you can get the marker information with the aln_to_population method of Bio::PopGen::Utilities.

 Usage   : my $pop = Bio::PopGen::Utilities->aln_to_population($aln);
 Function: Turn an alignment into a set of Bio::PopGen::Individual
           objects grouped in a Bio::PopGen::Population object

You will see some example output files in t/data/.

There may be other (better or different) ways to do what you need with Bioperl,

Albert.

> I was told I could run EMBOSS diffseq on each of the 95 pairs, and parse the
> output to get my list. I'm wondering if there's a Bioperl tool that will do
> what diffseq does, though - presumably outputting Bio::Align objects of some
> kind, or is it Bio::Variation? - rather than parsing 95*N output files.
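[Editorial sketch, not part of the original thread.] The case Albert describes, where every position already corresponds across all 96 sequences, can also be checked with a few lines of plain Perl before reaching for alignment objects. The following is only a minimal sketch under that assumption: the sequences must have equal length and contain no indels. The sub name list_snps and the toy data are hypothetical and merely stand in for the reference and genotype files; nothing here uses the BioPerl API.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compare one sequence against the reference, position by position.
# Assumption: both strings have the same length and are already
# position-for-position comparable (substitutions only, no indels).
sub list_snps {
    my ($ref, $seq) = @_;
    die "sequences differ in length\n" unless length($ref) == length($seq);
    my @snps;
    for my $i (0 .. length($ref) - 1) {
        my $r = uc substr($ref, $i, 1);
        my $s = uc substr($seq, $i, 1);
        next if $r eq $s;
        # Positions are reported 1-based.
        push @snps, { pos => $i + 1, ref => $r, alt => $s };
    }
    return @snps;
}

# Hypothetical toy data in place of the 96 input files.
my $reference = 'ATGGCCATTGTAATGGGCCGC';
my %genotypes = (
    sample01 => 'ATGGCCATTGTAATGGGCCGC',
    sample02 => 'ATGACCATTGTAATGGGCAGC',
);

for my $name (sort keys %genotypes) {
    for my $snp (list_snps($reference, $genotypes{$name})) {
        print join("\t", $name, $snp->{pos}, "$snp->{ref}>$snp->{alt}"), "\n";
    }
}
```

If the genotype sequences can contain insertions or deletions, this direct comparison is not valid and a real pairwise alignment (diffseq, or an alignment object as suggested above) is needed first.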
From cjfields at uiuc.edu Thu Jan 19 15:55:23 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu Jan 19 15:52:37 2006 Subject: [Bioperl-l] XML output from RemoteBlast In-Reply-To: Message-ID: <000101c61d3a$aae30070$15327e82@pyrimidine> ...and I tried an XML-formatted BLASTP file (from blastcl3 output) to test SearchIO directly; it's not SearchIO or blastxml. They parsed accessions, hits, etc very well. So at least I can use a system call to blastcl3 with parameters as a workaround for now. I'm pretty sure it is the retrieve_blast() or save_output() method in RemoteBlast.pm. I'm busy trying to finish up a write-up for bioperl-db (among the experiments going on in the lab), but I'll try to figure it out. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich@duke.edu] > Sent: Wednesday, January 18, 2006 10:23 PM > To: Chris Fields > Subject: Re: [Bioperl-l] XML output from RemoteBlast > > This doesn't work for you? > http://bioperl.open-bio.org/news/2005/11/06/getting-blastxml-using- > remoteblast/ > On Jan 18, 2006, at 11:04 PM, Chris Fields wrote: > > > Is there any known way to save XML-formatted BLAST queries from > > RemoteBlast? Changing the FORMAT_TYPE in the retrieval header to > > anything other than 'Text' gives a blank output file. > > > > Christopher Fields > > Postdoctoral Researcher > > Lab of Dr. 
Robert Switzer > > Dept of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 From osborne1 at optonline.net Thu Jan 19 16:06:29 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Thu Jan 19 16:08:05 2006 Subject: [Bioperl-l] problem with Primer3: too many files open In-Reply-To: <2ae1a0fe0601191028i718efe15yc76185b5a64cda99@mail.gmail.com> Message-ID: Andy, I believe this is fixed in 1.5.1. The newer version should look like this around line 370 (bioperl-run/Bio/Tools/Run/Primer3.pm): my ($temphandle, $tempfile)=$self->io->tempfile; print $temphandle join "\n", @{$self->{'primer3_input'}}, "=\n"; $temphandle->close; open (RESULTS, "$executable < $tempfile|") || $self->throw("Can't open RESULTS"); Do you see the line with close in your file? If not either add this line or upgrade to 1.5.1. Brian O. On 1/19/06 1:28 PM, "Andy Nunberg" wrote: > Hi, I am using bioperl-1.4 running Primer3 to select a bunch of primers. > While running the script, I get an exception at the same point with > the following error: > > ------------- EXCEPTION ------------- > MSG: Can't open RESULTS:Too many open files > STACK Bio::Tools::Run::Primer3::run > /compbio/pkg/bio-perl/bioperl-run/Bio/Tools/Run/Primer3.pm:361 > STACK (eval) find_primers_first_pass.pl:183 > STACK main::_primer3 find_primers_first_pass.pl:182 > STACK main::get_primer find_primers_first_pass.pl:141 > STACK toplevel find_primers_first_pass.pl:86 > > -------------------------------------- > > Now if I take this sequence out of the list and run the script, it > runs just fine. 
> > here is the subroutine calling primer3: > sub _primer3{ > my($seq,$qual_region)=@_; > my > $primer3=Bio::Tools::Run::Primer3->new(-seq=>$seq,-verbose=>0,-flush=>1); > my @qual = @{$seq->qual}; > #set the start of the search window for primer3 > my $primer3_start=1; > if($seq->length > ($window+100)){ > $primer3_start=$qual_region->end-($window+100); > } > #set up primer3 > $primer3->add_targets('INCLUDED_REGION'=>"$primer3_start,$window"); > $primer3->add_targets('PRIMER_FIRST_BASE_INDEX'=>1, > 'PRIMER_TASK'=>'pick_left_only'); > $primer3->add_targets('PRIMER_SEQUENCE_QUALITY'=>"@qual"); > $primer3->add_targets('PRIMER_MIN_QUALITY'=>$minqual, > 'PRIMER_NUM_RETURN'=>1, > 'PRIMER_MAX_POLY_X'=>3); > $primer3->add_targets('PRIMER_GC_CLAMP'=>1) unless($no_gc_clamp); > > #run primer3 > my $prim3_results; > eval { > $prim3_results=$primer3->run; > }; > die $seq->id." :$@" if ($@); > > #fetch result for the first primer > my $hash_ref=$prim3_results->primer_results(0); > return $hash_ref; > > } > > any suggestions? any thoughts on why I am getting the error to begin with? > thanks > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Thu Jan 19 18:11:22 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu Jan 19 18:14:33 2006 Subject: [Bioperl-l] search2gff In-Reply-To: References: Message-ID: I added a couple of capabilities to the scripts/utilities/search2gff script written by Jason. In a nutshell, there are now options for controlling the score, location, and method of the HSP-representing feature, as well as options for printing of parent, which parent, and whether to skip all except the first HSP for each hit. As for possible applications, for example using these options you can blast SNP assay primers and use the options to create SNP features for a single basepair at the end of the primer, ready to be piped to a GBrowse GFF3 loader. 
I tried to preserve the original functionality in its entirety, i.e., if you don't use any of the new options the script should work as before. If not please let me know. POD is attached. -hilmar -- ---------------------------------------------------------- : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : ---------------------------------------------------------- SYNOPSIS Usage: search2gff [-o outputfile] [-f reportformat] [-i inputfilename] OR file1 file2 .. DESCRIPTION This script will turn a protein Search report (BLASTP, FASTP, SSEARCH, AXT, WABA) into a GFF File. The options are: -i infilename - (optional) inputfilename, will read either ARGV files or from STDIN -o filename - the output filename [default STDOUT] -f format - search result format (blast, fasta,waba,axt) (ssearch is fasta format). default is blast. -t/--type seqtype - if you want to see query or hit information in the GFF report -s/--source - specify the source (will be algorithm name otherwise like BLASTN) --method - the method tag (primary_tag) of the features (default is similarity) --scorefunc - a string or a file that when parsed evaluates to a closure which will be passed a feature object and that returns the score to be printed --locfunc - a string or a file that when parsed evaluates to a closure which will be passed two features, query and hit, and returns the location (Bio::LocationI compliant) for the GFF3 feature created for each HSP; the closure may use the clone_loc() and create_loc() functions for convenience, see their PODs --onehsp - only print the first HSP feature for each hit -p/--parent - the parent to which HSP features should refer if not the name of the hit or query (depending on --type) --target/--notarget - whether to always add the Target tag or not -h - this help menu --version - GFF version to use (put a 3 here to use gff 3) --component - generate GFF component fields (chromosome) -m/--match - generate a 'match' line which is a container of all the 
similarity HSPs --addid - add ID tag in the absence of --match -c/--cutoff - specify an evalue cutoff Additionally specify the filenames you want to process on the command-line. If no files are specified then STDIN input is assumed. You specify this by doing: search2gff < file1 file2 file3 AUTHOR Jason Stajich, jason-at-bioperl-dot-org Contributors Hilmar Lapp, hlapp-at-gmx-dot-net clone_loc Title : clone_loc Usage : my $l = clone_loc($feature->location); Function: Helper function to simplify the task of cloning locations for --locfunc closures. Presently simply implemented using Storable::dclone(). Example : Returns : A L object of the same type and with the same properties as the argument, but physically different. All structured properties will be cloned as well. Args : A L compliant object create_loc Title : create_loc Usage : my $l = create_loc("10..12"); Function: Helper function to simplify the task of creating locations for --locfunc closures. Creates a location from a feature- table formatted string. Example : Returns : A L object representing the location given as formatted string. Args : A GenBank feature-table formatted string.
From christoph.gille at charite.de Thu Jan 19 18:36:53 2006 From: christoph.gille at charite.de (Dr. Christoph Gille) Date: Thu Jan 19 22:12:06 2006 Subject: [Bioperl-l] bioperl Message-ID: <64335.84.190.29.176.1137713813.squirrel@webmail.charite.de> Hi Torsten, perhaps Sopma is not the best choice as a test case for bringing perl and java together. It is not a convincing example because people would ask why not contacting the server directly from java and why taking the hazzard with perl installation. I want to demonstrate that BioPerl programs can well work together with STRAP/Biojava with the wrapper I am just developing but I need a suitable example program.
What I consider is a sophisticated non-interactive Bioperl program that performs some kind of useful computation on a protein sequence, or an alignment or a protein 3D structure. Do you know of something appropriate ? It does not matter if the program is complex or contains C/C++ as long as it can be automatically installed without user interaction. Many thanks Christoph From hlapp at gmx.net Thu Jan 19 17:53:40 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu Jan 19 23:20:02 2006 Subject: [Bioperl-l] OntologyTerm::as_text Message-ID: I changed the as_text() method of Bio::Annotation::OntologyTerm to not use the identifier of the term anymore but instead append the is_obsolete property. The reason is that a term's identifier is optional (though strongly recommended), and two terms (and therefore annotations) with the same name and ontology but one with identifier and the other without are considered equal annotations nonetheless. Please let me know if this creates a problem for anybody. I also had to fix a test in t/Annotation.t that assumed the identifier to be included in as_text. -hilmar -- ---------------------------------------------------------- : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : ---------------------------------------------------------- From angshu96 at gmail.com Fri Jan 20 13:57:05 2006 From: angshu96 at gmail.com (Angshu Kar) Date: Fri Jan 20 15:34:33 2006 Subject: [Bioperl-l] 1 small help with WU-BLAST -postsw Message-ID: Hi, I'm using WU-BLASTP in yeast data (all vs all). Then I'm using : $hit_object->frac_aligned_query() > 0.5 and $hit_object->frac_aligned_hit() > 0.5 as filter conditions. In that I'm getting asymmetric results! I mean I've sequences A,B in my o/p and not B,A. Has it got something to do with the asymmetry of BLAST (but I thought -postsw takes care of that)? Please help. Thanks, Angshu -- Ignore the impossible but honor it ... The only enviable second position is success, since failure always comes first... 
From jason.stajich at duke.edu Fri Jan 20 16:02:22 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri Jan 20 17:16:55 2006 Subject: [Bioperl-l] 1 small help with WU-BLAST -postsw In-Reply-To: References: Message-ID: Well the hit and query probably are not the same length which is the denominator in that fraction .... On Jan 20, 2006, at 1:57 PM, Angshu Kar wrote: > Hi, > > I'm using WU-BLASTP in yeast data (all vs all). > > Then I'm using : > > $hit_object->frac_aligned_query() > 0.5 > and > $hit_object->frac_aligned_hit() > 0.5 > > as filter conditions. > > In that I'm getting asymmetric results! I mean I've sequences A,B in > my o/p and not B,A. Has it got something to do with the asymmetry of > BLAST (but I thought -postsw takes care of that)? > > Please help. > > Thanks, > Angshu > > > > > > > > -- > Ignore the impossible but honor it ... > The only enviable second position is success, since failure always > comes first... > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From angshu96 at gmail.com Fri Jan 20 16:34:07 2006 From: angshu96 at gmail.com (Angshu Kar) Date: Fri Jan 20 20:18:04 2006 Subject: [Bioperl-l] 1 small help with WU-BLAST -postsw In-Reply-To: References: Message-ID: But I'm doing an AND.So if i do use alignment/hit > 0.5 and alignment/query > 0.5 as a filter will the length of the denominator matter? On 1/20/06, Jason Stajich wrote: > Well the hit and query probably are not the same length which is the > denominator in that fraction .... > > On Jan 20, 2006, at 1:57 PM, Angshu Kar wrote: > > > Hi, > > > > I'm using WU-BLASTP in yeast data (all vs all). > > > > Then I'm using : > > > > $hit_object->frac_aligned_query() > 0.5 > > and > > $hit_object->frac_aligned_hit() > 0.5 > > > > as filter conditions. > > > > In that I'm getting asymmetric results! 
I mean I've sequences A,B in
> > my o/p and not B,A. Has it got something to do with the asymmetry of
> > BLAST (but I thought -postsw takes care of that)?
> >
> > Please help.
> >
> > Thanks,
> > Angshu
> >
> > --
> > Ignore the impossible but honor it ...
> > The only enviable second position is success, since failure always
> > comes first...
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>

--
Ignore the impossible but honor it ...
The only enviable second position is success, since failure always
comes first...

From jason.stajich at duke.edu Sat Jan 21 10:38:23 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sat Jan 21 10:36:11 2006
Subject: [Bioperl-l] 1 small help with WU-BLAST -postsw
In-Reply-To:
References:
Message-ID: <1F723DFB-C817-411C-9E19-7183C8E3DF91@duke.edu>

If you are really using Hit->frac_aligned:

I wouldn't rely on the Hit object's frac_aligned. Deal with the HSPs and calculate what you want. If there are multiple HSPs after a postsw, some of those could be overlapping, alternative sub-optimal alignments, and I don't know what the Hit object algorithm does when it tries to merge such cases.

My best advice is:
- find concrete examples of A,B and B,A pairs that don't meet your filter
- print out the HSP information, LOOK at the HSPs, and calculate the expected numbers by hand to figure out what is going on
- there are going to be comparisons where it is ambiguous. What would you do BY HAND for these cases? Ignore them because they are the 5th best hit? Take the longest HSP? Try to calculate overall coverage?
- calculate the fraction aligned per HSP, then decide what you want to do when there are multiple HSPs: take the longest, or attempt to figure out some overall coverage. It all depends on your question, if you are trying to find a single number.
- $hit_HSP_frac_aligned = $hsp->hit->length / $hit->length

Good luck.
-jason

On Jan 20, 2006, at 4:34 PM, Angshu Kar wrote:

> But I'm doing an AND.So if i do use alignment/hit > 0.5 and
> alignment/query > 0.5 as a filter will the length of the denominator
> matter?
>
> On 1/20/06, Jason Stajich wrote:
>> Well the hit and query probably are not the same length which is the
>> denominator in that fraction ....
>>
>> On Jan 20, 2006, at 1:57 PM, Angshu Kar wrote:
>>
>>> Hi,
>>>
>>> I'm using WU-BLASTP in yeast data (all vs all).
>>>
>>> Then I'm using :
>>>
>>> $hit_object->frac_aligned_query() > 0.5
>>> and
>>> $hit_object->frac_aligned_hit() > 0.5
>>>
>>> as filter conditions.
>>>
>>> In that I'm getting asymmetric results! I mean I've sequences A,B in
>>> my o/p and not B,A. Has it got something to do with the asymmetry of
>>> BLAST (but I thought -postsw takes care of that)?
>>>
>>> Please help.
>>>
>>> Thanks,
>>> Angshu
>>>
>>> --
>>> Ignore the impossible but honor it ...
>>> The only enviable second position is success, since failure always
>>> comes first...
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l@portal.open-bio.org
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>
> --
> Ignore the impossible but honor it ...
> The only enviable second position is success, since failure always
> comes first...
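
The per-HSP calculation suggested in the message above can be sketched in plain Perl, without any BioPerl dependency. This is only an illustration under stated assumptions: each HSP is a hypothetical hash with query_start/query_end and hit_start/hit_end keys (these names are made up for the sketch and are not BioPerl methods), and "overall coverage" is taken to mean the number of positions covered after overlapping HSP ranges are merged, divided by the full sequence length. It does not claim to reproduce what the Hit object's frac_aligned methods do.

```perl
use strict;
use warnings;

# Merge overlapping [start, end] ranges (1-based, inclusive) and return
# the total number of positions they cover.
sub covered_length {
    my @ranges = sort { $a->[0] <=> $b->[0] } @_;
    my ($total, $cur_start, $cur_end) = (0);
    for my $r (@ranges) {
        my ($s, $e) = @$r;
        if (!defined $cur_start) {
            ($cur_start, $cur_end) = ($s, $e);
        }
        elsif ($s <= $cur_end + 1) {
            # overlaps or touches the current block: extend it
            $cur_end = $e if $e > $cur_end;
        }
        else {
            # disjoint: close the current block and start a new one
            $total += $cur_end - $cur_start + 1;
            ($cur_start, $cur_end) = ($s, $e);
        }
    }
    $total += $cur_end - $cur_start + 1 if defined $cur_start;
    return $total;
}

# Fraction of a sequence covered by a set of HSPs.
# $which is 'query' or 'hit'; $length is the full sequence length.
sub frac_covered {
    my ($hsps, $which, $length) = @_;
    my @ranges = map { [ $_->{"${which}_start"}, $_->{"${which}_end"} ] } @$hsps;
    return covered_length(@ranges) / $length;
}

# Hypothetical example: two overlapping HSPs between a 200-residue query
# and a 400-residue hit.
my @hsps = (
    { query_start => 1,  query_end => 100, hit_start => 11, hit_end => 110 },
    { query_start => 81, query_end => 150, hit_start => 91, hit_end => 160 },
);

my $frac_query = frac_covered(\@hsps, 'query', 200);
my $frac_hit   = frac_covered(\@hsps, 'hit',   400);

printf "query coverage: %.3f\n", $frac_query;   # 0.750
printf "hit coverage:   %.3f\n", $frac_hit;     # 0.375
```

In this made-up case the same 150 aligned positions give 0.750 on the query and 0.375 on the hit, so a "both > 0.5" filter rejects the pair. With identical HSPs in both directions the two fractions simply swap roles, so an A,B versus B,A difference under an AND filter would have to come from the two searches reporting different HSP sets, which is why looking at the HSPs by hand is the suggested first step.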
--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/

From anst at kvl.dk Sat Jan 21 13:40:42 2006
From: anst at kvl.dk (Anders Stegmann)
Date: Sat Jan 21 14:05:56 2006
Subject: [Bioperl-l] wrong nomatch position from protein with singly deleted Aa
Message-ID: <43D28E3A0200009B0000069C@gwia.kvl.dk>

Hi BioPerl!

I have an original protein sequence which I blastp (standalone) against the same sequence with amino acid (Aa) number 61 deleted manually.

The result is that the subject nomatch is Aa E at position 60, which is definitely not a mismatch!
This also happens if I delete the two Aa at positions 61 and 62 in the subject sequence.
Strangely enough, it does not happen if I delete a whole line (60 Aa) in the subject sequence.

The result for the query nomatch is Aa V at position 61, which is correct (the subroutine code is similar to the subject code shown below).

The code I use is the following:

sub subject_seq_alignment_nomatch_residues {

    my ($hsp_obj) = @_;    # reference to an HSP object
    my %subject_nomatch_hash = ();
    my @new_subject_string = ();

    # split the aligned hit string into single characters
    my @subject_string = split //, $$hsp_obj->hit_string;

    # keep only residues, skipping gap characters
    foreach (@subject_string) {
        if ($_ ne '-') { push @new_subject_string, $_ }
    }

    my $start_subject_number = $$hsp_obj->start('hit');
    $start_subject_number = $start_subject_number - 1;

    # take each nomatch position, then pull the corresponding
    # amino acid out of the subject sequence
    foreach ($$hsp_obj->seq_inds('hit', 'nomatch')) {
        $subject_nomatch_hash{$_} = $new_subject_string[$_ - 1 - $start_subject_number];
    }

    return %subject_nomatch_hash;
}

It has nothing to do with the foreach (@subject_string) { code or the $start_subject_number (because it is 0 in this example). I checked!

How can this be?

Regards
Anders.
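
As a point of comparison for the question above, here is a small sketch in plain Perl of how alignment columns map to ungapped sequence positions when one sequence has a deletion. It is a hypothetical helper written for illustration (nomatch_positions and its arguments are not BioPerl API) and makes no claim about how seq_inds itself counts positions next to a gap. It only shows that, when positions are counted per residue, a deleted residue in the hit leaves an unmatched residue on the query side and no unmatched residue on the hit side.

```perl
use strict;
use warnings;

# Walk two aligned strings column by column. For every column that is not
# an identical match, record the residue under its position in the
# ungapped sequence. A gap character has no position of its own, so a gap
# in one sequence only produces an entry for the other sequence.
sub nomatch_positions {
    my ($query_aln, $hit_aln, $query_start, $hit_start) = @_;
    my ($qpos, $hpos) = ($query_start - 1, $hit_start - 1);
    my (%query_nomatch, %hit_nomatch);
    for my $i (0 .. length($query_aln) - 1) {
        my $q = substr($query_aln, $i, 1);
        my $h = substr($hit_aln,   $i, 1);
        $qpos++ if $q ne '-';    # advance only on real residues
        $hpos++ if $h ne '-';
        next if $q eq $h;        # identical column: nothing to record
        $query_nomatch{$qpos} = $q if $q ne '-';
        $hit_nomatch{$hpos}   = $h if $h ne '-';
    }
    return (\%query_nomatch, \%hit_nomatch);
}

# Made-up alignment: the hit has a substitution in column 3 and lacks the
# residue aligned to query position 5.
my ($q_nomatch, $h_nomatch) =
    nomatch_positions('ACDEFGHIK', 'ACNE-GHIK', 1, 1);

my $q_report = join ',', map { "$_=$q_nomatch->{$_}" }
               sort { $a <=> $b } keys %$q_nomatch;
my $h_report = join ',', map { "$_=$h_nomatch->{$_}" }
               sort { $a <=> $b } keys %$h_nomatch;

print "query nomatch: $q_report\n";   # 3=D,5=F
print "hit nomatch:   $h_report\n";   # 3=N
```

Running the same walk over the real hit_string and query_string of the HSP, and comparing the result with what seq_inds returns, would be one way to see whether the position 60 entry comes from how the gap is reported or from the indexing in the subroutine.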
From jason.stajich at duke.edu Sat Jan 21 14:29:21 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Sat Jan 21 14:25:31 2006 Subject: [Bioperl-l] wrong nomatch position from protein with singly deleted Aa In-Reply-To: <43D28E3A0200009B0000069C@gwia.kvl.dk> References: <43D28E3A0200009B0000069C@gwia.kvl.dk> Message-ID: I know you are trying to give an example but this isn't really enough for someone to help as you are referring to a sequence alignment we can't see. I can't tell if this a problem with translated blast coordinates, the seq_inds code alone, or what. Why don't you gather a sample report, the code you are using, and your expected result together in something that someone can run and submit it as a bug to bugzilla. Then hopefully someone from the community will download and reproduce your problem for you and can tell whether the problem is in the module or elsewhere. -jason On Jan 21, 2006, at 1:40 PM, Anders Stegmann wrote: > Hi BioPerl! > > I have an original protein seq which I blastp (standalone) against > the same seq with Aa nr 61 deleted manually. > > The result is that the subject nomatch is Aa. E on position 60, > which is definitely not a mismatch!!? > This also happens if I delete two Aa at positions 61 and 62 in the > subject seq. > This does strangely enough not happen if I delete a whole line (60 > Aa) in the subject seq. > > The result for the query nomatch is Aa. V at position 61, which is > korrekt (the subrutine code is similar to the subject code shown > below). 
> > > the code I use is following: > > sub subject_seq_alignment_nomatch_residues { > > my ($hsp_obj) = @_; > my %subject_nomatch_hash = (); > my @new_subject_string = (); > > my @subject_string = split , $$hsp_obj->hit_string; > > foreach (@subject_string) { #positioner i visse tilf?lde > > if ($_ ne '-') {push @new_subject_string, $_}; > > } > > my $start_subject_number = $$hsp_obj->start('hit'); > > $start_subject_number = $start_subject_number - 1; > > foreach ($$hsp_obj->seq_inds('hit', 'nomatch')) { > > $subject_nomatch_hash{$_} = $new_subject_string[$_ -1 - > $start_subject_number];#positionen, tr?kker derefter den > tilsvarende #aminosyre ud af subjekt sekvensen > } > > return %subject_nomatch_hash; > > } > > It has nothing to do with the foreach (@subject_string) { code or > the $start_subject_number (cause it is 0 in this example). I checked! > > How can this be? > > Regards Anders. > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12/ From heikki at sanbi.ac.za Mon Jan 23 02:54:02 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Mon Jan 23 02:50:10 2006 Subject: [Bioperl-l] BioPerl Physical Map modules, tests needed Message-ID: <200601230954.02776.heikki@sanbi.ac.za> Dear Gaurav, I am going slowly through BioPerl modules that do not have any tests written to them. Unless there are tests in place, there is no way anyone can keep track of changes needed for the modules and useful modules will become obsolete as bioperl moves ahead. You have contributed the following modules: Bio::MapIO::fpc Bio::Map::Clone Bio::Map::Contig Bio::Map::FPCMarker Bio::Map::OrderedPositionWithDistance Bio::Map::Physical Would it be possible for you to write tests and add them, together with a small FPC sample file, to the repository. 
I gave it a shot (t/PhysicalMap.t), but had to stop short when I realised that Bio::MapIO::fpc reads in an FPC file and creates a Bio::Map::Physical by holding everything in an internal hash. None of the objects included in a physical map are instantiated before they are required. It is therefore very difficult for someone else to write meaningful tests without a good example FPC file. Yours, -Heikki -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From phil511 at 21cn.com Mon Jan 23 11:41:23 2006 From: phil511 at 21cn.com (Phil-) Date: Mon Jan 23 11:47:17 2006 Subject: [Bioperl-l] 2 questions about Bio::Tools::WebBlat Message-ID: <200601231639.k0NGdKc29777@taurus.zsu.edu.cn> Greetings, everyone! I'm an undergraduate from Sun Yat-Sen University of China. I am currently using bioperl to map some of my cDNA sequences onto human chromosomes. I used the Bio::Tools::WebBlat but something just goes wrong. Here comes my code: BEGIN{$ENV{HTTP_PROXY}='http://202.116.64.1:8001/';} use Bio::Tools::WebBlat; use Bio::Seq; my $webblat = Bio::Tools::WebBlat->new(); my $seq = Bio::Seq->new(-id => 'foo' , -seq => 'aataataat' ); my $searchio = $webblat->create_searchio(sequence=>$seq); while(my $result = $searchio->next_result){ sleep 1; } As you can see that I have to use a proxy. But i can't get the results by these codes. I further check the result of the LWP::UserAgent->request called inside WebBlat.pm and I think I found a mistake at the last line of code: $self->throw($ua->status_line); I think status_line should be a method of HTTP::Response but not LWP::UserAgent. 
I changed the code, and what I got was a '500 Internal Server Error' with the following lines:

----------------------------
The page you requested resulted in a server problem on our systems. We hate this type of error immensely and we're sure that you do as well. While we have logged it and rapidly pursue any problems on our systems, sometimes extra information from the user can pinpoint the cause of the problem for us and help us prevent it in the future. If you have information that you would like to provide about what led to the error, please email us at genome-www@soe.ucsc.edu.

If you are unable to access commonly-used features on our website, it is possible that you may need to reset your Genome Browser with the following URL: http://genome.ucsc.edu/cgi-bin/cartReset. This will replace your stored settings with the default configuration and will return your Browser to the state it was in when you first accessed it.

We apologize for the inconvenience.
----------------------------

It seems that my script makes contact with the UCSC server but does not get through to the result. What should I do?

Thank you all!

Phil-
phil511@21cn.com
2006-01-24

From anst at kvl.dk Sun Jan 22 08:06:29 2006
From: anst at kvl.dk (Anders Stegmann)
Date: Mon Jan 23 15:53:44 2006
Subject: [Bioperl-l] wrong nomatch position from protein with singly deleted Aa
In-Reply-To:
References: <43D28E3A0200009B0000069C@gwia.kvl.dk>
Message-ID: <43D391650200009B000006B7@gwia.kvl.dk>

Skipped content of type multipart/alternative
-------------- next part --------------
A non-text attachment was scrubbed...
Name: blastp.pl
Type: application/octet-stream
Size: 22241 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20060122/5a2eb143/blastp-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: YAL001C Type: application/octet-stream Size: 1529 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20060122/5a2eb143/YAL001C-0001.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: YAL001CDB Type: application/octet-stream Size: 1528 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20060122/5a2eb143/YAL001CDB-0001.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: YAL001Chit1hsp1 Type: application/octet-stream Size: 296 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20060122/5a2eb143/YAL001Chit1hsp1-0001.obj -------------- next part -------------- An HTML attachment was scrubbed... URL: http://portal.open-bio.org/pipermail/bioperl-l/attachments/20060122/5a2eb143/YAL001Chit1hsp1-0001.html From hubert.prielinger at gmx.at Mon Jan 23 16:18:48 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Mon Jan 23 17:18:27 2006 Subject: [Bioperl-l] formatdb with the nr database Message-ID: <43D54838.5050301@gmx.at> Hi, I have downloaded the nr database for doing a blast search locally, now I'm supposed to index the database with formatdb, but it doesn't work... The online help says that you need a fasta file that is indexed to use for searching the database, but when I uncompressed the zip file, there were only .phr, .pnd, .pin, .pni, .ppd file.... Is there anybody who can tell me, how to use formatdb with the nr database... Help is very appreciated Thank you very much in advance Hubert From smarkel at scitegic.com Mon Jan 23 17:53:43 2006 From: smarkel at scitegic.com (Scott Markel) Date: Mon Jan 23 18:02:09 2006 Subject: [Bioperl-l] formatdb with the nr database In-Reply-To: <43D54838.5050301@gmx.at> References: <43D54838.5050301@gmx.at> Message-ID: <43D55E77.9040501@scitegic.com> Hubert, The .phr et al files are the result of already having run formatdb. 
By running NCBI's fastacmd (comes with blastall and formatdb) with the -D option, you can get back to a FASTA file.

Scott

Hubert Prielinger wrote:
> Hi,
> I have downloaded the nr database for doing a blast search locally, now
> I'm supposed to index the database with formatdb, but it doesn't work...
> The online help says that you need a fasta file that is indexed to use
> for searching the database, but when I uncompressed the zip file, there
> were only .phr, .pnd, .pin, .pni, .ppd file....
> Is there anybody who can tell me, how to use formatdb with the nr
> database...
>
> Help is very appreciated
> Thank you very much in advance
>
> Hubert
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel@scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
San Diego, CA 92123                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com

From hubert.prielinger at gmx.at Mon Jan 23 18:08:51 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon Jan 23 19:01:50 2006
Subject: [Bioperl-l] formatdb with the nr database
In-Reply-To: <43D5693C.1020805@anu.edu.au>
References: <43D54838.5050301@gmx.at> <43D5693C.1020805@anu.edu.au>
Message-ID: <43D56203.2060806@gmx.at>

Hi,
thank you very much for the help. Another question comes up: do I have to specify the path to the database files as well? I guess so, but how do I do that? The same way I set the path to the BLAST bin files?
Also, does anybody know how to set the composition-based statistics parameter?
there is my code: #!/usr/bin/perl -w use Bio::Tools::Run::StandAloneBlast; use Bio::Seq; use Bio::SeqIO; use strict; BEGIN { $ENV{PATH}=":/home/Hubert/blast/blast-2.2.13/bin/:"; } # parameters my $expect_value = 20000; #my $filter_query_sequence = 'F'; my $one_line_description = 1000; my $alignments = 1000; # my $strands = 1; my $count = 1; my @params = ('program' => 'blastp', 'database' => 'nr'); #my $progress_interval = 100; my $seqio_obj = Bio::SeqIO->new( -file => "Perm.txt", -format => "raw", ); # create factory object and set parameters my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); $factory->e($expect_value); #$factory->F($filter_query_sequence); $factory->v($one_line_description); $factory->b($alignments); #$factory->S($strands); # get query while ( my $query = $seqio_obj->next_seq ) { my $blast_report = $factory->blastall($query); my $filename = "comp_$count.txt"; my $factory->outfile($filename); print $query->seq; print "\n"; $count++; } thank you very much in advance Hubert Nagesh Chakka wrote: > Hi Hubert, > I downloaded the nr.00.tar.gz file a week ago. I was able to get the > following files > .phr, .pin, .pnd, .pni, .ppd, .ppi, .psd, .psi, .psq, .pal files. I > have no trouble in running standalone blast. You are not required to > run formardb on the downloaded blast databases and that may be the > reason why the sequences are not included as it will also reduce the > size of the file. > Did you try to run a blast search, if so is it giving you any errors? > Nagesh > > > > Hubert Prielinger wrote: > >> Hi, >> I have downloaded the nr database for doing a blast search locally, >> now I'm supposed to index the database with formatdb, but it doesn't >> work... >> The online help says that you need a fasta file that is indexed to >> use for searching the database, but when I uncompressed the zip file, >> there were only .phr, .pnd, .pin, .pni, .ppd file.... >> Is there anybody who can tell me, how to use formatdb with the nr >> database... 
>> >> Help is very appreciated >> Thank you very much in advance >> >> Hubert >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > From hubert.prielinger at gmx.at Mon Jan 23 19:15:45 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Mon Jan 23 20:08:44 2006 Subject: [Bioperl-l] formatdb with the nr database In-Reply-To: <1138062266.2534.2.camel@vogon> References: <43D54838.5050301@gmx.at> <43D5693C.1020805@anu.edu.au> <43D56203.2060806@gmx.at> <1138062266.2534.2.camel@vogon> Message-ID: <43D571B1.3020008@gmx.at> Hi Nagesh, thank you very much, I put my database into the data folder, run the program and got the following error message: submit Sequence...just do it.... sh: /home/Hubert/blast/blast-2.2.13/bin/blastall: cannot execute binary file ------------- EXCEPTION ------------- MSG: blastall call crashed: 32256 /home/Hubert/blast/blast-2.2.13/bin/blastall -p blastp -d "/nr" -i /tmp/QTZfYMbgLM -e 20000 -o /tmp/v3YwWvONZ1 -v 1000 -b 1000 STACK Bio::Tools::Run::StandAloneBlast::_runblast /usr/lib/perl5/site_perl/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759 STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast /usr/lib/perl5/site_perl/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706 STACK Bio::Tools::Run::StandAloneBlast::blastall /usr/lib/perl5/site_perl/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:557 STACK toplevel /home/Hubert/installed/eclipse/workspace/Database_Search/standalone_blast.pl:46 -------------------------------------- Why it did not find my binary file, but it is there regards Nagesh Chakka wrote: >Hi, >The following is from the StandAloneBlast.pm documentation >" If the databases which will be searched by BLAST are located in the >data subdirectory of the blast program directory (the default >installation location), StandAloneBlast will find them; however, if the >database files are located in any other location, 
environmental variable >$BLASTDATADIR will need to be set to point to that directory." >Please note that I have not used this module before. >Nagesh > > > >On Mon, 2006-01-23 at 17:08 -0600, Hubert Prielinger wrote: > > >>Hi, >>thank you very much for the help, another questions that raises up, do I >>have to write the path to the database files as well, I guess so, but >>how I do that, the same way I write the path to teh blast bin files? >>Does anybody know how to set the Composition based statistics parameter? >>there is my code: >> >>#!/usr/bin/perl -w >> >>use Bio::Tools::Run::StandAloneBlast; >>use Bio::Seq; >>use Bio::SeqIO; >>use strict; >> >>BEGIN >>{ >> $ENV{PATH}=":/home/Hubert/blast/blast-2.2.13/bin/:"; >>} >> >> >># parameters >>my $expect_value = 20000; >>#my $filter_query_sequence = 'F'; >>my $one_line_description = 1000; >>my $alignments = 1000; >># my $strands = 1; >>my $count = 1; >> >>my @params = ('program' => 'blastp', 'database' => 'nr'); >>#my $progress_interval = 100; >> >> >>my $seqio_obj = Bio::SeqIO->new( >> -file => "Perm.txt", >> -format => "raw", >>); >> >># create factory object and set parameters >>my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); >> >>$factory->e($expect_value); >>#$factory->F($filter_query_sequence); >>$factory->v($one_line_description); >>$factory->b($alignments); >>#$factory->S($strands); >> >> >># get query >> >>while ( my $query = $seqio_obj->next_seq ) { >> my $blast_report = $factory->blastall($query); >> my $filename = "comp_$count.txt"; >> my $factory->outfile($filename); >> print $query->seq; >> print "\n"; >> >> $count++; >>} >> >>thank you very much in advance >>Hubert >> >> >> >>Nagesh Chakka wrote: >> >> >> >>>Hi Hubert, >>>I downloaded the nr.00.tar.gz file a week ago. I was able to get the >>>following files >>>.phr, .pin, .pnd, .pni, .ppd, .ppi, .psd, .psi, .psq, .pal files. I >>>have no trouble in running standalone blast. 
You are not required to >>>run formardb on the downloaded blast databases and that may be the >>>reason why the sequences are not included as it will also reduce the >>>size of the file. >>>Did you try to run a blast search, if so is it giving you any errors? >>>Nagesh >>> >>> >>> >>>Hubert Prielinger wrote: >>> >>> >>> >>>>Hi, >>>>I have downloaded the nr database for doing a blast search locally, >>>>now I'm supposed to index the database with formatdb, but it doesn't >>>>work... >>>>The online help says that you need a fasta file that is indexed to >>>>use for searching the database, but when I uncompressed the zip file, >>>>there were only .phr, .pnd, .pin, .pni, .ppd file.... >>>>Is there anybody who can tell me, how to use formatdb with the nr >>>>database... >>>> >>>>Help is very appreciated >>>>Thank you very much in advance >>>> >>>>Hubert >>>> >>>>_______________________________________________ >>>>Bioperl-l mailing list >>>>Bioperl-l@portal.open-bio.org >>>>http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >>> >>> >>> > > > > From smarkel at scitegic.com Mon Jan 23 20:47:38 2006 From: smarkel at scitegic.com (Scott Markel) Date: Mon Jan 23 20:45:56 2006 Subject: [Bioperl-l] formatdb with the nr database In-Reply-To: <43D571B1.3020008@gmx.at> References: <43D54838.5050301@gmx.at> <43D5693C.1020805@anu.edu.au> <43D56203.2060806@gmx.at> <1138062266.2534.2.camel@vogon> <43D571B1.3020008@gmx.at> Message-ID: <43D5873A.8050501@scitegic.com> Hubert, Does your blastall file have execute permission turned on? Scott Hubert Prielinger wrote: > Hi Nagesh, > thank you very much, I put my database into the data folder, run the > program and got the following error message: > > submit Sequence...just do it.... 
> sh: /home/Hubert/blast/blast-2.2.13/bin/blastall: cannot execute binary > file > > ------------- EXCEPTION ------------- > MSG: blastall call crashed: 32256 > /home/Hubert/blast/blast-2.2.13/bin/blastall -p blastp -d "/nr" -i > /tmp/QTZfYMbgLM -e 20000 -o /tmp/v3YwWvONZ1 -v 1000 -b 1000 > > STACK Bio::Tools::Run::StandAloneBlast::_runblast > /usr/lib/perl5/site_perl/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759 > STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast > /usr/lib/perl5/site_perl/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706 > STACK Bio::Tools::Run::StandAloneBlast::blastall > /usr/lib/perl5/site_perl/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:557 > STACK toplevel > /home/Hubert/installed/eclipse/workspace/Database_Search/standalone_blast.pl:46 > > > -------------------------------------- > > Why it did not find my binary file, but it is there > > regards > > Nagesh Chakka wrote: > >> Hi, >> The following is from the StandAloneBlast.pm documentation >> " If the databases which will be searched by BLAST are located in the >> data subdirectory of the blast program directory (the default >> installation location), StandAloneBlast will find them; however, if the >> database files are located in any other location, environmental variable >> $BLASTDATADIR will need to be set to point to that directory." >> Please note that I have not used this module before. >> Nagesh >> >> >> >> On Mon, 2006-01-23 at 17:08 -0600, Hubert Prielinger wrote: >> >> >>> Hi, >>> thank you very much for the help, another questions that raises up, >>> do I have to write the path to the database files as well, I guess >>> so, but how I do that, the same way I write the path to teh blast bin >>> files? >>> Does anybody know how to set the Composition based statistics parameter? 
>>> there is my code: >>> >>> #!/usr/bin/perl -w >>> >>> use Bio::Tools::Run::StandAloneBlast; >>> use Bio::Seq; >>> use Bio::SeqIO; >>> use strict; >>> >>> BEGIN >>> { >>> $ENV{PATH}=":/home/Hubert/blast/blast-2.2.13/bin/:"; >>> } >>> >>> >>> # parameters >>> my $expect_value = 20000; >>> #my $filter_query_sequence = 'F'; >>> my $one_line_description = 1000; >>> my $alignments = 1000; >>> # my $strands = 1; >>> my $count = 1; >>> >>> my @params = ('program' => 'blastp', 'database' => 'nr'); >>> #my $progress_interval = 100; >>> >>> >>> my $seqio_obj = Bio::SeqIO->new( >>> -file => "Perm.txt", >>> -format => "raw", >>> ); >>> >>> # create factory object and set parameters >>> my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); >>> >>> $factory->e($expect_value); >>> #$factory->F($filter_query_sequence); >>> $factory->v($one_line_description); >>> $factory->b($alignments); >>> #$factory->S($strands); >>> >>> >>> # get query >>> >>> while ( my $query = $seqio_obj->next_seq ) { >>> my $blast_report = $factory->blastall($query); >>> my $filename = "comp_$count.txt"; >>> my $factory->outfile($filename); >>> print $query->seq; >>> print "\n"; >>> >>> $count++; >>> } >>> >>> thank you very much in advance >>> Hubert >>> >>> >>> >>> Nagesh Chakka wrote: >>> >>> >>> >>>> Hi Hubert, >>>> I downloaded the nr.00.tar.gz file a week ago. I was able to get the >>>> following files >>>> .phr, .pin, .pnd, .pni, .ppd, .ppi, .psd, .psi, .psq, .pal files. I >>>> have no trouble in running standalone blast. You are not required to >>>> run formardb on the downloaded blast databases and that may be the >>>> reason why the sequences are not included as it will also reduce the >>>> size of the file. >>>> Did you try to run a blast search, if so is it giving you any errors? 
>>>> Nagesh >>>> >>>> >>>> >>>> Hubert Prielinger wrote: >>>> >>>> >>>> >>>>> Hi, >>>>> I have downloaded the nr database for doing a blast search locally, >>>>> now I'm supposed to index the database with formatdb, but it >>>>> doesn't work... >>>>> The online help says that you need a fasta file that is indexed to >>>>> use for searching the database, but when I uncompressed the zip >>>>> file, there were only .phr, .pnd, .pin, .pni, .ppd file.... >>>>> Is there anybody who can tell me, how to use formatdb with the nr >>>>> database... >>>>> >>>>> Help is very appreciated >>>>> Thank you very much in advance >>>>> >>>>> Hubert >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l@portal.open-bio.org >>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> >>>> >>>> >>>> >> >> >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel@scitegic.com SciTegic Inc. mobile: +1 858 205 3653 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253 San Diego, CA 92123 fax: +1 858 279 8804 USA web: http://www.scitegic.com From hubert.prielinger at gmx.at Mon Jan 23 20:02:10 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Mon Jan 23 20:55:07 2006 Subject: [Bioperl-l] formatdb with the nr database In-Reply-To: <43D5873A.8050501@scitegic.com> References: <43D54838.5050301@gmx.at> <43D5693C.1020805@anu.edu.au> <43D56203.2060806@gmx.at> <1138062266.2534.2.camel@vogon> <43D571B1.3020008@gmx.at> <43D5873A.8050501@scitegic.com> Message-ID: <43D57C92.5040905@gmx.at> Hi, yes all permissions are turned on Hubert Scott Markel wrote: > Hubert, > > Does your blastall file have execute permission turned on? 
> > Scott > > Hubert Prielinger wrote: > >> Hi Nagesh, >> thank you very much, I put my database into the data folder, run the >> program and got the following error message: >> >> submit Sequence...just do it.... >> sh: /home/Hubert/blast/blast-2.2.13/bin/blastall: cannot execute >> binary file >> >> ------------- EXCEPTION ------------- >> MSG: blastall call crashed: 32256 >> /home/Hubert/blast/blast-2.2.13/bin/blastall -p blastp -d "/nr" >> -i /tmp/QTZfYMbgLM -e 20000 -o /tmp/v3YwWvONZ1 -v 1000 -b 1000 >> >> STACK Bio::Tools::Run::StandAloneBlast::_runblast >> /usr/lib/perl5/site_perl/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759 >> STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast >> /usr/lib/perl5/site_perl/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706 >> STACK Bio::Tools::Run::StandAloneBlast::blastall >> /usr/lib/perl5/site_perl/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:557 >> STACK toplevel >> /home/Hubert/installed/eclipse/workspace/Database_Search/standalone_blast.pl:46 >> >> >> -------------------------------------- >> >> Why it did not find my binary file, but it is there >> >> regards >> >> Nagesh Chakka wrote: >> >>> Hi, >>> The following is from the StandAloneBlast.pm documentation >>> " If the databases which will be searched by BLAST are located in the >>> data subdirectory of the blast program directory (the default >>> installation location), StandAloneBlast will find them; however, if the >>> database files are located in any other location, environmental >>> variable >>> $BLASTDATADIR will need to be set to point to that directory." >>> Please note that I have not used this module before. >>> Nagesh >>> >>> >>> >>> On Mon, 2006-01-23 at 17:08 -0600, Hubert Prielinger wrote: >>> >>> >>>> Hi, >>>> thank you very much for the help, another questions that raises up, >>>> do I have to write the path to the database files as well, I guess >>>> so, but how I do that, the same way I write the path to teh blast >>>> bin files? 
>>>> Does anybody know how to set the Composition based statistics >>>> parameter? >>>> there is my code: >>>> >>>> #!/usr/bin/perl -w >>>> >>>> use Bio::Tools::Run::StandAloneBlast; >>>> use Bio::Seq; >>>> use Bio::SeqIO; >>>> use strict; >>>> >>>> BEGIN >>>> { >>>> $ENV{PATH}=":/home/Hubert/blast/blast-2.2.13/bin/:"; >>>> } >>>> >>>> >>>> # parameters >>>> my $expect_value = 20000; >>>> #my $filter_query_sequence = 'F'; >>>> my $one_line_description = 1000; >>>> my $alignments = 1000; >>>> # my $strands = 1; >>>> my $count = 1; >>>> >>>> my @params = ('program' => 'blastp', 'database' => 'nr'); >>>> #my $progress_interval = 100; >>>> >>>> >>>> my $seqio_obj = Bio::SeqIO->new( >>>> -file => "Perm.txt", >>>> -format => "raw", >>>> ); >>>> >>>> # create factory object and set parameters >>>> my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); >>>> >>>> $factory->e($expect_value); >>>> #$factory->F($filter_query_sequence); >>>> $factory->v($one_line_description); >>>> $factory->b($alignments); >>>> #$factory->S($strands); >>>> >>>> >>>> # get query >>>> >>>> while ( my $query = $seqio_obj->next_seq ) { >>>> my $blast_report = $factory->blastall($query); >>>> my $filename = "comp_$count.txt"; >>>> my $factory->outfile($filename); >>>> print $query->seq; >>>> print "\n"; >>>> >>>> $count++; >>>> } >>>> >>>> thank you very much in advance >>>> Hubert >>>> >>>> >>>> >>>> Nagesh Chakka wrote: >>>> >>>> >>>> >>>>> Hi Hubert, >>>>> I downloaded the nr.00.tar.gz file a week ago. I was able to get >>>>> the following files >>>>> .phr, .pin, .pnd, .pni, .ppd, .ppi, .psd, .psi, .psq, .pal files. >>>>> I have no trouble in running standalone blast. You are not >>>>> required to run formardb on the downloaded blast databases and >>>>> that may be the reason why the sequences are not included as it >>>>> will also reduce the size of the file. >>>>> Did you try to run a blast search, if so is it giving you any errors? 
>>>>> Nagesh >>>>> >>>>> >>>>> >>>>> Hubert Prielinger wrote: >>>>> >>>>> >>>>> >>>>>> Hi, >>>>>> I have downloaded the nr database for doing a blast search >>>>>> locally, now I'm supposed to index the database with formatdb, >>>>>> but it doesn't work... >>>>>> The online help says that you need a fasta file that is indexed >>>>>> to use for searching the database, but when I uncompressed the >>>>>> zip file, there were only .phr, .pnd, .pin, .pni, .ppd file.... >>>>>> Is there anybody who can tell me, how to use formatdb with the nr >>>>>> database... >>>>>> >>>>>> Help is very appreciated >>>>>> Thank you very much in advance >>>>>> >>>>>> Hubert >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l@portal.open-bio.org >>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>> >>> >>> >>> >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l@portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> > From hubert.prielinger at gmx.at Mon Jan 23 20:41:35 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Mon Jan 23 21:34:47 2006 Subject: [Bioperl-l] formatdb with the nr database In-Reply-To: <43D58D06.5080501@anu.edu.au> References: <43D54838.5050301@gmx.at> <43D5693C.1020805@anu.edu.au> <43D56203.2060806@gmx.at> <1138062266.2534.2.camel@vogon> <43D571B1.3020008@gmx.at> <43D58D06.5080501@anu.edu.au> Message-ID: <43D585CF.5070902@gmx.at> hi, sorry, but what do you mean with is your blast database in /nr... my database is located in the path /home/Hubert/blast/blast-2.2.13/data Nagesh Chakka wrote: > Can you just run the blast from the command line. > Is your blast database in "/nr". > > Hubert Prielinger wrote: > >> Hi Nagesh, >> thank you very much, I put my database into the data folder, run the >> program and got the following error message: >> >> submit Sequence...just do it.... 
>> sh: /home/Hubert/blast/blast-2.2.13/bin/blastall: cannot execute >> binary file >> >> ------------- EXCEPTION ------------- >> MSG: blastall call crashed: 32256 >> /home/Hubert/blast/blast-2.2.13/bin/blastall -p blastp -d "/nr" >> -i /tmp/QTZfYMbgLM -e 20000 -o /tmp/v3YwWvONZ1 -v 1000 -b 1000 >> >> STACK Bio::Tools::Run::StandAloneBlast::_runblast >> /usr/lib/perl5/site_perl/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759 >> STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast >> /usr/lib/perl5/site_perl/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706 >> STACK Bio::Tools::Run::StandAloneBlast::blastall >> /usr/lib/perl5/site_perl/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:557 >> STACK toplevel >> /home/Hubert/installed/eclipse/workspace/Database_Search/standalone_blast.pl:46 >> >> >> -------------------------------------- >> >> Why it did not find my binary file, but it is there >> >> regards >> >> Nagesh Chakka wrote: >> >>> Hi, >>> The following is from the StandAloneBlast.pm documentation >>> " If the databases which will be searched by BLAST are located in the >>> data subdirectory of the blast program directory (the default >>> installation location), StandAloneBlast will find them; however, if the >>> database files are located in any other location, environmental >>> variable >>> $BLASTDATADIR will need to be set to point to that directory." >>> Please note that I have not used this module before. >>> Nagesh >>> >>> >>> >>> On Mon, 2006-01-23 at 17:08 -0600, Hubert Prielinger wrote: >>> >>> >>>> Hi, >>>> thank you very much for the help, another questions that raises up, >>>> do I have to write the path to the database files as well, I guess >>>> so, but how I do that, the same way I write the path to teh blast >>>> bin files? >>>> Does anybody know how to set the Composition based statistics >>>> parameter? 
>>>> there is my code: >>>> >>>> #!/usr/bin/perl -w >>>> >>>> use Bio::Tools::Run::StandAloneBlast; >>>> use Bio::Seq; >>>> use Bio::SeqIO; >>>> use strict; >>>> >>>> BEGIN >>>> { >>>> $ENV{PATH}=":/home/Hubert/blast/blast-2.2.13/bin/:"; >>>> } >>>> >>>> >>>> # parameters >>>> my $expect_value = 20000; >>>> #my $filter_query_sequence = 'F'; >>>> my $one_line_description = 1000; >>>> my $alignments = 1000; >>>> # my $strands = 1; >>>> my $count = 1; >>>> >>>> my @params = ('program' => 'blastp', 'database' => 'nr'); >>>> #my $progress_interval = 100; >>>> >>>> >>>> my $seqio_obj = Bio::SeqIO->new( >>>> -file => "Perm.txt", >>>> -format => "raw", >>>> ); >>>> >>>> # create factory object and set parameters >>>> my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); >>>> >>>> $factory->e($expect_value); >>>> #$factory->F($filter_query_sequence); >>>> $factory->v($one_line_description); >>>> $factory->b($alignments); >>>> #$factory->S($strands); >>>> >>>> >>>> # get query >>>> >>>> while ( my $query = $seqio_obj->next_seq ) { >>>> my $blast_report = $factory->blastall($query); >>>> my $filename = "comp_$count.txt"; >>>> my $factory->outfile($filename); >>>> print $query->seq; >>>> print "\n"; >>>> >>>> $count++; >>>> } >>>> >>>> thank you very much in advance >>>> Hubert >>>> >>>> >>>> >>>> Nagesh Chakka wrote: >>>> >>>> >>>> >>>>> Hi Hubert, >>>>> I downloaded the nr.00.tar.gz file a week ago. I was able to get >>>>> the following files >>>>> .phr, .pin, .pnd, .pni, .ppd, .ppi, .psd, .psi, .psq, .pal files. >>>>> I have no trouble in running standalone blast. You are not >>>>> required to run formardb on the downloaded blast databases and >>>>> that may be the reason why the sequences are not included as it >>>>> will also reduce the size of the file. >>>>> Did you try to run a blast search, if so is it giving you any errors? 
>>>>> Nagesh >>>>> >>>>> >>>>> >>>>> Hubert Prielinger wrote: >>>>> >>>>> >>>>> >>>>>> Hi, >>>>>> I have downloaded the nr database for doing a blast search >>>>>> locally, now I'm supposed to index the database with formatdb, >>>>>> but it doesn't work... >>>>>> The online help says that you need a fasta file that is indexed >>>>>> to use for searching the database, but when I uncompressed the >>>>>> zip file, there were only .phr, .pnd, .pin, .pni, .ppd file.... >>>>>> Is there anybody who can tell me, how to use formatdb with the nr >>>>>> database... >>>>>> >>>>>> Help is very appreciated >>>>>> Thank you very much in advance >>>>>> >>>>>> Hubert >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l@portal.open-bio.org >>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>> >>>> >>> >>> >>> >>> >> > > From taerwin at gmail.com Mon Jan 23 18:21:03 2006 From: taerwin at gmail.com (Tim Erwin) Date: Mon Jan 23 21:57:43 2006 Subject: [Bioperl-l] formatdb with the nr database In-Reply-To: <43D54838.5050301@gmx.at> References: <43D54838.5050301@gmx.at> Message-ID: The .phr, .pnd, .pin, .pni, .ppd files are the indexed database, you don't need to run formatdb as this step has already been done. If you run formatdb on a fasta file it will generate these .p* files for a protein database and .n* files for a nucleotide database. Regards, Tim On 1/24/06, Hubert Prielinger wrote: > Hi, > I have downloaded the nr database for doing a blast search locally, now > I'm supposed to index the database with formatdb, but it doesn't work... > The online help says that you need a fasta file that is indexed to use > for searching the database, but when I uncompressed the zip file, there > were only .phr, .pnd, .pin, .pni, .ppd file.... > Is there anybody who can tell me, how to use formatdb with the nr > database... 
> > Help is very appreciated > Thank you very much in advance > > Hubert > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > From jason.stajich at duke.edu Mon Jan 23 22:57:47 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Mon Jan 23 22:54:13 2006 Subject: [Bioperl-l] bioperl In-Reply-To: <64335.84.190.29.176.1137713813.squirrel@webmail.charite.de> References: <64335.84.190.29.176.1137713813.squirrel@webmail.charite.de> Message-ID: On Jan 19, 2006, at 6:36 PM, Dr. Christoph Gille wrote: > Hi Torsten, > > perhaps Sopma is not the best choice as a test case for bringing > perl and > java together. It is not a convincing example because people would > ask why > not > contacting the server directly from java and why taking the hazzard > with > perl installation. > > I want to demonstrate that BioPerl programs can well work together > with > STRAP/Biojava with the wrapper I am just developing but I need a > suitable > example program. > > What I consider is a sophisticated non-interactive Bioperl program > that > performs some kind of useful computation on a protein sequence, or an > alignment or a protein 3D structure. > > Do you know of something appropriate ? > > It does not matter if the program is complex or contains C/C++ as > long as > it can be automatically installed without user interaction. PAML/Codeml, PHYLIP programs Neighbor, Seqboot, ProtDist, ProtPars, require some file formatting that but might still be criticized as not sufficiently difficult to brave the hazards of installing perl modules. Simpler things like MUSCLE, TCOFFEE, BLAST, FASTA/SSEARCH might also be good choices. 
>
> Many thanks
>
> Christoph
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From nagesh.chakka at anu.edu.au Tue Jan 24 03:00:02 2006
From: nagesh.chakka at anu.edu.au (Nagesh Chakka)
Date: Tue Jan 24 03:22:15 2006
Subject: [Bioperl-l] formatdb with the nr database
In-Reply-To: <43D585CF.5070902@gmx.at>
References: <43D54838.5050301@gmx.at> <43D5693C.1020805@anu.edu.au> <43D56203.2060806@gmx.at> <1138062266.2534.2.camel@vogon> <43D571B1.3020008@gmx.at> <43D58D06.5080501@anu.edu.au> <43D585CF.5070902@gmx.at>
Message-ID: <1138089602.3643.1.camel@vogon>

I could get the following code working. The only problem I had was with
using the method outfile which I defined differently.

#!/usr/local/bin/perl -w

BEGIN {
    $ENV{BLASTDIR}     = "/usr/local/blast/bin";
    $ENV{BLASTDATADIR} = "/home/nagesh/blast/nr.00";
}

use Bio::Tools::Run::StandAloneBlast;
use Bio::Seq;
use Bio::SeqIO;
use strict;

# parameters
my $expect_value = 20000;
#my $filter_query_sequence = 'F';
my $one_line_description = 1000;
my $alignments = 1000;
# my $strands = 1;
my $count = 1;

my @params = ('program' => 'blastp', 'database' => 'nr.00',
              'outfile' => 'temp.out');
#my $progress_interval = 100;

my $seqio_obj = Bio::SeqIO->new(
    -file   => "blastInput.txt",
    -format => "fasta",
);

# create factory object and set parameters
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);

$factory->e($expect_value);
#$factory->F($filter_query_sequence);
$factory->v($one_line_description);
$factory->b($alignments);
#$factory->S($strands);

# get query
while ( my $query = $seqio_obj->next_seq ) {
    my $blast_report = $factory->blastall($query);
    print "$blast_report\n";
    # $factory->outfile("temp.out");
    print $query->seq;
    print "\n";
    $count++;
}

On Mon, 2006-01-23 at 19:41 -0600, Hubert Prielinger wrote:
> hi,
> sorry, but what do you mean with is your
blast database in /nr... > my database is located in the path /home/Hubert/blast/blast-2.2.13/data > > > > Nagesh Chakka wrote: > > > Can you just run the blast from the command line. > > Is your blast database in "/nr". > > > > Hubert Prielinger wrote: > > > >> Hi Nagesh, > >> thank you very much, I put my database into the data folder, run the > >> program and got the following error message: > >> > >> submit Sequence...just do it.... > >> sh: /home/Hubert/blast/blast-2.2.13/bin/blastall: cannot execute > >> binary file > >> > >> ------------- EXCEPTION ------------- > >> MSG: blastall call crashed: 32256 > >> /home/Hubert/blast/blast-2.2.13/bin/blastall -p blastp -d "/nr" > >> -i /tmp/QTZfYMbgLM -e 20000 -o /tmp/v3YwWvONZ1 -v 1000 -b 1000 > >> > >> STACK Bio::Tools::Run::StandAloneBlast::_runblast > >> /usr/lib/perl5/site_perl/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759 > >> STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast > >> /usr/lib/perl5/site_perl/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706 > >> STACK Bio::Tools::Run::StandAloneBlast::blastall > >> /usr/lib/perl5/site_perl/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:557 > >> STACK toplevel > >> /home/Hubert/installed/eclipse/workspace/Database_Search/standalone_blast.pl:46 > >> > >> > >> -------------------------------------- > >> > >> Why it did not find my binary file, but it is there > >> > >> regards > >> > >> Nagesh Chakka wrote: > >> > >>> Hi, > >>> The following is from the StandAloneBlast.pm documentation > >>> " If the databases which will be searched by BLAST are located in the > >>> data subdirectory of the blast program directory (the default > >>> installation location), StandAloneBlast will find them; however, if the > >>> database files are located in any other location, environmental > >>> variable > >>> $BLASTDATADIR will need to be set to point to that directory." > >>> Please note that I have not used this module before. 
> >>> Nagesh > >>> > >>> > >>> > >>> On Mon, 2006-01-23 at 17:08 -0600, Hubert Prielinger wrote: > >>> > >>> > >>>> Hi, > >>>> thank you very much for the help, another questions that raises up, > >>>> do I have to write the path to the database files as well, I guess > >>>> so, but how I do that, the same way I write the path to teh blast > >>>> bin files? > >>>> Does anybody know how to set the Composition based statistics > >>>> parameter? > >>>> there is my code: > >>>> > >>>> #!/usr/bin/perl -w > >>>> > >>>> use Bio::Tools::Run::StandAloneBlast; > >>>> use Bio::Seq; > >>>> use Bio::SeqIO; > >>>> use strict; > >>>> > >>>> BEGIN > >>>> { > >>>> $ENV{PATH}=":/home/Hubert/blast/blast-2.2.13/bin/:"; > >>>> } > >>>> > >>>> > >>>> # parameters > >>>> my $expect_value = 20000; > >>>> #my $filter_query_sequence = 'F'; > >>>> my $one_line_description = 1000; > >>>> my $alignments = 1000; > >>>> # my $strands = 1; > >>>> my $count = 1; > >>>> > >>>> my @params = ('program' => 'blastp', 'database' => 'nr'); > >>>> #my $progress_interval = 100; > >>>> > >>>> > >>>> my $seqio_obj = Bio::SeqIO->new( > >>>> -file => "Perm.txt", > >>>> -format => "raw", > >>>> ); > >>>> > >>>> # create factory object and set parameters > >>>> my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); > >>>> > >>>> $factory->e($expect_value); > >>>> #$factory->F($filter_query_sequence); > >>>> $factory->v($one_line_description); > >>>> $factory->b($alignments); > >>>> #$factory->S($strands); > >>>> > >>>> > >>>> # get query > >>>> > >>>> while ( my $query = $seqio_obj->next_seq ) { > >>>> my $blast_report = $factory->blastall($query); > >>>> my $filename = "comp_$count.txt"; > >>>> my $factory->outfile($filename); > >>>> print $query->seq; > >>>> print "\n"; > >>>> > >>>> $count++; > >>>> } > >>>> > >>>> thank you very much in advance > >>>> Hubert > >>>> > >>>> > >>>> > >>>> Nagesh Chakka wrote: > >>>> > >>>> > >>>> > >>>>> Hi Hubert, > >>>>> I downloaded the nr.00.tar.gz file a week ago. 
I was able to get > >>>>> the following files > >>>>> .phr, .pin, .pnd, .pni, .ppd, .ppi, .psd, .psi, .psq, .pal files. > >>>>> I have no trouble in running standalone blast. You are not > >>>>> required to run formardb on the downloaded blast databases and > >>>>> that may be the reason why the sequences are not included as it > >>>>> will also reduce the size of the file. > >>>>> Did you try to run a blast search, if so is it giving you any errors? > >>>>> Nagesh > >>>>> > >>>>> > >>>>> > >>>>> Hubert Prielinger wrote: > >>>>> > >>>>> > >>>>> > >>>>>> Hi, > >>>>>> I have downloaded the nr database for doing a blast search > >>>>>> locally, now I'm supposed to index the database with formatdb, > >>>>>> but it doesn't work... > >>>>>> The online help says that you need a fasta file that is indexed > >>>>>> to use for searching the database, but when I uncompressed the > >>>>>> zip file, there were only .phr, .pnd, .pin, .pni, .ppd file.... > >>>>>> Is there anybody who can tell me, how to use formatdb with the nr > >>>>>> database... > >>>>>> > >>>>>> Help is very appreciated > >>>>>> Thank you very much in advance > >>>>>> > >>>>>> Hubert > >>>>>> > >>>>>> _______________________________________________ > >>>>>> Bioperl-l mailing list > >>>>>> Bioperl-l@portal.open-bio.org > >>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l > >>>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>> > >>>> > >>> > >>> > >>> > >>> > >> > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l From mith at ceh.ac.uk Tue Jan 24 04:16:49 2006 From: mith at ceh.ac.uk (Milo Thurston) Date: Tue Jan 24 04:27:54 2006 Subject: [Bioperl-l] .tab + FASTA -> EMBL Message-ID: <200601240916.k0O9Gnnd011814@ivpcl10.nox.ac.uk> Hello, Would anyone be able to suggest a suitable method for the following, please? 
I have a load of FASTA sequences, and for each one several Artemis feature tables and MSP Crunch files. I'd like to read in each sequence plus the annotations, combine them and save as EMBL format. Converting from FASTA to EMBL is, of course, trivial but I can't find any existing Bioperl modules that might deal with the .tabs, and I'd rather use what's available than duplicate code to read them. Thanks. -- Dr. Milo Thurston, CEH Oxford, Mansfield Road, Oxford, OX1 3SR. 'phone 01865 281975, fax 01865 281696. http://www.genomics.ceh.ac.uk/lab/ From smarkel at scitegic.com Tue Jan 24 09:54:46 2006 From: smarkel at scitegic.com (Scott Markel) Date: Tue Jan 24 09:51:39 2006 Subject: [Bioperl-l] formatdb with the nr database In-Reply-To: <43D585CF.5070902@gmx.at> References: <43D54838.5050301@gmx.at> <43D5693C.1020805@anu.edu.au> <43D56203.2060806@gmx.at> <1138062266.2534.2.camel@vogon> <43D571B1.3020008@gmx.at> <43D58D06.5080501@anu.edu.au> <43D585CF.5070902@gmx.at> Message-ID: <43D63FB6.4090505@scitegic.com> Hubert, If you look at the MSG line in the exception you can see exactly what the command line was. Nagesh is pointing out that you used -d "/nr" and asking if that's what you want. I suspect that the '/' shouldn't be there. Try invoking blastall directly from the command line. All BioPerl is doing is invoking BLAST on your behalf. The same command line that BioPerl uses should also work for you on the command line. Scott Hubert Prielinger wrote: > hi, > sorry, but what do you mean with is your blast database in /nr... > my database is located in the path /home/Hubert/blast/blast-2.2.13/data > > > > Nagesh Chakka wrote: > >> Can you just run the blast from the command line. >> Is your blast database in "/nr". >> >> Hubert Prielinger wrote: >> >>> Hi Nagesh, >>> thank you very much, I put my database into the data folder, run the >>> program and got the following error message: >>> >>> submit Sequence...just do it.... 
>>> sh: /home/Hubert/blast/blast-2.2.13/bin/blastall: cannot execute >>> binary file >>> >>> ------------- EXCEPTION ------------- >>> MSG: blastall call crashed: 32256 >>> /home/Hubert/blast/blast-2.2.13/bin/blastall -p blastp -d "/nr" >>> -i /tmp/QTZfYMbgLM -e 20000 -o /tmp/v3YwWvONZ1 -v 1000 -b 1000 >>> >>> STACK Bio::Tools::Run::StandAloneBlast::_runblast >>> /usr/lib/perl5/site_perl/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759 >>> STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast >>> /usr/lib/perl5/site_perl/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706 >>> STACK Bio::Tools::Run::StandAloneBlast::blastall >>> /usr/lib/perl5/site_perl/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:557 >>> STACK toplevel >>> /home/Hubert/installed/eclipse/workspace/Database_Search/standalone_blast.pl:46 >>> >>> >>> -------------------------------------- >>> >>> Why it did not find my binary file, but it is there >>> >>> regards >>> >>> Nagesh Chakka wrote: >>> >>>> Hi, >>>> The following is from the StandAloneBlast.pm documentation >>>> " If the databases which will be searched by BLAST are located in the >>>> data subdirectory of the blast program directory (the default >>>> installation location), StandAloneBlast will find them; however, if the >>>> database files are located in any other location, environmental >>>> variable >>>> $BLASTDATADIR will need to be set to point to that directory." >>>> Please note that I have not used this module before. >>>> Nagesh >>>> >>>> >>>> >>>> On Mon, 2006-01-23 at 17:08 -0600, Hubert Prielinger wrote: >>>> >>>> >>>>> Hi, >>>>> thank you very much for the help, another questions that raises up, >>>>> do I have to write the path to the database files as well, I guess >>>>> so, but how I do that, the same way I write the path to teh blast >>>>> bin files? >>>>> Does anybody know how to set the Composition based statistics >>>>> parameter? 
>>>>> there is my code: >>>>> >>>>> #!/usr/bin/perl -w >>>>> >>>>> use Bio::Tools::Run::StandAloneBlast; >>>>> use Bio::Seq; >>>>> use Bio::SeqIO; >>>>> use strict; >>>>> >>>>> BEGIN >>>>> { >>>>> $ENV{PATH}=":/home/Hubert/blast/blast-2.2.13/bin/:"; >>>>> } >>>>> >>>>> >>>>> # parameters >>>>> my $expect_value = 20000; >>>>> #my $filter_query_sequence = 'F'; >>>>> my $one_line_description = 1000; >>>>> my $alignments = 1000; >>>>> # my $strands = 1; >>>>> my $count = 1; >>>>> >>>>> my @params = ('program' => 'blastp', 'database' => 'nr'); >>>>> #my $progress_interval = 100; >>>>> >>>>> >>>>> my $seqio_obj = Bio::SeqIO->new( >>>>> -file => "Perm.txt", >>>>> -format => "raw", >>>>> ); >>>>> >>>>> # create factory object and set parameters >>>>> my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); >>>>> >>>>> $factory->e($expect_value); >>>>> #$factory->F($filter_query_sequence); >>>>> $factory->v($one_line_description); >>>>> $factory->b($alignments); >>>>> #$factory->S($strands); >>>>> >>>>> >>>>> # get query >>>>> >>>>> while ( my $query = $seqio_obj->next_seq ) { >>>>> my $blast_report = $factory->blastall($query); >>>>> my $filename = "comp_$count.txt"; >>>>> my $factory->outfile($filename); >>>>> print $query->seq; >>>>> print "\n"; >>>>> >>>>> $count++; >>>>> } >>>>> >>>>> thank you very much in advance >>>>> Hubert >>>>> >>>>> >>>>> >>>>> Nagesh Chakka wrote: >>>>> >>>>> >>>>> >>>>>> Hi Hubert, >>>>>> I downloaded the nr.00.tar.gz file a week ago. I was able to get >>>>>> the following files >>>>>> .phr, .pin, .pnd, .pni, .ppd, .ppi, .psd, .psi, .psq, .pal files. >>>>>> I have no trouble in running standalone blast. You are not >>>>>> required to run formardb on the downloaded blast databases and >>>>>> that may be the reason why the sequences are not included as it >>>>>> will also reduce the size of the file. >>>>>> Did you try to run a blast search, if so is it giving you any errors? 
>>>>>> Nagesh >>>>>> >>>>>> >>>>>> >>>>>> Hubert Prielinger wrote: >>>>>> >>>>>> >>>>>> >>>>>>> Hi, >>>>>>> I have downloaded the nr database for doing a blast search >>>>>>> locally, now I'm supposed to index the database with formatdb, >>>>>>> but it doesn't work... >>>>>>> The online help says that you need a fasta file that is indexed >>>>>>> to use for searching the database, but when I uncompressed the >>>>>>> zip file, there were only .phr, .pnd, .pin, .pni, .ppd file.... >>>>>>> Is there anybody who can tell me, how to use formatdb with the nr >>>>>>> database... >>>>>>> >>>>>>> Help is very appreciated >>>>>>> Thank you very much in advance >>>>>>> >>>>>>> Hubert >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l@portal.open-bio.org >>>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> >>> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/bioperl-l > > -- Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel@scitegic.com SciTegic Inc. mobile: +1 858 205 3653 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253 San Diego, CA 92123 fax: +1 858 279 8804 USA web: http://www.scitegic.com From cain at cshl.edu Tue Jan 24 11:16:23 2006 From: cain at cshl.edu (Scott Cain) Date: Tue Jan 24 11:29:02 2006 Subject: [Bioperl-l] Re: [Gmod-gbrowse] GMOD PPM repository not working In-Reply-To: <000001c61c4f$7d835170$15327e82@pyrimidine> References: <000001c61c4f$7d835170$15327e82@pyrimidine> Message-ID: <1138119383.3338.68.camel@localhost.localdomain> Hi Chris, Is it still misbehaving? I'll do some testing today, but my ability to do so is little hampered as I am traveling this week. 
Thanks, Scott On Wed, 2006-01-18 at 10:51 -0600, Chris Fields wrote: > Scott, > > I am trying to find the newest bioperl dev. Release (1.51) from PPM for a > quick write-up on installing bioperl-db on Windows. I tried using the GMOD > repository: > > ppm> rep add gmod http://www.gmod.org/ggb/ppm > Repositories: > [1] gmod > [ ] ActiveState Package Repository > [ ] ActiveState PPM2 Repository > [ ] Bioperl > [ ] Bribes > [ ] Kobes > [ ] local > ppm> search bioperl > Searching in Active Repositories > No matches for 'bioperl'; see 'help search'. > ppm> search * > Searching in Active Repositories > No matches for '*'; see 'help search'. > ppm> > > > Any idea what's going on? All other repositories work fine. I can download > it and install locally w/o a problem. I am running the newest ActivePerl > (5.8.7.815), WinXP. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > > > > ------------------------------------------------------- > This SF.net email is sponsored by: Splunk Inc. Do you grep through log files > for problems? Stop! Download the new AJAX search engine that makes > searching your log files as easy as surfing the web. DOWNLOAD SPLUNK! > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=121642 > _______________________________________________ > Gmod-gbrowse mailing list > Gmod-gbrowse@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse -- ------------------------------------------------------------------------ Scott Cain, Ph. D. 
cain@cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cain at cshl.edu Tue Jan 24 11:33:18 2006
From: cain at cshl.edu (Scott Cain)
Date: Tue Jan 24 11:29:13 2006
Subject: RE [Gmod-gbrowse] [Fwd: [Bioperl-l] search2gff]
In-Reply-To:
References:
Message-ID: <1138120399.3338.77.camel@localhost.localdomain>

Hello Dea,

If there were a bioperl parser for Geneseqer output, it probably
wouldn't be that hard to write one, but as far as I can tell there isn't
a parser (a quick grep through bioperl-live came up empty).

Sorry,
Scott

On Tue, 2006-01-24 at 17:19 +0100, dea.giardella@biogemma.com wrote:
> Hello,
>
> In the same way, are there any scripts to convert Geneseqer output to GFF3
> format?
> Geneseqer : http://www.plantgdb.org/PlantGDB-cgi/GeneSeqer/PlantGDBgs.cgi
>
> Thanks a lot!
>
> Déa GIARDELLA
> dea.giardella@biogemma.com
>
>
>
> Scott Cain
> Sent by: gmod-gbrowse-admin@lists.sourceforge.net
> 24/01/2006 16:26
>
> To
> "Gbrowse (E-mail)"
> cc
>
> Subject
> [Gmod-gbrowse] [Fwd: [Bioperl-l] search2gff]
>
>
> Hello all,
>
> Hilmar Lapp posted the attached message to the bioperl mailing list
> about search2gff, a script for converting BLAST output to GFF3. I
> thought it might be of interest to readers of this mailing list as well.
>
> Scott
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   cain@cshl.edu
> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> Cold Spring Harbor Laboratory
>
> ----- Message from Hilmar Lapp on Thu, 19 Jan 2006 15:11:22 -0800 -----
> To:
> bioperl-l
> Subject:
> [Bioperl-l] search2gff
> I added a couple of capabilities to the scripts/utilities/search2gff
> script written by Jason. In a nutshell, there are now options for
> controlling the score, location, and method of the HSP-representing
> feature, as well as options for printing of parent, which parent, and
> whether to skip all except the first HSP for each hit.
>
> As for possible applications, for example using these options you can
> blast SNP assay primers and use the options to create SNP features for
> a single basepair at the end of the primer, ready to be piped to a
> GBrowse GFF3 loader.
>
> I tried to preserve the original functionality in its entirety, i.e.,
> if you don't use any of the new options the script should work as
> before. If not please let me know.
>
> POD is attached.
>
> -hilmar
> --
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------
>
> SYNOPSIS
>     Usage: search2gff [-o outputfile] [-f reportformat] [-i inputfilename]
>     OR file1 file2 ..
>
> DESCRIPTION
>     This script will turn a protein Search report (BLASTP, FASTP, SSEARCH,
>     AXT, WABA) into a GFF File.
>
>     The options are:
>
>       -i infilename       - (optional) inputfilename, will read
>                             either ARGV files or from STDIN
>       -o filename         - the output filename [default STDOUT]
>       -f format           - search result format (blast, fasta,waba,axt)
>                             (ssearch is fasta format). default is blast.
>       -t/--type seqtype   - if you want to see query or hit information
>                             in the GFF report
>       -s/--source         - specify the source (will be algorithm name
>                             otherwise like BLASTN)
>       --method            - the method tag (primary_tag) of the features
>                             (default is similarity)
>       --scorefunc         - a string or a file that when parsed evaluates
>                             to a closure which will be passed a feature
>                             object and that returns the score to be
>                             printed
>       --locfunc           - a string or a file that when parsed evaluates
>                             to a closure which will be passed two
>                             features, query and hit, and returns the
>                             location (Bio::LocationI compliant) for the
>                             GFF3 feature created for each HSP; the closure
>                             may use the clone_loc() and create_loc()
>                             functions for convenience, see their PODs
>       --onehsp            - only print the first HSP feature for each hit
>       -p/--parent         - the parent to which HSP features should refer
>                             if not the name of the hit or query (depending
>                             on --type)
>       --target/--notarget - whether to always add the Target tag or not
>       -h                  - this help menu
>       --version           - GFF version to use (put a 3 here to use gff 3)
>       --component         - generate GFF component fields (chromosome)
>       -m/--match          - generate a 'match' line which is a container
>                             of all the similarity HSPs
>       --addid             - add ID tag in the absence of --match
>       -c/--cutoff         - specify an evalue cutoff
>
>     Additionally specify the filenames you want to process on the
>     command-line. If no files are specified then STDIN input is assumed.
>     You specify this by doing: search2gff < file1 file2 file3
>
> AUTHOR
>     Jason Stajich, jason-at-bioperl-dot-org
>
> Contributors
>     Hilmar Lapp, hlapp-at-gmx-dot-net
>
> clone_loc
>     Title   : clone_loc
>     Usage   : my $l = clone_loc($feature->location);
>     Function: Helper function to simplify the task of cloning locations
>               for --locfunc closures.
>
>               Presently simply implemented using Storable::dclone().
>     Example :
>     Returns : A Bio::LocationI object of the same type and with the
>               same properties as the argument, but physically different.
>               All structured properties will be cloned as well.
>     Args    : A Bio::LocationI compliant object
>
> create_loc
>     Title   : create_loc
>     Usage   : my $l = create_loc("10..12");
>     Function: Helper function to simplify the task of creating locations
>               for --locfunc closures. Creates a location from a
>               feature-table formatted string.
>
>     Example :
>     Returns : A Bio::LocationI object representing the location given
>               as formatted string.
>     Args    : A GenBank feature-table formatted string.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
> -------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
> for problems? Stop! Download the new AJAX search engine that makes
> searching your log files as easy as surfing the web. DOWNLOAD SPLUNK!
> http://sel.as-us.falkag.net/sel?cmd_______________________________________________
> Gmod-gbrowse mailing list
> Gmod-gbrowse@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>
--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain@cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cjfields at uiuc.edu Tue Jan 24 12:09:56 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue Jan 24 12:08:27 2006
Subject: [Bioperl-l] RemoteBlast.pm and Bio::SearchIO::blast.pm - partially resolved
Message-ID: <000901c62109$01814870$15327e82@pyrimidine>

I submitted two bugs on Bugzilla to describe recent problems with
RemoteBlast.pm and SearchIO::blast.pm

http://bugzilla.bioperl.org/show_bug.cgi?id=1934
http://bugzilla.bioperl.org/show_bug.cgi?id=1935

Today I submitted a patched version of Bio::SearchIO::blast.pm which should
fix the text parsing issue for old (2.2.12) and new (2.2.13) versions of
NCBI's BLAST; the bug link above describes the problem and the fix.
Problem is, I know it will likely break again b/c NCBI will probably change text output in a future BLAST version. I also agree with Jason about changing the default for SearchIO to XML. So, does text output parsing through blast.pm need to be deprecated in favor of XML, or should both be available? Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From jason.stajich at duke.edu Tue Jan 24 12:15:47 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue Jan 24 12:12:12 2006 Subject: [Bioperl-l] Re: RemoteBlast.pm and Bio::SearchIO::blast.pm - partially resolved In-Reply-To: <000901c62109$01814870$15327e82@pyrimidine> References: <000901c62109$01814870$15327e82@pyrimidine> Message-ID: <18966F80-B780-4661-953E-613B05B56164@duke.edu> Thanks Chris - I don't know when I'll have time to check in bugs so anyone else who has commit access feel free to give these a whirl and check in. I would propose making the XML default but allowing the text version to still be supported in the event that someone has setup their own local NCBI BLAST Web interface which still supports the simple Text output. -j On Jan 24, 2006, at 12:09 PM, Chris Fields wrote: > I submitted two bugs on Bugzilla to describe recent problems with > RemoteBlast.pm and SearchIO::blast.pm > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > > Today I submitted a patched version of Bio::SearchIO::blast.pm > which should > fix the text parsing issue for old (2.2.12) and new (2.2.13) > versions of > NCBI's BLAST; the bug link above describes the problem and the > fix. Problem > is, I know it will likely break again b/c NCBI will probably change > text > output in a future BLAST version. I also agree with Jason about > changing > the default for SearchIO to XML. So, does text output parsing through > blast.pm need to be deprecated in favor of XML, or should both be > available? 
> > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From cjfields at uiuc.edu Tue Jan 24 12:33:09 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue Jan 24 12:42:56 2006 Subject: [Bioperl-l] RE: RemoteBlast.pm and Bio::SearchIO::blast.pm - partially resolved In-Reply-To: <18966F80-B780-4661-953E-613B05B56164@duke.edu> Message-ID: <000d01c6210c$3e7c1040$15327e82@pyrimidine> I wouldn't mind helping out in maintaining blast.pm or RemoteBlast.pm, but I'm still a bit 'green' with Perl and Bioperl objects and methods. This last fix was somewhat easy to spot (simple regex); the problems with saving XML output (bug #1935) are a stumbling block here, though. A new wrinkle though, which limits the bug's severity: it does at least parse the XML output as it will pull out accession numbers, which is a bit of a relief (blastxml seems to be working). It just won't save it, and using $result->query_name still gives part of the RID, suggesting a regex messing up somewhere, maybe in blastxml. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich@duke.edu] > Sent: Tuesday, January 24, 2006 11:16 AM > To: Chris Fields > Cc: bioperl-ml List > Subject: Re: RemoteBlast.pm and Bio::SearchIO::blast.pm - partially > resolved > > Thanks Chris - I don't know when I'll have time to check in bugs so > anyone else who has commit access feel free to give these a whirl and > check in. > > I would propose making the XML default but allowing the text version > to still be supported in the event that someone has setup their own > local NCBI BLAST Web interface which still supports the simple Text > output. 
> > -j > > On Jan 24, 2006, at 12:09 PM, Chris Fields wrote: > > > I submitted two bugs on Bugzilla to describe recent problems with > > RemoteBlast.pm and SearchIO::blast.pm > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > > > > Today I submitted a patched version of Bio::SearchIO::blast.pm > > which should > > fix the text parsing issue for old (2.2.12) and new (2.2.13) > > versions of > > NCBI's BLAST; the bug link above describes the problem and the > > fix. Problem > > is, I know it will likely break again b/c NCBI will probably change > > text > > output in a future BLAST version. I also agree with Jason about > > changing > > the default for SearchIO to XML. So, does text output parsing through > > blast.pm need to be deprecated in favor of XML, or should both be > > available? > > > > Christopher Fields > > Postdoctoral Researcher - Switzer Lab > > Dept. of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 From hubert.prielinger at gmx.at Tue Jan 24 15:49:07 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Tue, 24 Jan 2006 14:49:07 -0600 Subject: [Bioperl-l] formatdb with the nr database In-Reply-To: <43D63FB6.4090505@scitegic.com> References: <43D54838.5050301@gmx.at> <43D5693C.1020805@anu.edu.au> <43D56203.2060806@gmx.at> <1138062266.2534.2.camel@vogon> <43D571B1.3020008@gmx.at> <43D58D06.5080501@anu.edu.au> <43D585CF.5070902@gmx.at> <43D63FB6.4090505@scitegic.com> Message-ID: <43D692C3.80306@gmx.at> Hi, thank you very much for the help. I have tried to run blastall on the command line, but I can't even execute the binary file, even though the blastall executable has every permission set... 
I always get the error message: blastall: cannot execute the binary file. Does the executable need to be somewhere else, on another path? It is currently located under /home/Hubert/blast/blast-2.2.13/bin. Thanks, Hubert Scott Markel wrote: > Hubert, > > If you look at the MSG line in the exception you can see > exactly what the command line was. Nagesh is pointing out > that you used -d "/nr" and asking if that's what you want. > I suspect that the '/' shouldn't be there. > > Try invoking blastall directly from the command line. All > BioPerl is doing is invoking BLAST on your behalf. The > same command line that BioPerl uses should also work for > you on the command line. > > Scott > > Hubert Prielinger wrote: > >> hi, >> sorry, but what do you mean with is your blast database in /nr... >> my database is located in the path /home/Hubert/blast/blast-2.2.13/data >> >> >> >> Nagesh Chakka wrote: >> >>> Can you just run the blast from the command line. >>> Is your blast database in "/nr". >>> >>> Hubert Prielinger wrote: >>> >>>> Hi Nagesh, >>>> thank you very much, I put my database into the data folder, run >>>> the program and got the following error message: >>>> >>>> submit Sequence...just do it.... 
>>>> sh: /home/Hubert/blast/blast-2.2.13/bin/blastall: cannot execute >>>> binary file >>>> >>>> ------------- EXCEPTION ------------- >>>> MSG: blastall call crashed: 32256 >>>> /home/Hubert/blast/blast-2.2.13/bin/blastall -p blastp -d "/nr" >>>> -i /tmp/QTZfYMbgLM -e 20000 -o /tmp/v3YwWvONZ1 -v 1000 -b >>>> 1000 >>>> >>>> STACK Bio::Tools::Run::StandAloneBlast::_runblast >>>> /usr/lib/perl5/site_perl/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759 >>>> STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast >>>> /usr/lib/perl5/site_perl/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706 >>>> STACK Bio::Tools::Run::StandAloneBlast::blastall >>>> /usr/lib/perl5/site_perl/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:557 >>>> STACK toplevel >>>> /home/Hubert/installed/eclipse/workspace/Database_Search/standalone_blast.pl:46 >>>> >>>> >>>> -------------------------------------- >>>> >>>> Why it did not find my binary file, but it is there >>>> >>>> regards >>>> >>>> Nagesh Chakka wrote: >>>> >>>>> Hi, >>>>> The following is from the StandAloneBlast.pm documentation >>>>> " If the databases which will be searched by BLAST are located in the >>>>> data subdirectory of the blast program directory (the default >>>>> installation location), StandAloneBlast will find them; however, >>>>> if the >>>>> database files are located in any other location, environmental >>>>> variable >>>>> $BLASTDATADIR will need to be set to point to that directory." >>>>> Please note that I have not used this module before. >>>>> Nagesh >>>>> >>>>> >>>>> >>>>> On Mon, 2006-01-23 at 17:08 -0600, Hubert Prielinger wrote: >>>>> >>>>> >>>>>> Hi, >>>>>> thank you very much for the help, another questions that raises >>>>>> up, do I have to write the path to the database files as well, I >>>>>> guess so, but how I do that, the same way I write the path to teh >>>>>> blast bin files? >>>>>> Does anybody know how to set the Composition based statistics >>>>>> parameter? 
>>>>>> there is my code: >>>>>> >>>>>> #!/usr/bin/perl -w >>>>>> >>>>>> use Bio::Tools::Run::StandAloneBlast; >>>>>> use Bio::Seq; >>>>>> use Bio::SeqIO; >>>>>> use strict; >>>>>> >>>>>> BEGIN >>>>>> { >>>>>> $ENV{PATH}=":/home/Hubert/blast/blast-2.2.13/bin/:"; >>>>>> } >>>>>> >>>>>> >>>>>> # parameters >>>>>> my $expect_value = 20000; >>>>>> #my $filter_query_sequence = 'F'; >>>>>> my $one_line_description = 1000; >>>>>> my $alignments = 1000; >>>>>> # my $strands = 1; >>>>>> my $count = 1; >>>>>> >>>>>> my @params = ('program' => 'blastp', 'database' => 'nr'); >>>>>> #my $progress_interval = 100; >>>>>> >>>>>> >>>>>> my $seqio_obj = Bio::SeqIO->new( >>>>>> -file => "Perm.txt", >>>>>> -format => "raw", >>>>>> ); >>>>>> >>>>>> # create factory object and set parameters >>>>>> my $factory = Bio::Tools::Run::StandAloneBlast->new(@params); >>>>>> >>>>>> $factory->e($expect_value); >>>>>> #$factory->F($filter_query_sequence); >>>>>> $factory->v($one_line_description); >>>>>> $factory->b($alignments); >>>>>> #$factory->S($strands); >>>>>> >>>>>> >>>>>> # get query >>>>>> >>>>>> while ( my $query = $seqio_obj->next_seq ) { >>>>>> my $blast_report = $factory->blastall($query); >>>>>> my $filename = "comp_$count.txt"; >>>>>> my $factory->outfile($filename); >>>>>> print $query->seq; >>>>>> print "\n"; >>>>>> >>>>>> $count++; >>>>>> } >>>>>> >>>>>> thank you very much in advance >>>>>> Hubert >>>>>> >>>>>> >>>>>> >>>>>> Nagesh Chakka wrote: >>>>>> >>>>>> >>>>>> >>>>>>> Hi Hubert, >>>>>>> I downloaded the nr.00.tar.gz file a week ago. I was able to get >>>>>>> the following files >>>>>>> .phr, .pin, .pnd, .pni, .ppd, .ppi, .psd, .psi, .psq, .pal >>>>>>> files. I have no trouble in running standalone blast. You are >>>>>>> not required to run formardb on the downloaded blast databases >>>>>>> and that may be the reason why the sequences are not included as >>>>>>> it will also reduce the size of the file. 
>>>>>>> Did you try to run a blast search, if so is it giving you any >>>>>>> errors? >>>>>>> Nagesh >>>>>>> >>>>>>> >>>>>>> >>>>>>> Hubert Prielinger wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Hi, >>>>>>>> I have downloaded the nr database for doing a blast search >>>>>>>> locally, now I'm supposed to index the database with formatdb, >>>>>>>> but it doesn't work... >>>>>>>> The online help says that you need a fasta file that is indexed >>>>>>>> to use for searching the database, but when I uncompressed the >>>>>>>> zip file, there were only .phr, .pnd, .pin, .pni, .ppd file.... >>>>>>>> Is there anybody who can tell me, how to use formatdb with the >>>>>>>> nr database... >>>>>>>> >>>>>>>> Help is very appreciated >>>>>>>> Thank you very much in advance >>>>>>>> >>>>>>>> Hubert >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Bioperl-l mailing list >>>>>>>> Bioperl-l at portal.open-bio.org >>>>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> >>>> >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/bioperl-l >> >> > From hubert.prielinger at gmx.at Tue Jan 24 16:15:38 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Tue, 24 Jan 2006 15:15:38 -0600 Subject: [Bioperl-l] formatdb with the nr database In-Reply-To: <43D6B09A.3040207@atgc.org> References: <43D54838.5050301@gmx.at> <43D5693C.1020805@anu.edu.au> <43D56203.2060806@gmx.at> <1138062266.2534.2.camel@vogon> <43D571B1.3020008@gmx.at> <43D58D06.5080501@anu.edu.au> <43D585CF.5070902@gmx.at> <43D63FB6.4090505@scitegic.com> <43D692C3.80306@gmx.at> <43D6B09A.3040207@atgc.org> Message-ID: <43D698FA.3090904@gmx.at> Hi Alex, I did as you recommended and got the following output: 
[Hubert at ppc7 ~]$ file /home/Hubert/blast/blast-2.2.13/bin/blastall /home/Hubert/blast/blast-2.2.13/bin/blastall: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.4.1, dynamically linked (uses shared libs), for GNU/Linux 2.4.1, not stripped [Hubert at ppc7 ~]$ Does that mean it is compatible with the operating system? Thanks for the help, Hubert Alexander Kozik wrote: > try Unix command "file", for example: > > > bash-2.03$ file /usr/local/genome/bin/blastall > > /usr/local/genome/bin/blastall: ELF 64-bit MSB executable SPARCV9 > Version 1, UltraSPARC1 Extensions Required, dynamically linked, stripped > > bash-2.03$ > > it will tell if it's compatible with the operating system > > -Alex > > Hubert Prielinger wrote: > >>Hi, >>thank you very much for the help, I have tried to run the blastall on >>commandline, but I can't even execute the binary file, nevertheless the >>blastall exe file have every permission... >>I always get the error message: blastall: cannot execute the binary file >>Need to be the exe file somewhere else, another path...now it is located >>under /home/Hubert/blast/blast-2.2.13/bin >> >>thanks >>Hubert >> >> >> >> >> >>Scott Markel wrote: >> >> >> >>>Hubert, >>> >>>If you look at the MSG line in the exception you can see >>>exactly what the command line was. Nagesh is pointing out >>>that you used -d "/nr" and asking if that's what you want. >>>I suspect that the '/' shouldn't be there. >>> >>>Try invoking blastall directly from the command line. All >>>BioPerl is doing is invoking BLAST on your behalf. The >>>same command line that BioPerl uses should also work for >>>you on the command line. >>> >>>Scott >>> >>>Hubert Prielinger wrote: >>> >>> >>> >>>>hi, >>>>sorry, but what do you mean with is your blast database in /nr... >>>>my database is located in the path /home/Hubert/blast/blast-2.2.13/data >>>> >>>> >>>> >>>>Nagesh Chakka wrote: >>>> >>>> >>>> >>>>>Can you just run the blast from the command line. 
>>>>>Is your blast database in "/nr". 
From hubert.prielinger at gmx.at Tue Jan 24 16:24:51 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Tue, 24 Jan 2006 15:24:51 -0600 Subject: [Bioperl-l] formatdb with the nr database In-Reply-To: <43D6B09A.3040207@atgc.org> References: <43D54838.5050301@gmx.at> <43D5693C.1020805@anu.edu.au> <43D56203.2060806@gmx.at> <1138062266.2534.2.camel@vogon> <43D571B1.3020008@gmx.at> <43D58D06.5080501@anu.edu.au> <43D585CF.5070902@gmx.at> <43D63FB6.4090505@scitegic.com> <43D692C3.80306@gmx.at> <43D6B09A.3040207@atgc.org> Message-ID: <43D69B23.9010100@gmx.at> Hi, I'm very sorry for wasting your time, but I just figured out what happened: I had installed the 64-bit version and not the 32-bit version. Sorry for the inconvenience and thanks for the help. I'm now trying to fix the problem with the database. 
Sorry Hubert Alexander Kozik wrote: > try Unix command "file", for example: > > > bash-2.03$ file /usr/local/genome/bin/blastall > > /usr/local/genome/bin/blastall: ELF 64-bit MSB executable SPARCV9 > Version 1, UltraSPARC1 Extensions Required, dynamically linked, stripped > > bash-2.03$ > > it will tell if it's compatible with the operating system > > -Alex > > Hubert Prielinger wrote: > >>Hi, >>thank you very much for the help, I have tried to run the blastall on >>commandline, but I can't even execute the binary file, nevertheless the >>blastall exe file have every permission... >>I always get the error message: blastall: cannot execute the binary file >>Need to be the exe file somewhere else, another path...now it is located >>under /home/Hubert/blast/blast-2.2.13/bin >> >>thanks >>Hubert >> >> >> >> >> >>Scott Markel wrote: >> >> >> >>>Hubert, >>> >>>If you look at the MSG line in the exception you can see >>>exactly what the command line was. Nagesh is pointing out >>>that you used -d "/nr" and asking if that's what you want. >>>I suspect that the '/' shouldn't be there. >>> >>>Try invoking blastall directly from the command line. All >>>BioPerl is doing is invoking BLAST on your behalf. The >>>same command line that BioPerl uses should also work for >>>you on the command line. >>> >>>Scott >>> >>>Hubert Prielinger wrote: >>> >>> >>> >>>>hi, >>>>sorry, but what do you mean with is your blast database in /nr... >>>>my database is located in the path /home/Hubert/blast/blast-2.2.13/data >>>> >>>> >>>> >>>>Nagesh Chakka wrote: >>>> >>>> >>>> >>>>>Can you just run the blast from the command line. >>>>>Is your blast database in "/nr". >>>>> >>>>>Hubert Prielinger wrote: >>>>> >>>>> >>>>> >>>>>>Hi Nagesh, >>>>>>thank you very much, I put my database into the data folder, run >>>>>>the program and got the following error message: >>>>>> >>>>>>submit Sequence...just do it.... 
From smarkel at scitegic.com Tue Jan 24 17:09:57 2006 From: smarkel at scitegic.com (Scott Markel) Date: Tue, 24 Jan 2006 14:09:57 -0800 Subject: [Bioperl-l] formatdb with the nr database 
In-Reply-To: <43D692C3.80306@gmx.at> References: <43D54838.5050301@gmx.at> <43D5693C.1020805@anu.edu.au> <43D56203.2060806@gmx.at> <1138062266.2534.2.camel@vogon> <43D571B1.3020008@gmx.at> <43D58D06.5080501@anu.edu.au> <43D585CF.5070902@gmx.at> <43D63FB6.4090505@scitegic.com> <43D692C3.80306@gmx.at> Message-ID: <43D6A5B5.8090106@scitegic.com> Hubert, Since you can't run blastall on the command line, your initial problem has nothing to do with BioPerl. Once you get blastall working on the command line, you'll know what directories and environment variable settings to use when running via BioPerl. What happens when you run the following? file /home/Hubert/blast/blast-2.2.13/bin/blastall Is the executable the correct one for your operating system? Scott Hubert Prielinger wrote: > Hi, > thank you very much for the help, I have tried to run the blastall on > commandline, but I can't even execute the binary file, nevertheless the > blastall exe file have every permission... > I always get the error message: blastall: cannot execute the binary file > Need to be the exe file somewhere else, another path...now it is located > under /home/Hubert/blast/blast-2.2.13/bin > > thanks > Hubert > > > > > > Scott Markel wrote: > >> Hubert, >> >> If you look at the MSG line in the exception you can see >> exactly what the command line was. Nagesh is pointing out >> that you used -d "/nr" and asking if that's what you want. >> I suspect that the '/' shouldn't be there. >> >> Try invoking blastall directly from the command line. All >> BioPerl is doing is invoking BLAST on your behalf. The >> same command line that BioPerl uses should also work for >> you on the command line. >> >> Scott >> >> Hubert Prielinger wrote: >> >>> hi, >>> sorry, but what do you mean with is your blast database in /nr... >>> my database is located in the path /home/Hubert/blast/blast-2.2.13/data >>> >>> >>> >>> Nagesh Chakka wrote: >>> >>>> Can you just run the blast from the command line. 
-- Scott Markel, Ph.D. Principal Bioinformatics Architect email: smarkel at scitegic.com SciTegic Inc. 
mobile: +1 858 205 3653 9665 Chesapeake Drive, Suite 401 voice: +1 858 279 8800, ext. 253 San Diego, CA 92123 fax: +1 858 279 8804 USA web: http://www.scitegic.com From cjfields at uiuc.edu Tue Jan 24 17:21:22 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 24 Jan 2006 16:21:22 -0600 Subject: [Bioperl-l] RemoteBlast.pm and Bio::SearchIO::blast.pm -partially resolved In-Reply-To: <18966F80-B780-4661-953E-613B05B56164@duke.edu> Message-ID: <000301c62134$81cdc500$15327e82@pyrimidine> Jason, I have worked out all the problems with RemoteBlast.pm and posted a patched version to Bugzilla (http://bugzilla.bioperl.org/show_bug.cgi?id=1935). The main problem was that RemoteBlast::save_output was not looking for XML output when dumping from the tempfile to the saved file (it only looked for the text header). That is fixed. The other problems mentioned were due to differences in mapping key=>value pairs between blast and blastxml and a problem in my own script. It passed all tests using 'perl t/RemoteBlast.t' with debugging set. See if anybody else out there can test them out. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at portal.open-bio.org [mailto:bioperl-l- > bounces at portal.open-bio.org] On Behalf Of Jason Stajich > Sent: Tuesday, January 24, 2006 11:16 AM > To: Chris Fields > Cc: bioperl-ml List > Subject: [Bioperl-l] Re: RemoteBlast.pm and Bio::SearchIO::blast.pm - > partially resolved > > Thanks Chris - I don't know when I'll have time to check in bugs so > anyone else who has commit access feel free to give these a whirl and > check in. > > I would propose making the XML default but allowing the text version > to still be supported in the event that someone has setup their own > local NCBI BLAST Web interface which still supports the simple Text > output. 
>
> -j
>
> On Jan 24, 2006, at 12:09 PM, Chris Fields wrote:
>
> > I submitted two bugs on Bugzilla to describe recent problems with
> > RemoteBlast.pm and SearchIO::blast.pm
> >
> > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> >
> > Today I submitted a patched version of Bio::SearchIO::blast.pm which should
> > fix the text parsing issue for old (2.2.12) and new (2.2.13) versions of
> > NCBI's BLAST; the bug link above describes the problem and the fix. Problem
> > is, I know it will likely break again b/c NCBI will probably change text
> > output in a future BLAST version. I also agree with Jason about changing
> > the default for SearchIO to XML. So, does text output parsing through
> > blast.pm need to be deprecated in favor of XML, or should both be
> > available?
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

From jason.stajich at duke.edu Tue Jan 24 16:44:34 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 24 Jan 2006 16:44:34 -0500
Subject: [Bioperl-l] new mailing list server
Message-ID: <50E14815-266E-4ACB-8E6E-293C9EB33476@duke.edu>

Chris Dagdigian has switched our mailing lists over to a new server to upgrade us to newer hardware. With the switch, the default mailing list server name is 'lists.open-bio.org' instead of 'portal.open-bio.org'. That should be the only change you notice at the bottom of your mails. Mail sent to any of those addresses should still be delivered (although @bioperl.org is preferred). We hope this changeover will help improve the performance and scalability of our mail and web services.
We also aim to move the developer read-write CVS server to a new machine in the coming weeks. We hope this will be only a minor inconvenience, and it will allow us to move to a more recent operating system and larger disk space.

If you have questions or concerns, they can be directed to support AT open-bio.org

-jason
--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From jason.stajich at duke.edu Tue Jan 24 22:31:38 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 24 Jan 2006 22:31:38 -0500
Subject: [Bioperl-l] new website launched
Message-ID: <79628361-F026-461F-A156-0B6810DB0B52@duke.edu>

I am pleased to announce the release of a new website for BioPerl. The site is based on the MediaWiki software that was developed for the Wikipedia project. We intend the site to be a place for community input on documentation and design for the BioPerl project. There is also a fair amount of documentation started surrounding bioinformatics tools and techniques applicable to using BioPerl and some of the authors who created these resources.

The website continues to be at the URL http://www.bioperl.org. The DNS updates may take up to 24 hours to reach everyone.

The initial content of the site is the result of the work of myself, Mauricio Herrera Cuadra, Brian Osborne, and Torsten Seemann. We encourage you to contribute to the site's content by signing up for an account.

There are several guides for the style of the site and for how to link to Modules, for example, which can contain additional information from the POD:
http://bioperl.org/wiki/Module:Bio::SeqIO

You'll notice that many of the paths have changed, but the DIST and SRC continue to be available at http://bioperl.org/DIST and http://bioperl.org/SRC. The HOWTOs are now available from http://bioperl.org/wiki/HOWTOs

The FAQ is available at http://bioperl.org/wiki/FAQ and I encourage you to add your questions to it so they can be properly archived and addressed.
We also have initiated a News site for Bioperl for posting announcements regarding development and software. I would like to see if there are volunteers to post weekly or monthly summaries of mailing list traffic and development.
http://www.bioperl.org/news/

Jason Stajich on behalf of Mauricio Herrera Cuadra, Brian Osborne, Torsten Seemann.

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From roy at colibase.bham.ac.uk Wed Jan 25 12:05:29 2006
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Wed, 25 Jan 2006 17:05:29 +0000
Subject: [Bioperl-l] concatenate two embl sequence files
In-Reply-To: <200601182120.k0ILIl8X022324@portal.open-bio.org>
References: <200601182120.k0ILIl8X022324@portal.open-bio.org>
Message-ID: <43D7AFD9.2020305@colibase.bham.ac.uk>

Hi all.

I also had need of a function to concatenate two Bio::Seq objects, so had a go at this. My naive attempt (intended to go in Bio::SeqUtils) is pasted below. I'm not too sure about the concept of sub-SeqFeatures (I've never seen any sequence that had more than one level of feature) - I worked on the assumption that little sub-SeqFeatures can have littler sub-SeqFeatures and so ad infinitum, but as I don't have an example file I haven't been able to test if this works. Likewise, although I think the code should cope with Fuzzy and Split locations, I haven't tested this with any particularly unusual examples.

Roy.
--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.

http://xbase.bham.ac.uk

=head2 cat

 Title   : cat
 Usage   : my $catseq = Bio::SeqUtils->cat(@seqs)
 Function: Concatenates an array of Bio::Seq objects, using the first
           sequence as a template for species etc. Adjusts the
           coordinates of features from any additional objects.
 Returns : A sequence object of the same class as the first argument.
 Args    : array of sequence objects

=cut

sub cat {
    my ($self, @seqs) = @_;
    my $seq = shift @seqs;
    $self->throw("Object [$seq] of class [" . ref($seq) .
                 "] should be a Bio::PrimarySeqI")
        unless $seq->isa('Bio::PrimarySeqI');
    for my $addseq (@seqs) {
        # check each additional sequence, not the template again
        $self->throw("Object [$addseq] of class [" . ref($addseq) .
                     "] should be a Bio::PrimarySeqI")
            unless $addseq->isa('Bio::PrimarySeqI');
        # features of the appended sequence shift by the length so far
        my $length = $seq->length;
        $seq->seq($seq->seq . $addseq->seq);
        for my $feat ($addseq->get_SeqFeatures) {
            $seq->add_SeqFeature($self->_coordAdjust($feat, $length));
        }
    }
    return $seq;
}

=head2 _coordAdjust

 Title   : _coordAdjust
 Usage   : my $newfeat=Bio::SeqUtils->_coordAdjust($feature, 100);
 Function: Recursive subroutine to adjust the coordinates of a feature
           and all its subfeatures.
 Returns : A Bio::SeqFeatureI compliant object.
 Args    : A Bio::SeqFeatureI compliant object,
           the number of bases to add to the coordinates

=cut

sub _coordAdjust {
    my ($self, $feat, $add) = @_;
    $self->throw("Object [$feat] of class [" . ref($feat) .
                 "] should be a Bio::SeqFeatureI")
        unless $feat->isa('Bio::SeqFeatureI');
    my @adjsubfeat;
    for my $subfeat ($feat->remove_SeqFeatures) {
        # recurse with the arguments in the order the sub expects
        push @adjsubfeat, $self->_coordAdjust($subfeat, $add);
    }
    my @loc = $feat->location->each_Location;
    for my $loc (@loc) {
        my @coords = ($loc->start, $loc->end);
        # add the offset to every number found in each coordinate
        s/(\d+)/$add+$1/ge for @coords;
        $loc->start(shift @coords);
        $loc->end(shift @coords);
    }
    if (@loc == 1) {
        $feat->location($loc[0]);
    } else {
        my $split = Bio::Location::Split->new;
        $split->add_sub_Location(@loc);
        $feat->location($split);
    }
    $feat->add_SeqFeature($_) for @adjsubfeat;
    return $feat;
}

>
> Jan,
>
> It would be easy if someone had written a function to do it. Even writing the
> function is not hard. I do not think there is any other way than to go through
> all features, though.
>
> In my opinion this would be an excellent addition to Bio::Seq::Utilities.
>
> E.g. cat($arrayrefofsequences, optional_seq_class_to_create)
> return a new seq, species and other info based on the first seq in array
>
> Could you write it and post to bugzilla?
> > -Heikki > > > On Tuesday 17 January 2006 11:54, jan aerts (RI) wrote: >> Hi all, >> >> Does anyone know of an easy way to concatenate two sequences, including >> recalculation of features positions of the second one? E.g. >> seq 1 = 100 bp >> feature A: 5..15 >> seq 2 = 200 bp >> feature B: 20..30 >> => concatenated sequence 3 = 300 bp >> feature A: 5..15 >> feature B: 120..130 <<<<<<<<<<< >> >> Annotations (features without range) should be transferred as well. >> >> Of course, it must be possible to create a blank sequence and work my >> way through all features, adding them to a new collection of features >> and stuff. But I was wondering if a simpler technique is possible. >> >> Many thanks, >> Jan Aerts >> Bioinformatics Department >> Roslin Institute >> Roslin, Scotland, UK >> >> ---------The obligatory disclaimer-------- >> The information contained in this e-mail (including any attachments) is >> confidential and is intended for the use of the addressee only. The >> opinions expressed within this e-mail (including any attachments) are >> the opinions of the sender and do not necessarily constitute those of >> Roslin Institute (Edinburgh) ("the Institute") unless specifically >> stated by a sender who is duly authorised to do so on behalf of the >> Institute. 
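Jan's example above reduces to adding the length of everything already concatenated to the start and end of each later feature. A small sketch in plain Perl, using hashes in place of Bio::Seq and Bio::SeqFeature objects (the data layout and the name `concat_with_features` are illustrative assumptions, not BioPerl API):

```perl
use strict;
use warnings;

# Hypothetical stand-in for sequence objects: each record holds a sequence
# string and a list of features with 1-based start/end coordinates.
sub concat_with_features {
    my @records = @_;
    my %joined = ( seq => '', features => [] );
    for my $rec (@records) {
        my $offset = length $joined{seq};    # bases already in the result
        for my $feat ( @{ $rec->{features} } ) {
            # copy the feature, shifting both coordinates by the offset
            push @{ $joined{features} }, {
                name  => $feat->{name},
                start => $feat->{start} + $offset,
                end   => $feat->{end} + $offset,
            };
        }
        $joined{seq} .= $rec->{seq};
    }
    return \%joined;
}

my $seq1 = { seq => 'A' x 100, features => [ { name => 'A', start => 5,  end => 15 } ] };
my $seq2 = { seq => 'C' x 200, features => [ { name => 'B', start => 20, end => 30 } ] };

my $cat = concat_with_features( $seq1, $seq2 );
printf "%s: %d..%d\n", @{$_}{qw(name start end)} for @{ $cat->{features} };
print "length: ", length( $cat->{seq} ), "\n";
```

With the two sequences from the example, feature A stays at 5..15 and feature B moves to 120..130 on a 300 bp result. Unlike the `cat` code posted in this thread, the sketch builds a new record rather than modifying the first sequence in place.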
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> ______ _/ _/_____________________________________________________
> _/ _/
> _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za
> _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho
> _/ _/ _/ SANBI, South African National Bioinformatics Institute
> _/ _/ _/ University of Western Cape, South Africa
> _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________
>

From heikki at sanbi.ac.za Wed Jan 25 16:11:45 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 25 Jan 2006 23:11:45 +0200
Subject: [Bioperl-l] concatenate two embl sequence files
In-Reply-To: <43D7AFD9.2020305@colibase.bham.ac.uk>
References: <200601182120.k0ILIl8X022324@portal.open-bio.org> <43D7AFD9.2020305@colibase.bham.ac.uk>
Message-ID: <200601252311.45582.heikki@sanbi.ac.za>

Thanks Roy!

I'll check the code in tomorrow when I am less sleepy and can go through the code in detail. In principle the code looks good. It definitely needs tests. If you have written any, please do post them. A few more checks to make sure seq->alphabet is the same in all sequences might be a good idea.

-Heikki

On Wednesday 25 January 2006 19:05, Roy Chaudhuri wrote:
> Hi all.
>
> I also had need of a function to concatenate two Bio::Seq objects, so had a
> go at this. My naive attempt (intended to go in Bio::SeqUtils) is pasted
> below. I'm not too sure about the concept of sub-SeqFeatures (I've never
> seen any sequence that had more than one level of feature) - I worked on the
> assumption that little sub-SeqFeatures can have littler sub-SeqFeatures and
> so ad infinitum, but as I don't have an example file I haven't been able to
> test if this works.
Likewise, although I think the code should cope with
> Fuzzy and Split locations, I haven't tested this with any particularly
> unusual examples.
>
> Roy.
> --
> Dr. Roy Chaudhuri
> Bioinformatics Research Fellow
> Division of Immunity and Infection
> University of Birmingham, U.K.
>
> http://xbase.bham.ac.uk
>
> =head2 cat
>
>  Title   : cat
>  Usage   : my $catseq = Bio::SeqUtils->cat(@seqs)
>  Function: Concatenates an array of Bio::Seq objects, using the first
>            sequence as a template for species etc. Adjusts the
>            coordinates of features from any additional objects.
>  Returns : A sequence object of the same class as the first argument.
>  Args    : array of sequence objects
>
> =cut
>
> sub cat {
>     my ($self, @seqs) = @_;
>     my $seq = shift @seqs;
>     $self->throw("Object [$seq] of class [" . ref($seq) .
>                  "] should be a Bio::PrimarySeqI")
>         unless $seq->isa('Bio::PrimarySeqI');
>     for my $addseq (@seqs) {
>         # check each additional sequence, not the template again
>         $self->throw("Object [$addseq] of class [" . ref($addseq) .
>                      "] should be a Bio::PrimarySeqI")
>             unless $addseq->isa('Bio::PrimarySeqI');
>         # features of the appended sequence shift by the length so far
>         my $length = $seq->length;
>         $seq->seq($seq->seq . $addseq->seq);
>         for my $feat ($addseq->get_SeqFeatures) {
>             $seq->add_SeqFeature($self->_coordAdjust($feat, $length));
>         }
>     }
>     return $seq;
> }
>
> =head2 _coordAdjust
>
>  Title   : _coordAdjust
>  Usage   : my $newfeat=Bio::SeqUtils->_coordAdjust($feature, 100);
>  Function: Recursive subroutine to adjust the coordinates of a feature
>            and all its subfeatures.
>  Returns : A Bio::SeqFeatureI compliant object.
>  Args    : A Bio::SeqFeatureI compliant object,
>            the number of bases to add to the coordinates
>
> =cut
>
> sub _coordAdjust {
>     my ($self, $feat, $add) = @_;
>     $self->throw("Object [$feat] of class [" . ref($feat) .
>                  "] should be a Bio::SeqFeatureI")
>         unless $feat->isa('Bio::SeqFeatureI');
>     my @adjsubfeat;
>     for my $subfeat ($feat->remove_SeqFeatures) {
>         # recurse with the arguments in the order the sub expects
>         push @adjsubfeat, $self->_coordAdjust($subfeat, $add);
>     }
>     my @loc = $feat->location->each_Location;
>     for my $loc (@loc) {
>         my @coords = ($loc->start, $loc->end);
>         # add the offset to every number found in each coordinate
>         s/(\d+)/$add+$1/ge for @coords;
>         $loc->start(shift @coords);
>         $loc->end(shift @coords);
>     }
>     if (@loc == 1) {
>         $feat->location($loc[0]);
>     } else {
>         my $split = Bio::Location::Split->new;
>         $split->add_sub_Location(@loc);
>         $feat->location($split);
>     }
>     $feat->add_SeqFeature($_) for @adjsubfeat;
>     return $feat;
> }
>
> > Jan,
> >
> > It would be easy if someone had written a function to do it. Even writing
> > the function is not hard. I do not think there is any other way than to go
> > through all features, though.
> >
> > In my opinion this would be an excellent addition to Bio::Seq::Utilities.
> >
> > E.g. cat($arrayrefofsequences, optional_seq_class_to_create)
> > return a new seq, species and other info based on the first seq in
> > array
> >
> > Could you write it and post to bugzilla?
> >
> > -Heikki
> >
> > On Tuesday 17 January 2006 11:54, jan aerts (RI) wrote:
> >> Hi all,
> >>
> >> Does anyone know of an easy way to concatenate two sequences, including
> >> recalculation of features positions of the second one? E.g.
> >> seq 1 = 100 bp
> >> feature A: 5..15
> >> seq 2 = 200 bp
> >> feature B: 20..30
> >> => concatenated sequence 3 = 300 bp
> >> feature A: 5..15
> >> feature B: 120..130 <<<<<<<<<<<
> >>
> >> Annotations (features without range) should be transferred as well.
> >>
> >> Of course, it must be possible to create a blank sequence and work my
> >> way through all features, adding them to a new collection of features
> >> and stuff. But I was wondering if a simpler technique is possible.
> >> > >> Many thanks, > >> Jan Aerts > >> Bioinformatics Department > >> Roslin Institute > >> Roslin, Scotland, UK > >> > >> ---------The obligatory disclaimer-------- > >> The information contained in this e-mail (including any attachments) is > >> confidential and is intended for the use of the addressee only. The > >> opinions expressed within this e-mail (including any attachments) are > >> the opinions of the sender and do not necessarily constitute those of > >> Roslin Institute (Edinburgh) ("the Institute") unless specifically > >> stated by a sender who is duly authorised to do so on behalf of the > >> Institute. > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at portal.open-bio.org > >> http://portal.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > ______ _/ _/_____________________________________________________ > > _/ _/ > > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > > _/ _/ _/ SANBI, South African National Bioinformatics Institute > > _/ _/ _/ University of Western Cape, South Africa > > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > > ___ _/_/_/_/_/________________________________________________________ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From heikki at sanbi.ac.za Wed Jan 25 15:52:42 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Wed, 25 Jan 2006 22:52:42 +0200 Subject: 
[Bioperl-l] new website launched
In-Reply-To: <79628361-F026-461F-A156-0B6810DB0B52@duke.edu>
References: <79628361-F026-461F-A156-0B6810DB0B52@duke.edu>
Message-ID: <200601252252.42786.heikki@sanbi.ac.za>

Congratulations and a huge thank-you to the production team!

The new website is a big step ahead in readability and ease of editing the information.

I for my part have already corrected a few small typos and omissions on the new pages. I invite others to do the same.

-Heikki

On Wednesday 25 January 2006 05:31, Jason Stajich wrote:
> I am pleased to announce the release of a new website for BioPerl.
> The site is based on the MediaWiki software that was developed for
> the Wikipedia project. We intend the site to be a place for
> community input on documentation and design for the BioPerl project.
> There is also a fair amount of documentation started surrounding
> bioinformatics tools and techniques applicable to using BioPerl and
> some of the authors who created these resources.
>
> The website continues to be at the URL http://www.bioperl.org. The
> DNS updates may take up to 24 hours to reach everyone.
>
> The initial content of the site is the result of the work of myself,
> Mauricio Herrera Cuadra, Brian Osborne, and Torsten Seemann. We
> encourage you to contribute to the site's content by signing up for
> an account.
>
> There are several guides for the style of the site and for how to link
> to Modules, for example, which can contain additional information from
> the POD:
> http://bioperl.org/wiki/Module:Bio::SeqIO
>
> You'll notice that many of the paths have changed, but the DIST and
> SRC continue to be available at http://bioperl.org/DIST and
> http://bioperl.org/SRC. The HOWTOs are now available from
> http://bioperl.org/wiki/HOWTOs
>
> The FAQ is available at http://bioperl.org/wiki/FAQ and I encourage
> you to add your questions to it so they can be properly archived and
> addressed.
> > We also have initiated a News site for Bioperl for posting > announcements regarding development and software. I would like to > see if there are volunteers to post weekly or monthly summaries of > mailing list traffic and development. > http://www.bioperl.org/news/ > > > Jason Stajich on behalf of Mauricio Herrera Cuadra, Brian Osborne, > Torsten Seemann. > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From cjfields at uiuc.edu Wed Jan 25 22:34:01 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 25 Jan 2006 21:34:01 -0600 Subject: [Bioperl-l] [Gmod-gbrowse] GMOD PPM repository not working In-Reply-To: <1138119383.3338.68.camel@localhost.localdomain> Message-ID: <000201c62229$59ed5f50$15327e82@pyrimidine> Scott, This popped up, for some reason, when I tried to install a perl module (Error.pm); maybe it has something to do with the reason PPM can't 'see' GMOD's repository. It crashes PPM pretty nicely! Looks like the home page for GMOD, so maybe Sourceforge is redirecting things and this messes with PPM? _____________________________________________ C:\Perl\Scripts>ppm PPM - Programmer's Package Manager version 3.3. Copyright (c) 2001 ActiveState Corp. All Rights Reserved. ActiveState is a division of Sophos. Entering interactive shell. Using Term::ReadLine::Perl as readline library. Type 'help' to get started. 
ppm> rep
Repositories:
[1] Bioperl
[2] gmod
[3] ActiveState PPM2 Repository
[4] ActiveState Package Repository
[ ] Bribes
[ ] Kobes
[ ] local
ppm> install Error
PPM::PPD::init: not a PPD and not a file: The Generic Model Organism Database Project | GMOD