From dmessina at wustl.edu Sun Apr 1 22:54:58 2007
From: dmessina at wustl.edu (David Messina)
Date: Sun, 1 Apr 2007 21:54:58 -0500
Subject: [Bioperl-l] installation bioperl
Message-ID: <6EFFF13A-66E7-418F-8B8E-A8AA8826DE83@wustl.edu>

We need more information to be able to help you. Could you please
show us the actual output you see when trying to install Bioperl?

Also, we need to know:
- what operating system you have
- what version of Bioperl you are trying to install

See http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance
and please read the rest of the document, too.

Dave

From aharry2001 at yahoo.com Mon Apr 2 06:09:25 2007
From: aharry2001 at yahoo.com (Ambrose)
Date: Mon, 2 Apr 2007 03:09:25 -0700 (PDT)
Subject: [Bioperl-l] bioperl and kegg(out of memory problem )
In-Reply-To:
Message-ID: <20070402100925.40498.qmail@web52001.mail.re2.yahoo.com>

Hello All,

I have some problems parsing KEGG using bioperl. I get an out of
memory problem. I currently have 1G RAM. Can someone tell me why this
is happening and how it can be solved? Is it because the objects
passed to bioperl are so big, or what?

best regards
Ambrose

From cjfields at uiuc.edu Mon Apr 2 08:43:18 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Apr 2007 07:43:18 -0500
Subject: [Bioperl-l] bioperl and kegg(out of memory problem )
In-Reply-To: <20070402100925.40498.qmail@web52001.mail.re2.yahoo.com>
References: <20070402100925.40498.qmail@web52001.mail.re2.yahoo.com>
Message-ID: <7259B658-A58D-4F97-B90B-E23D3C924D3F@uiuc.edu>

This doesn't really explain much beyond stating you are having
problems. You need to post some code (to the mail list!) and let us
know what version of BioPerl you are using.

chris

On Apr 2, 2007, at 5:09 AM, Ambrose wrote:

> Hello All,
> I have some problems parsing KEGG using bioperl. I get an out of
> memory problem. I currently have 1G RAM. Can someone tell me why
> this is happening and how it can be solved? Is it because the
> objects passed to bioperl are so big, or what?
>
> best regards
> Ambrose
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From aharry2001 at yahoo.com Mon Apr 2 09:56:33 2007
From: aharry2001 at yahoo.com (Ambrose)
Date: Mon, 2 Apr 2007 06:56:33 -0700 (PDT)
Subject: [Bioperl-l] bioperl and kegg(out of memory problem )
In-Reply-To: <7259B658-A58D-4F97-B90B-E23D3C924D3F@uiuc.edu>
Message-ID: <20070402135633.85882.qmail@web52002.mail.re2.yahoo.com>

Hello ALL,

I have the code below, which parses my kegg files. A host of the
files are parsed and the information is inserted into my databases,
but unfortunately after the program runs for some hours it stops,
showing the message "out of memory". I assume that this happens
because the bioperl object is too big. Please just check the code
below.

best regards
Ambrose

#!/usr/local/ActivePerl/bin/perl
#
#

use strict;
use Bio::SeqIO;
use Bio::FASTASequence;
use DBI;
use Benchmark qw(:all) ;

my($ko,$prosite,$ncbigi,$ncbigeneid,$pfam,$uniprot,$ecn1,$pathway_id1,$pathway_name1,$ec_num);
my(%dblink_KO,%dblink_Pfam,%dblink_PROSITE,%dblink_NCBIGI,%dblink_NCBIGENEID,%dblink_UniProt);
my(%pathway_name,%pathway_id,%ecnumbers,%crc64,%ntseq,%aaseq);
my( @kg_id);
my $db="gbdb";
my $host="localhost";
my $userid="root";
my $passwd="ubuntu";
my $connectionInfo="dbi:mysql:$db;"."mysql_socket=/var/run/mysqld/mysqld.sock";
my ($t1,$t2);
my $dbh = DBI->connect($connectionInfo,$userid,$passwd);
my $time_used;

eval { $dbh->do("DROP TABLE kegginfo") };
print "Dropping kegginfo failed: $@\n" if $@;
$dbh->do("CREATE TABLE kegginfo (kg_id BIGINT NOT NULL AUTO_INCREMENT,
                                 up_id INT UNSIGNED REFERENCES uniprotinfo(up_id),
                                 filename VARCHAR(50),
                                 kegg_id VARCHAR(50),
                                 keggaccn VARCHAR(50),
                                 description VARCHAR(250),
                                 ec_numbers VARCHAR(250),
                                 pathway_id VARCHAR(250),
                                 pathway_name VARCHAR(250),
                                 crc64 VARCHAR(50),
                                 ko_id VARCHAR(50),
                                 pfam_id VARCHAR(50),
                                 ncbigi_id VARCHAR(50),
                                 ncbigeneid_id VARCHAR(50),
                                 uniprot_id VARCHAR(50),
                                 prosite_id VARCHAR(50),
                                 PRIMARY KEY (kg_id)
                                 )");

eval { $dbh->do("DROP TABLE keggntsequence") };
print "Dropping keggntsequence failed: $@\n" if $@;
$dbh->do("CREATE TABLE keggntsequence (kg_id BIGINT(15) UNSIGNED REFERENCES uniprotinfo(kg_id),
                                       keggaccn VARCHAR(50),
                                       nucleotidesequence text
                                       )");

eval { $dbh->do("DROP TABLE keggaasequence") };
print "Dropping keggaasequence failed: $@\n" if $@;
$dbh->do("CREATE TABLE keggaasequence (kg_id BIGINT(15) UNSIGNED REFERENCES uniprotinfo(kg_id),
                                       keggaccn VARCHAR(50),
                                       crc64 VARCHAR(50),
                                       aminoacidsequence text
                                       )");

eval { $dbh->do("DROP TABLE timestable") };
print "Dropping timestable failed: $@\n" if $@;
$dbh->do("CREATE TABLE timestable (aut_id BIGINT(15) UNSIGNED NOT NULL AUTO_INCREMENT,
                                   genome VARCHAR(100),
                                   totaltime_seconds int(100),
                                   PRIMARY KEY(aut_id))");

open (LIST, "genomes.list") || die "Cannot open input kegg genomes file genomes.list\n $! \n";
$t1=new Benchmark;
my @genomelist = ();
while (my $line=<LIST>) {
    #ignore comment lines
    if ($line !~ /^#/) {
        chomp $line;
        push (@genomelist, $line); #store the filename
    }
}
close LIST;

my $count=0;
foreach my $genomefile (@genomelist) {

    #in case the user fails to remove some strange files from
    #the genomes.list file.. check for the KEGG format
    my $check=checkKeggFormat($genomefile);
    if ($check==0) {
        #if file is not kegg, start with the next one...
        print "ERROR: $genomefile doesn't look like a KEGG file to me! \n";
        #;
        next;
    }
    #print $genomefile,"\n";
    my $stream = Bio::SeqIO->new(-file => $genomefile, -format => 'KEGG');

    while ( my $seq = $stream->next_seq() ) {

        my $primary_id = $seq->primary_id;
        my $display_id = $seq->display_id; #name
        my $keggaccn = $seq->accession; #accn
        my @description = $seq->annotation->get_Annotations('description');

        my @dblinks = $seq->annotation->get_Annotations('dblink');
        my @orthologs = $seq->annotation->get_Annotations('ortholog');
        my @orthologs = grep {$_->database eq 'KO'} $seq->annotation->get_Annotations('dblink');
        my @class = $seq->annotation->get_Annotations('pathway');
        $ntseq{$keggaccn} = $seq->seq;
        $aaseq{$keggaccn} = $seq->translate->seq;
        $aaseq{$keggaccn} =~s /\*$//;
        my $fasta = ">".$count."\n".$aaseq{$keggaccn};
        my $newseq = Bio::FASTASequence->new($fasta);
        $crc64{$keggaccn}=$newseq->getCrc64();
        #print $keggaccn,"crc64:$crc64{$keggaccn}\n";

        $count++;
        if ($keggaccn eq "") { print "PRIMARY KEY NOT FOUND no keggaccn\n";
                               next;}

        if(@dblinks)
        {
            my @dblink_KO=();
            my @dblink_Pfam=();
            my @dblink_PROSITE=();
            my @dblink_NCBIGI=();
            my @dblink_NCBIGENEID=();
            my @dblink_UniProt=();

            foreach my $ele (@dblinks) {
                if ($ele =~ /^KO:/){
                    $ele=~s/KO://;
                    push (@dblink_KO,$ele);
                    $dblink_KO{$keggaccn}=$ele;
                    next;
                }
                #parse Pfam: dblink
                if ($ele =~ /^Pfam:/){
                    $ele=~s/Pfam://;
                    push (@dblink_Pfam,$ele);
                    $dblink_Pfam{$keggaccn}=$ele;
                    next;
                }
                #parse PROSITE: dblink
                if ($ele =~ /^PROSITE:/){
                    $ele=~s/PROSITE://;
                    push (@dblink_PROSITE,$ele);
                    $dblink_PROSITE{$keggaccn}=$ele;
                    next;
                }
                #parse NCBI-GI: dblink
                if ($ele =~ /^NCBI-GI:/){
                    $ele=~s/NCBI-GI://;
                    push (@dblink_NCBIGI,$ele);
                    $dblink_NCBIGI{$keggaccn}=$ele;
                    next;
                }
                #parse NCBI-GeneID: dblink
                if ($ele =~ /^NCBI-GeneID:/){
                    $ele=~s/NCBI-GeneID://;
                    push (@dblink_NCBIGENEID,$ele);
                    $dblink_NCBIGENEID{$keggaccn}=$ele;
                    next;
                }
                #parse UniProt: dblink
                if ($ele =~ /^UniProt:/){
                    $ele=~s/UniProt://;
                    push (@dblink_UniProt,$ele);
                    $dblink_UniProt{$keggaccn}=$ele;
                    next;
                }

            }#end foreach #finished parsing all dblinks
        }#end if @dblinks
        if(@class)
        {
            foreach my $pathway (@class) {

                $pathway=~s/^\s+|\s+$//;
                my @arr = split (/\s+/,$pathway);
                my $pathway_id = $arr[0];
                shift @arr;
                my $pathway_name = join(" ",@arr);
                $pathway_name{$keggaccn}=$pathway_name;
                $pathway_id{$keggaccn}=$pathway_id;
                #print $pathway_id{$keggaccn},"\t",$pathway_name{$keggaccn},"\n";

            }

        }

        my @ecnumbers=();
        @ecnumbers = extractECnumbers(@description);
        if(@ecnumbers)
        {
            if (@ecnumbers!=0)
            {
                foreach my $ecn (@ecnumbers)
                {
                    $ecnumbers{$keggaccn}=$ecn;
                }#end foreach
            }
            else {
                #print "ECnumbers:\n";
            }
        }

        # print $keggaccn,"\t",$dblink_UniProt{$keggaccn},"\t",$dblink_NCBIGENEID{$keggaccn},
        # "\t",$dblink_NCBIGI{$keggaccn},"\t","ec:$ecnumbers{$keggaccn}","\t",
        # "p1:$pathway_id{$keggaccn}","\t","p2:$pathway_name{$keggaccn}","\n";
        #
        $dbh->do("INSERT INTO kegginfo VALUES (?,?, ?, ?, ?, ?,?,?,?,?,?,?,?,?,?,?)",
                 undef,"NULL","NULL",$genomefile,$display_id,$keggaccn,@description,$ecnumbers{$keggaccn},
                 $pathway_id{$keggaccn},$pathway_name{$keggaccn},$crc64{$keggaccn},$dblink_KO{$keggaccn},
                 $dblink_Pfam{$keggaccn},$dblink_NCBIGI{$keggaccn},$dblink_NCBIGENEID{$keggaccn},
                 $dblink_UniProt{$keggaccn},$dblink_PROSITE{$keggaccn});

        $dbh->do("INSERT INTO keggaasequence VALUES (?,?,?,?)",
                 undef,"",$keggaccn,$crc64{$keggaccn},$aaseq{$keggaccn});

        $dbh->do("INSERT INTO keggntsequence VALUES (?,?,?)",
                 undef,"",$keggaccn,$ntseq{$keggaccn});

    }
    $t2=new Benchmark;
    $time_used=timeThis($t1,$t2,"Finished parsing file $genomefile");
    $dbh->do("INSERT INTO timestable VALUES (?,?,?)",
             undef,"NULL",$genomefile,$time_used);

}

$dbh->do("CREATE INDEX keggIindex ON kegginfo (kg_id,keggaccn)");
print "Index created on kegginfo\n";

$dbh->do("CREATE INDEX keggaasequence1 ON keggaasequence (kg_id,keggaccn)");
print "Index created on keggaasequence\n";

$dbh->do("CREATE INDEX keggntsequence1 ON keggntsequence (kg_id,keggaccn)");
print "Index created on keggntsequence\n";

print"Updating the tables................\n";

$dbh->do("update kegginfo,keggaasequence set
          keggaasequence.kg_id=kegginfo.kg_id
          where kegginfo.keggaccn=keggaasequence.keggaccn");
print " keggaasequence kg_id\n";

$dbh->do("update kegginfo,keggntsequence set
          keggntsequence.kg_id=kegginfo.kg_id
          where kegginfo.keggaccn=keggntsequence.keggaccn");
print " keggaasequence kg_id\n";

sub extractECnumbers ($) {
    #sample description lines
    #riboflavin kinase / FMN adenylyltransferase [EC:2.7.1.26 2.7.7.2]
    #ATP synthase F0 subunit c [EC:3.6.3.14]

    my @desc=shift;
    my $description = join ("",@desc);
    my @ecnumbers=();
    #print "parsing ec for $description..\n";
    #check if EC number exists
    if ($description=~/\[EC:/) {

        my @array = split (/\[EC:/,$description);
        $array[1]=~s/]//g;
        shift @array; #remove the annotation , only EC numbers remain
        foreach my $ele (@array) {
            $ele=~s/^\s+|\s+$//g;
            $ele= "EC:".$ele;
            push (@ecnumbers,$ele);
        }
        return @ecnumbers;
    }
    else {
        #return an empty value
        return ;

    }

}

sub checkKeggFormat ($) {

=head2

checkKeggFormat

make sure that the file is a valid KEGG file
function checks the first two lines,
1st must start with ENTRY
2nd must start with DEFINITION

returns 0 or 1

=cut

    my $genomefile=shift;

    open (TEST,$genomefile) || die "Cannot open file $genomefile for reading \n";
    my $testline=<TEST>;
    #print "$testline\n";
    if ($testline=~/^ENTRY/) {
        #continue
        #$testline=<TEST>;#double check
        #if ($testline=~/^NAME/) {
            #this looks like a valid kegg file
            return 1;
        #}
        #else {
        #    close TEST;
        #    return 0;
        #}
    }
    else {
        close TEST;
        return 0;
    }

}

sub timeThis ($$$)
{
    my ($start,$end,$message) = @_;
    my $td = timediff($end, $start);
    my $t = timestr($td);
    print "$message : ",$t,"\n";
    my @array = split (/\s+/,$t);
    #20 wallclock secs (14.23 usr + 0.84 sys = 15.07 CPU)
    return $array[0]; #return the no. of seconds.
}

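The two sample description lines in the comments of extractECnumbers could also be handled with a global regex match that yields one list element per EC number. This is only a sketch and not part of the posted script: the name extract_ec_numbers is made up here, and it deliberately differs from the posted sub, which returns the whole bracket content as a single element.

```perl
use strict;
use warnings;

# Return every EC number found in a KEGG-style description string,
# one list element per number (hypothetical helper, not from the thread).
sub extract_ec_numbers {
    my ($description) = @_;
    my @ec;
    # Each bracketed group looks like "[EC:2.7.1.26 2.7.7.2]".
    while ($description =~ /\[EC:([^\]]+)\]/g) {
        # split ' ' splits on runs of whitespace and skips leading blanks.
        push @ec, map { "EC:$_" } split ' ', $1;
    }
    return @ec;
}

my @ec = extract_ec_numbers(
    'riboflavin kinase / FMN adenylyltransferase [EC:2.7.1.26 2.7.7.2]');
print join(',', @ec), "\n";
```

A description without an "[EC:...]" group gives an empty list, so the caller can test the result with a plain if (@ec).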
From e-just at northwestern.edu Mon Apr 2 10:12:33 2007
From: e-just at northwestern.edu (Eric Just)
Date: Mon, 2 Apr 2007 09:12:33 -0500
Subject: [Bioperl-l] Can't locate object method "seq_start" via package "Bio::DB::GenBank"
Message-ID:

Hello,

I am getting this error while running a bioperl script that I had
been using in bioperl 1.4. On upgrading to bioperl 1.5.2 I get the
following fatal error:

Can't locate object method "seq_start" via package "Bio::DB::GenBank"

My script is as follows:

use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

my $gb = new Bio::DB::GenBank();

my $query = Bio::DB::Query::GenBank->new(
    -query   => 'txid44689[Organism:noexp]',
    -reldate => 60,
    -db      => 'nucleotide'
);

my $in = $gb->get_Stream_by_query($query);

while ( my $seq = $in->next_seq()) {
    print "do something";
    #....
}

I noticed that seq_start is created in the BEGIN block of
Bio::DB::NCBIHelper (inherited by Bio::DB::GenBank), but I do not have
experience troubleshooting this kind of autoloaded method. Any idea
where to start?

Thanks

Eric

From e-just at northwestern.edu Mon Apr 2 10:15:28 2007
From: e-just at northwestern.edu (Eric Just)
Date: Mon, 2 Apr 2007 09:15:28 -0500
Subject: [Bioperl-l] Can't locate object method "seq_start" via package "Bio::DB::GenBank"
In-Reply-To:
References:
Message-ID:

Sorry about that. As soon as I sent the email I found my problem (an
old NCBIHelper in my inheritance path). There is no bug here.

Eric

On 4/2/07, Eric Just wrote:
>
> Hello,
>
> I am getting this error while running a bioperl script that I had been
> using in bioperl 1.4. On upgrading to bioperl 1.5.2 I get the following
> fatal error:
>
> Can't locate object method "seq_start" via package "Bio::DB::GenBank"
>
> My script is as follows:
>
> use Bio::DB::GenBank;
> use Bio::DB::Query::GenBank;
>
> my $gb = new Bio::DB::GenBank();
>
> my $query = Bio::DB::Query::GenBank->new(
>     -query   => 'txid44689[Organism:noexp]',
>     -reldate => 60,
>     -db      => 'nucleotide'
> );
>
> my $in = $gb->get_Stream_by_query($query);
>
> while ( my $seq = $in->next_seq()) {
>     print "do something";
>     #....
> }
>
> I noticed that seq_start is created in the BEGIN block of
> Bio::DB::NCBIHelper (inherited by Bio::DB::GenBank), but I do not have
> experience troubleshooting this kind of autoloaded method. Any idea where
> to start?
>
> Thanks
>
> Eric
>

From cjfields at uiuc.edu Mon Apr 2 11:32:59 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Apr 2007 10:32:59 -0500
Subject: [Bioperl-l] bioperl and kegg(out of memory problem )
In-Reply-To: <20070402135633.85882.qmail@web52002.mail.re2.yahoo.com>
References: <20070402135633.85882.qmail@web52002.mail.re2.yahoo.com>
Message-ID: <38475C93-FB21-4BC4-BF5D-7F48493E8EE2@uiuc.edu>

Ambrose,

Data is persisting in your hashes (in particular DBLink objects),
which is eating away at your memory. If I take a sample KEGG gene
file and simply parse it:

while (my $seq = $io->next_seq) {
    print $seq->accession,"\n";
}

there are no memory issues, but if I store the data in hashes
declared outside the loop:

my(%dblink_KO,%dblink_Pfam,%dblink_PROSITE,%dblink_NCBIGI,%dblink_NCBIGENEID,%dblink_UniProt);
my(%pathway_name,%pathway_id,%ecnumbers,%crc64,%ntseq,%aaseq);

while (my $seq = $io->next_seq) {
    # store Bio::Seq data in hashes
}

I see problems with only one genome file with KEGG records.

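As an illustration of the difference, here is a minimal sketch that keeps the per-record data in a hash declared inside the loop, so that it goes out of scope after every record. It is not taken from the original script: make_record_iterator is a made-up stand-in for $io->next_seq, the accessions and sequences are invented sample data, and the $store callback stands in for the DBI inserts.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Bio::SeqIO stream: returns a code ref that
# yields one record (hash ref) per call and undef when the data runs out.
sub make_record_iterator {
    my @records = @_;
    return sub { return shift @records };
}

# Handle records one at a time. %row is declared inside the loop, so it is
# released after each iteration instead of growing for the whole run.
sub process_records {
    my ($next_record, $store) = @_;
    my $seen = 0;
    while (my $rec = $next_record->()) {
        my %row = (
            accession => $rec->{accession},
            ntseq     => $rec->{seq},
        );
        $store->(\%row);    # e.g. a $dbh->do("INSERT ...") in a real script
        $seen++;
    }
    return $seen;
}

# Invented sample data, for illustration only.
my $it = make_record_iterator(
    { accession => 'eco:b0001', seq => 'ATGAAACGC' },
    { accession => 'eco:b0002', seq => 'ATGCGAGTG' },
);

my @stored;
my $n = process_records($it, sub { push @stored, $_[0]{accession} });
print "$n records: @stored\n";
```

With Bio::SeqIO the loop condition would be while (my $seq = $stream->next_seq), and the rest of the pattern stays the same: nothing keyed by accession outlives the iteration that wrote it to the database.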
You'll definitely run into memory issues if you are parsing many
genome files, which you appear to be:

my(%dblink_KO,%dblink_Pfam,%dblink_PROSITE,%dblink_NCBIGI,%dblink_NCBIGENEID,%dblink_UniProt);
my(%pathway_name,%pathway_id,%ecnumbers,%crc64,%ntseq,%aaseq);

for my $genomefile (@genomelist) {
    while (my $seq = $io->next_seq) {
        # store Bio::Seq data in hashes
    }
}

Localizing the hashes to the genome or sequence loops should prevent
the memory problem. Note that the DBLink Annotation objects are
overloaded so they act like a string ($ele =~ /^KO:/) but are
actually Bio::Annotation::DBLink objects, something we will likely
get rid of in the near future.

chris

On Apr 2, 2007, at 8:56 AM, Ambrose wrote:

> Hello ALL,
>
> I have the code below,which parses my kegg files.A host of the
> files are parsed and the information is inserted into my databases
> but unfortunate after the program runs for some hours it stops
> showing the message out of memory.I assume that this happens
> because the bioperl object is too big.Please just check the code below
>
> best regards Ambrose
>
> [...]

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From dmessina at wustl.edu Mon Apr 2 12:19:51 2007
From: dmessina at wustl.edu (David Messina)
Date: Mon, 2 Apr 2007 11:19:51 -0500
Subject: [Bioperl-l] installation bioperl
Message-ID: <4CF82AFF-CB24-4939-9F80-9AB907BE5822@wustl.edu>

Hi Fahmi,

Please include the list on the reply so that others can comment, too.

Yes, it appears the machine you are installing on does not have an
internet connection. You probably will want to resolve that problem
before dealing with Bioperl. Alternatively, you could simply install
and use Bioperl on the machine which does have an internet
connection.

If you really need to get Bioperl installed on that machine, however,
probably the easiest way would be to find a machine that does have an
internet connection, install CPAN::Mini, and use it to make a local
mirror of CPAN. You could then copy that local mirror over to the
machine without the internet connection and point that machine's cpan
at the local mirror (read the CPAN documentation to find out how to
do this).

Also, the BioPerl install instructions list several external packages
that you will need to use some parts of Bioperl (e.g. GD). Again, you
can download those distributions using the machine with the internet
connection and copy them over.

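A rough sketch of the mirror approach, under stated assumptions: the mirror URL and the directory /data/minicpan are placeholders, and the helper names mirror_options and cpan_shell_commands are made up for this example. The script only contacts the network when called with --run; otherwise it just prints the cpan shell commands to enter on the offline machine once the mirror directory has been copied over.

```perl
use strict;
use warnings;

# Options for CPAN::Mini->update_mirror. Both values are placeholders:
# choose a CPAN mirror near you and a directory with enough free space.
sub mirror_options {
    my ($local_dir) = @_;
    return (
        remote => 'http://www.cpan.org/',   # assumed mirror URL
        local  => $local_dir,
        trace  => 1,                         # print progress while mirroring
    );
}

# cpan shell commands for the offline machine, after copying the mirror.
sub cpan_shell_commands {
    my ($copied_dir) = @_;
    return (
        "o conf urllist unshift file://$copied_dir",
        "o conf commit",
    );
}

# Only mirror when asked to, e.g.: perl make_mirror.pl --run /data/minicpan
if (@ARGV >= 2 && $ARGV[0] eq '--run') {
    require CPAN::Mini;    # has to be installed on the connected machine
    CPAN::Mini->update_mirror(mirror_options($ARGV[1]));
}
else {
    print "$_\n" for cpan_shell_commands('/data/minicpan');
}
```

Loading CPAN::Mini with require inside the --run branch means the script can still be run on the offline machine, where the module is not installed, to print the two configuration commands.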
Dave On Apr 2, 2007, at 9:22 AM, fahmi derbali wrote: > thank you for answer. I will give you the maximum of informations > inorder to be able to diagnostic the problem: > > i use the linux mandriva 2006 > i'm traying to install bioperl-1.5.2_102.tar.gz which i obtained > from the url: > http://www.bioperl.org/wiki/Release_1.5.2 > afetr that i made these commands which i found in the url > http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix (paragraph > INSTALLING BIOPERL THE EASY WAY USING 'Build.PL ') > > >gunzip bioperl-1.5.2_102.tar.gz > >tar xvf bioperl-1.5.2_102.tar > >cd bioperl-1.5.2_102 > after that i made the command > >perl Build.PL > i obtained the text > this package requires Module::Build v0.2805 or greater to install > itself > install Module::Build now from CPAN?[y] > i pushed enter and i obtained many lines such as > System call"/usr/bin/wget -0-"ftp://.perl.org/pub/CPAN/modules/ > modlist.data.gz">home/fahmi/.cpan/sources/modules/03modlist.data > Not connected > cant access URL ftp://ftp.perl.org/CPAN/modules/modlist.data.gz > ... > i'm trying to install bioperl whithout having internet connection > beacause i don't know whay linux didn't detect my ethernet card. > please tell me what should i do. > tahnk you for your collaboration. From cjfields at uiuc.edu Mon Apr 2 14:10:30 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 2 Apr 2007 13:10:30 -0500 Subject: [Bioperl-l] Fwd: BLAST beta, URLAPI, and BioPerl (RemoteBlast users) References: Message-ID: <002E7937-10DF-43CE-96F6-71DC743C1314@uiuc.edu> This may be of interest to anyone using RemoteBlast. For anyone who uses RemoteBlast, the new changes to NCBI's BLAST interface shouldn't affect anything (Scott tested it out). If there are any abnormalities with RemoteBlast queries over the next few weeks let us know. 
chris Begin forwarded message: > From: "Mcginnis, Scott \(NIH/NLM/NCBI\) [E]" > > Date: April 2, 2007 12:53:33 PM CDT > To: "Chris Fields" > Subject: RE: BLAST beta, URLAPI, and BioPerl > > Hi Chris: > > We are ready to make the new pages the defaults come April 16th. An > announcement is going out shortly. There are some very minor > changes to the URL API and I have listed them below. IT will be > part of the announcements. Please note we actually tested BioPerl > and it seems to me fine with the new pages. If you have a news on > your site or a mailing list you might want to pass this on. > > A Note About URLAPI > > The new BLAST pages support URLAPI, a protocol that scripts and > programs use to run BLAST searches and retrieve results over > HTTP. (For more on URLAPI, see > http://www.ncbi.nlm.nih.gov/blast/Doc/urlapi.html). The following > information only applies to you if you develop or are responsible > for software that uses URLAPI. > > The new pages have been tested and produce correct results with > the following URLAPI client programs: > > * the BioPERL RemoteBlast module > * the NCBI demo script http://ncbi.nlm.nih.gov/blast/docs/web_blast.pl > * various scripts used in-house at NCBI > > Users of URLAPI should be aware of the following minor > changes. In the new interface: > > 1. The Request ID (RID) format will be shorter. The new format > is 11 alphanumeric characters (e.g. RDEFEA5012) and will have no > internal structure. The previous RID format was 36 or more > characters long, including punctuation (e.g., > 1175172712-21345-42512597310.BLASTQ3). > > 2. BLAST reports will show masked regions as lower-case letters > by default (see > http://nar.oxfordjournals.org/cgi/content/full/34/suppl_2/W6, > figure 2. The current default behavior is to show masked > regions as N's or X's. Users may recover the current behavior > by adding &MASK_CHAR=0 to the query string for a URLAPI > request. > > 3. 
BLAST reports will show alignments for 100 database sequences > by default. The current reports show only 50 alignments by > default. > > -----Original Message----- > From: Chris Fields [mailto:cjfields at uiuc.edu] > Sent: Mon 3/5/2007 11:50 AM > To: Mcginnis, Scott (NIH/NLM/NCBI) [E] > Subject: BLAST beta, URLAPI, and BioPerl > > The BioPerl project has several have several modules and parsers > which currently parse XML/text/tabular BLAST output, as well as a > module which is capable of posting BLAST queries via the URLAPI > interface. Will any of the BLAST changes affect these (particularly > URLAPI)? > > Thanks! > > chris > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From steletch at jouy.inra.fr Tue Apr 3 08:28:39 2007 From: steletch at jouy.inra.fr (=?ISO-8859-1?Q?St=E9phane_T=E9letch=E9a?=) Date: Tue, 03 Apr 2007 14:28:39 +0200 Subject: [Bioperl-l] Packaging bioperl for Fedora In-Reply-To: References: Message-ID: <46124877.4020605@jouy.inra.fr> Alex Lancaster a ?crit : > Hello bioperl, > > I'm new to the bioperl world, having just started a research position > in which I need to manage a large bioperl-based codebase. To this > end, I'm working on packaging bioperl as an official Fedora Package > (formerly "Fedora Extras") and I'm currently wading through and > packaging the long laundry list of Perl dependencies (then I'm going > to try and do the same for biopython). You can see my some of my > progress (including links to the reviews) here: > > http://fedoraproject.org/wiki/AlexLancaster > > Several issues have arisen during the packaging that I hope the > Nice, i was on my way to do it :-) I'm a Mandriva packager and have been kindly "spushed" for maintaining the bioperl package for Mandriva. 
You can have a look at the work already done by Mandriva at the addresses: http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl/current/ http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl-run/current/ (Happy users of Mandriva do 'urpmi perl-bioperl', et voilà :-). Feel free to contact me if you need more input for dependencies, since there are quite a lot. Cheers, Stéphane -- Stéphane Téletchéa, PhD. http://www.steletch.org Unité Mathématique Informatique et Génome http://migale.jouy.inra.fr/mig INRA, Domaine de Vilvert Tél : (33) 134 652 891 78352 Jouy-en-Josas cedex, France Fax : (33) 134 652 901 From cjfields at uiuc.edu Tue Apr 3 10:58:44 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 3 Apr 2007 09:58:44 -0500 Subject: [Bioperl-l] Packaging bioperl for Fedora In-Reply-To: <46124877.4020605@jouy.inra.fr> References: <46124877.4020605@jouy.inra.fr> Message-ID: <67AD2CBC-C1F6-4C04-B9B3-BEAB93A2A4A3@uiuc.edu> Once these are set up we should add a page to the bioperl wiki to describe them in more detail (along with Allen's Biopackages). chris On Apr 3, 2007, at 7:28 AM, Stéphane Téletchéa wrote: > Alex Lancaster wrote: >> Hello bioperl, >> >> I'm new to the bioperl world, having just started a research position >> in which I need to manage a large bioperl-based codebase. To this >> end, I'm working on packaging bioperl as an official Fedora Package >> (formerly "Fedora Extras") and I'm currently wading through and >> packaging the long laundry list of Perl dependencies (then I'm going >> to try and do the same for biopython). You can see some of my >> progress (including links to the reviews) here: >> >> http://fedoraproject.org/wiki/AlexLancaster >> >> Several issues have arisen during the packaging that I hope the >> > > Nice, I was on my way to do it :-) > I'm a Mandriva packager and have been kindly "spushed" for maintaining > the bioperl package for Mandriva. 
> > You can have a look at the work already done by Mandriva at the > addresses: > http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl/current/ > http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl-run/current/ > > (Happy users of Mandriva do 'urpmi perl-bioperl', et voilà :-). > > Feel free to contact me if you need more input for dependencies, since > there are quite a lot. > > Cheers, > Stéphane Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign
From allenday at gmail.com Tue Apr 3 13:54:51 2007 From: allenday at gmail.com (Allen Day) Date: Tue, 3 Apr 2007 10:54:51 -0700 Subject: [Bioperl-l] Packaging bioperl for Fedora In-Reply-To: <67AD2CBC-C1F6-4C04-B9B3-BEAB93A2A4A3@uiuc.edu> References: <46124877.4020605@jouy.inra.fr> <67AD2CBC-C1F6-4C04-B9B3-BEAB93A2A4A3@uiuc.edu> Message-ID: <5c24dcc30704031054p756bd974ucab98c7283ef7a61@mail.gmail.com> You can link Biopackages now, it's been done for nearly 2 years. -Allen On 4/3/07, Chris Fields wrote: > Once these are set up we should add a page to the bioperl wiki to > describe them in more detail (along with Allen's Biopackages). > > chris
From cjfields at uiuc.edu Tue Apr 3 14:11:19 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 3 Apr 2007 13:11:19 -0500 Subject: [Bioperl-l] Packaging bioperl for Fedora In-Reply-To: <5c24dcc30704031054p756bd974ucab98c7283ef7a61@mail.gmail.com> References: <46124877.4020605@jouy.inra.fr> <67AD2CBC-C1F6-4C04-B9B3-BEAB93A2A4A3@uiuc.edu> <5c24dcc30704031054p756bd974ucab98c7283ef7a61@mail.gmail.com> Message-ID: <0802E2EB-5E94-42D2-9CE1-B82DC103A5D1@uiuc.edu> I added a small piece on Biopackages to the wiki installation page: http://www.bioperl.org/wiki/Installing_BioPerl We can move links to RPM (or similar) installations to their own page or section in the INSTALL docs when we have time. chris On Apr 3, 2007, at 12:54 PM, Allen Day wrote: > You can link Biopackages now, it's been done for nearly 2 years. > > -Allen Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Tue Apr 3 18:18:56 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 03 Apr 2007 23:18:56 +0100 Subject: [Bioperl-l] Packaging bioperl for Fedora In-Reply-To: References: <1175258897.2668.21.camel@localhost.localdomain> <6d648ierkz.fsf@delpy.biol.berkeley.edu> <5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com> <1p8xdeb87r.fsf@delpy.biol.berkeley.edu> <5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com> <16153593-5B2A-43B4-9366-282C654E40E7@gmx.net> <5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com> Message-ID: <4612D2D0.7030202@sendu.me.uk> Chris Fields wrote: > On Mar 30, 2007, at 11:02 PM, Allen Day wrote: > >> The majority of the Bioperl classes are file parsers, or manipulate >> data that comes from the file parsers. Yes there are exceptions like >> the Eutils and Ensembl-intefacing classes, but they are the minority. >> The types of files that are worked with are generally either A) >> primary data sets such as genome data, or B) derivative data, such as >> sequence alignments that are derived from primary data using an >> algorithm. >> >> If we're in agreement that the primary data sets and >> libraries/applications for producing derivative data should not be >> present in Fedora Extras, then it follows that the Bioperl classes for >> manipulating these primary and derivative data should also not be >> present in Fedora Extras as they are of little use without data to >> manipulate. > > I respectfully disagree. Likewise, but in a slightly different way: for myself and surely many others the primary data used either isn't publicly released or isn't in some major database and therefore won't be in any kind of repository. That doesn't mean I wouldn't want the parser for my files to be somewhere convenient. 
From bix at sendu.me.uk Tue Apr 3 18:09:27 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 03 Apr 2007 23:09:27 +0100 Subject: [Bioperl-l] installation bioperl In-Reply-To: <4CF82AFF-CB24-4939-9F80-9AB907BE5822@wustl.edu> References: <4CF82AFF-CB24-4939-9F80-9AB907BE5822@wustl.edu> Message-ID: <4612D097.9060400@sendu.me.uk> > On Apr 2, 2007, at 9:22 AM, fahmi derbali wrote: > >> thank you for your answer. I will give you as much information as I can >> in order to help diagnose the problem: >> >> I use Linux Mandriva 2006 >> I'm trying to install bioperl-1.5.2_102.tar.gz, which I obtained >> from the url: >> http://www.bioperl.org/wiki/Release_1.5.2 >> after that I ran the commands which I found at the url >> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix (paragraph >> INSTALLING BIOPERL THE EASY WAY USING 'Build.PL ') [snip] >> I'm trying to install bioperl without an internet connection >> because I don't know why Linux didn't detect my ethernet card. >> please tell me what I should do. >> thank you for your collaboration. David's suggestion was a good one, but quite a lot (and possibly all you need) of BioPerl is usable just with the bioperl-1.5.2_102.tar.gz file you already have. Just follow the 'hard way' instructions: http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_MODULES_THE_HARD_WAY Actually, it's not that hard. Just extract the files from the .tar.gz and have your perl lib point at the directory that contains the resulting Bio directory. From t.r-a_ckright1 at tiscali.co.uk Wed Apr 4 08:00:12 2007 From: t.r-a_ckright1 at tiscali.co.uk (Michael Pain) Date: Wed, 4 Apr 2007 13:00:12 +0100 Subject: [Bioperl-l] Re: read it immediately Message-ID: <000501c776b0$cd5dd9b0$a7d42d54@122882420315> I have received three discs but I cannot access the files as no ID or password was included in the package. I have paid for all this! Can you sort it out. 
Regards Michael Pain From thiago.venancio at gmail.com Wed Apr 4 14:14:04 2007 From: thiago.venancio at gmail.com (Thiago Venancio) Date: Wed, 4 Apr 2007 15:14:04 -0300 Subject: [Bioperl-l] read it immediately In-Reply-To: <000501c776b0$cd5dd9b0$a7d42d54@122882420315> References: <000501c776b0$cd5dd9b0$a7d42d54@122882420315> Message-ID: <44255ea80704041114pc284522tef2d3a3944763b90@mail.gmail.com> I think you emailed the wrong list... On 4/4/07, Michael Pain wrote: > > I have received three dics but i can not access the files as no ID or > pasword was included in the package,I have paid for all this! Can you sort > it out. > > Regards Michael Pain > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From gdorjee at hotmail.com Wed Apr 4 14:17:57 2007 From: gdorjee at hotmail.com (DeeGee) Date: Wed, 4 Apr 2007 11:17:57 -0700 (PDT) Subject: [Bioperl-l] blastall problem Message-ID: <9842643.post@talk.nabble.com> hi all, can anyone plz help me out with this problem that i've been dealing with for quite a while now. following is a part of my script that's not working for some reason. it is suppose to get the sequence from 'result/fasta.faa' and do the blast. ###my script ########### ...... my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', '-format' => 'Fasta'); my $queryin = $Seq_in->next_seq(); my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastp', 'database' => '/export/home/database/nr', _READMETHOD => 'Blast' ); $factory->outfile("result/out.blast"); my $blastreport = $factory->blastall($queryin); ..... when i paste the protein sequence into the textarea of my html page and save the same as 'result/fasta.faa', so that the above script would do the blast, i get the following error: Software error: ------------- EXCEPTION ------------- MSG: not Bio::Seq object or array of Bio::Seq objects or file name! 
STACK Bio::Tools::Run::StandAloneBlast::blastpgp /usr/perl5/5.6.1/lib/Bio/Tools/Run/StandAloneBlast.pm:611 STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50 -------------------------------------- i would appreciate your help. i would also like to add that the 'result/fasta.faa' has the sequence saved in it. -- View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9842643 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From gowthaman.ramasamy at sbri.org Wed Apr 4 14:57:09 2007 From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy) Date: Wed, 4 Apr 2007 11:57:09 -0700 Subject: [Bioperl-l] How to patch something in installed bioperl module Message-ID: Hi List, I am advised to patch (comment out some lines and add some) GFF.pm bioperl module. How do i go about it?. I have the latest Bioperl 1.5.2 version installed....via CPAN I find GFF.pm in the following location... /root/.cpan/build/bioperl-1.5.2_102/Bio/Tools/GFF.pm Do i have to recompile it after editing........ I am completely clue less......I have not done this earlier..... Can any one help me to do this. Many thanks in advance........ gowthaman From dmessina at wustl.edu Wed Apr 4 15:42:43 2007 From: dmessina at wustl.edu (David Messina) Date: Wed, 4 Apr 2007 14:42:43 -0500 Subject: [Bioperl-l] blastall problem In-Reply-To: <9842643.post@talk.nabble.com> References: <9842643.post@talk.nabble.com> Message-ID: <35EE39CF-4A25-4453-8073-48CA0E9317EB@wustl.edu> The code snippet worked fine for me. I believe the problem is that 'result/fasta.faa' is not getting passed to your code properly. You might try specifying a complete path to your input and output file -- relative paths, especially through a web app, can be tricky. 
> when i paste the protein sequence into the textarea of my html page > and save > the same as 'result/fasta.faa', so that the above script would do > the blast, I'm not sure from what you wrote -- did you try running your script on the command line (having created 'result/fasta.faa' manually first)? If that is working for you, then the problem is with getting the data from the webpage into the script, not with the blasting part. Dave This is what I did: % ls test.pl testp* test.pl testp.fa % formatdb -i testp.fa % ls test.pl testp* test.pl testp.fa testp.fa.phr testp.fa.pin testp.fa.psq % perl test.pl testp.fa % head -10 out.blast BLASTP 2.2.10 [Oct-19-2004] Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. Query= gi|64654269|gb|AAH96193.1| HOXB1 protein [Homo sapiens] (235 letters) Your code: I changed only the input filename and the input database name, and saved the script as test.pl ----------------------- #!/usr/bin/perl use strict; use warnings; use Bio::SeqIO; use Bio::Tools::Run::StandAloneBlast; my $Seq_in = Bio::SeqIO->new (-file => $ARGV[0], '-format' => 'Fasta'); my $queryin = $Seq_in->next_seq(); my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastp', 'database' => 'testp.fa', _READMETHOD => 'Blast' ); $factory->outfile("out.blast"); my $blastreport = $factory->blastall($queryin); ------------------------------------------------------------------------ ----------- From gdorjee at hotmail.com Wed Apr 4 17:44:27 2007 From: gdorjee at hotmail.com (DeeGee) Date: Wed, 4 Apr 2007 14:44:27 -0700 (PDT) Subject: [Bioperl-l] blastall problem In-Reply-To: <35EE39CF-4A25-4453-8073-48CA0E9317EB@wustl.edu> References: <9842643.post@talk.nabble.com> <35EE39CF-4A25-4453-8073-48CA0E9317EB@wustl.edu> Message-ID: <9846257.post@talk.nabble.com> 
Thanks for your reply Dave. I don't think that there's anything wrong with the open(OUTPUT,">result/fasta.faa"); line as I could get the 'fasta.faa' file with the sequence in it. I see it. It looks like the blast is not being able to read from the result/fasta.faa. ^ ^* Dave Messina-2 wrote: > > The code snippet worked fine for me. I believe the problem is that > 'result/fasta.faa' is not getting passed to your code properly. You > might try specifying a complete path to your input and output file -- > relative paths, especially through a web app, can be tricky. > >> when i paste the protein sequence into the textarea of my html page >> and save >> the same as 'result/fasta.faa', so that the above script would do >> the blast, > > I'm not sure from what you wrote -- did you try running your script > on the command line (having created 'result/fasta.faa' manually > first)? If that is working for you, then the problem is with getting > the data from the webpage into the script, not with the blasting part. > > Dave > > This is what I did: > > % ls test.pl testp* > test.pl testp.fa > > % formatdb -i testp.fa > > % ls test.pl testp* > test.pl testp.fa testp.fa.phr testp.fa.pin testp.fa.psq > > % perl test.pl testp.fa > % head -10 out.blast > BLASTP 2.2.10 [Oct-19-2004] > > > Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. > Schaffer, > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), > "Gapped BLAST and PSI-BLAST: a new generation of protein database search > programs", Nucleic Acids Res. 25:3389-3402. 
> > Query= gi|64654269|gb|AAH96193.1| HOXB1 protein [Homo sapiens] > (235 letters) > > > Your code: I changed only the input filename and the input database > name, and saved the script as test.pl > ----------------------- > #!/usr/bin/perl > > use strict; > use warnings; > use Bio::SeqIO; > use Bio::Tools::Run::StandAloneBlast; > > my $Seq_in = Bio::SeqIO->new (-file => $ARGV[0], '-format' => > 'Fasta'); > my $queryin = $Seq_in->next_seq(); > my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => > 'blastp', > 'database' => > 'testp.fa', > _READMETHOD => 'Blast' > ); > $factory->outfile("out.blast"); > my $blastreport = $factory->blastall($queryin); > ------------------------------------------------------------------------ > ----------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9846257 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From torsten.seemann at infotech.monash.edu.au Wed Apr 4 20:17:10 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Thu, 5 Apr 2007 10:17:10 +1000 Subject: [Bioperl-l] How to patch something in installed bioperl module In-Reply-To: References: Message-ID: > I am advised to patch (comment out some lines and add some) GFF.pm bioperl module. > How do i go about it?. First, make a backup of the original file. Then just edit the original (add/remove lines). > I have the latest Bioperl 1.5.2 version installed....via CPAN > I find GFF.pm in the following location... > /root/.cpan/build/bioperl-1.5.2_102/Bio/Tools/GFF.pm This is not where it is installed. That is where the CPAN program uncompressed it to before installing. It is more likely in a directory like this: /usr/lib/perl5/site_perl/5.8.5/Bio/Tools/GFF.pm But it depends on how your Perl setup arranges things! 
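Because the installed location depends on the local Perl setup, it can help to ask Perl itself which copy of a module it would load. The following is a minimal sketch that uses only core Perl; the helper name installed_path is an assumption made for illustration, and the script expects module names (for example Bio::Tools::GFF) as command-line arguments.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the first file on @INC that Perl would load
# for the given module name, or undef if no such file is found.
sub installed_path {
    my ($module) = @_;
    (my $relative = "$module.pm") =~ s{::}{/}g;    # Bio::Tools::GFF -> Bio/Tools/GFF.pm
    for my $dir (@INC) {
        next if ref $dir;                           # skip code-reference hooks in @INC
        my $candidate = File::Spec->catfile($dir, split m{/}, $relative);
        return $candidate if -f $candidate;
    }
    return;
}

# Report the location of every module named on the command line.
for my $module (@ARGV) {
    my $path = installed_path($module);
    print "$module => ", (defined $path ? $path : 'not found on @INC'), "\n";
}
```

Running it as, say, "perl whereis.pl Bio::Tools::GFF" (the script name is arbitrary) prints the file to back up and edit. The copy under /root/.cpan/build is only the unpacked build tree, so it should not normally be the one reported.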
> Do i have to recompile it after editing........ No. --Torsten From torsten.seemann at infotech.monash.edu.au Wed Apr 4 20:22:37 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Thu, 5 Apr 2007 10:22:37 +1000 Subject: [Bioperl-l] blastall problem In-Reply-To: <9842643.post@talk.nabble.com> References: <9842643.post@talk.nabble.com> Message-ID: > Software error: > ------------- EXCEPTION ------------- > MSG: not Bio::Seq object or array of Bio::Seq objects or file name! > STACK Bio::Tools::Run::StandAloneBlast::blastpgp > /usr/perl5/5.6.1/lib/Bio/Tools/Run/StandAloneBlast.pm:611 > STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50 > my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', '-format' => 'Fasta'); Does this still happen if you give the full path to the FASTA file? eg. -file => /usr/local/apache2/htdocs/result/fasta.faa (I'm guessing what the full path is here) --Torsten From gilbertd at cricket.bio.indiana.edu Wed Apr 4 20:59:23 2007 From: gilbertd at cricket.bio.indiana.edu (Don Gilbert) Date: Wed, 4 Apr 2007 19:59:23 -0500 (EST) Subject: [Bioperl-l] Small bug in Bio::Tools::GFF.pm - Target output Message-ID: <200704050059.l350xNF07452@cricket.bio.indiana.edu> Dear Bioperl list, There is a small bug in what I think is the current Bio::Tools::GFF.pm, that blocks output of Target attributes (in gff3 at least). 
See a patch here http://wiki.gmod.org/index.php/Load_BLAST_Into_Chado#Convert_BLAST_analysis_to_GFF -- Don Gilbert -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405 -- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/ From torsten.seemann at infotech.monash.edu.au Wed Apr 4 21:34:17 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Thu, 5 Apr 2007 11:34:17 +1000 Subject: [Bioperl-l] Help parsing PSI-BLAST XML reports Message-ID: Dear all, I have been migrating all our BLAST infrastructure to use the XML output mode, the "blastpgp -m 7" option, referred to as the 'blastxml' format in Bioperl. I had never used SearchIO to parse a PSI-BLAST XML report before, and encountered some issues I hope you can help me with: 1. When loading with Bio::SearchIO(-format=>'blastxml') I get back a Bio::Search::Result::GenericResult object. This means I cannot use the PSI-BLAST functions like iterations() and psiblast() provided by Bio::Search::Result::BlastResult. I'm guessing this is because the XML output reports itself as a plain BLASTP output: <BlastOutput_program>blastp</BlastOutput_program> How do I determine if it is a PSI-BLAST report? 2. Usually a PSI-BLAST report has multiple Iterations. The XML output has <Iteration> tags but it took me a while to figure out that these get mapped to Bio::Search::Result objects accessible via Bio::SearchIO->next_result(). Is this the proper way to process the iterations? 3. I also notice that only the first result (iteration) has the query_name set. Subsequent ones are empty: RESULT 1 Bio::Search::Result::GenericResult, algorithm= BLASTP, query=MyProtein , db=uniprot_sprot RESULT 2 Bio::Search::Result::GenericResult, algorithm= BLASTP, query= , db=uniprot_sprot Is this a bug or expected? I'm guessing a lot of these problems are simply due to limitations of the NCBI BLAST XML DTD? 
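The loop described in points 2 and 3 can be sketched roughly as follows. This is a minimal sketch, assuming BioPerl is installed and that each PSI-BLAST iteration comes back as one result from next_result(), as reported above. The sub name summarize_blastxml is hypothetical, and the input files are whatever XML reports are named on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: walk every result in a BLAST XML report and return
# one summary line per result (one per iteration for PSI-BLAST output,
# under the assumption stated above).
sub summarize_blastxml {
    my ($file) = @_;

    # Loaded at call time so the script still compiles where BioPerl is absent.
    require Bio::SearchIO;

    my $in = Bio::SearchIO->new(-format => 'blastxml', -file => $file);
    my @lines;
    my $count = 0;
    while (my $result = $in->next_result) {
        $count++;
        my $query = $result->query_name;
        # Make a missing query name visible instead of printing nothing.
        $query = '(empty)' unless defined $query && length $query;
        push @lines, sprintf(
            "RESULT %d %s, algorithm=%s, query=%s, db=%s, hits=%d",
            $count, ref($result), $result->algorithm, $query,
            $result->database_name, $result->num_hits,
        );
    }
    return @lines;
}

# Summarize each report named on the command line.
print "$_\n" for map { summarize_blastxml($_) } @ARGV;
```

Printing ref($result) alongside the query name makes it easy to see which result class the parser returned and which iterations lost their query name.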
--Torsten From gdorjee at hotmail.com Wed Apr 4 20:59:08 2007 From: gdorjee at hotmail.com (DeeGee) Date: Wed, 4 Apr 2007 17:59:08 -0700 (PDT) Subject: [Bioperl-l] blastall problem In-Reply-To: References: <9842643.post@talk.nabble.com> Message-ID: <9848412.post@talk.nabble.com> hi Torsten, Yes, it still gives me the same error even if I give the full path to the fasta file. Following is how I did: ####### part of my script ####### my $Seq_in = Bio::SeqIO->new (-file => '/export/home/local/apache2/htdocs/result/fasta.faa', -format => 'Fasta'); my $queryin = $Seq_in->next_seq(); my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastp', 'database' => '/export/home/dorjee/database/nrpart', _READMETHOD => 'Blast' ); $factory->outfile("/export/home/local/apache2/htdocs/result/out.blast"); my $blastreport = $factory->blastall($queryin); .... thanks man. Torsten Seemann wrote: > >> Software error: >> ------------- EXCEPTION ------------- >> MSG: not Bio::Seq object or array of Bio::Seq objects or file name! >> STACK Bio::Tools::Run::StandAloneBlast::blastpgp >> /usr/perl5/5.6.1/lib/Bio/Tools/Run/StandAloneBlast.pm:611 >> STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50 > >> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', '-format' => >> 'Fasta'); > > Does this still happen if you give the full path to the FASTA file? > eg. -file => /usr/local/apache2/htdocs/result/fasta.faa > (I'm guessing what the full path is here) > > --Torsten > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9848412 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
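One way to separate file-access problems from BLAST problems is to probe the FASTA file from inside the same CGI process, before any BioPerl object is created. The sketch below uses only core Perl; the sub name fasta_file_problems and the checks it performs are assumptions chosen for illustration, not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a list of human-readable problems found with
# a FASTA file path. An empty list means the file looks usable.
sub fasta_file_problems {
    my ($path) = @_;
    return ("file does not exist: $path") unless -e $path;

    my @problems;
    # -r tests readability for the effective user, e.g. the web server account.
    push @problems, 'not readable by effective uid ' . $> unless -r $path;
    push @problems, 'file is empty' unless -s $path;

    if (-r $path && -s $path) {
        open my $fh, '<', $path or return (@problems, "open failed: $!");
        my $first;
        while (defined($first = <$fh>)) {
            last if $first =~ /\S/;    # stop at the first non-blank line
        }
        close $fh;
        push @problems, "first non-blank line does not start with '>'"
            unless defined $first && $first =~ /^>/;
    }
    return @problems;
}

# Check every path given on the command line.
for my $path (@ARGV) {
    my @problems = fasta_file_problems($path);
    print "$path: ", (@problems ? join('; ', @problems) : 'looks fine'), "\n";
}
```

Calling the same sub at the top of the CGI script and printing its result to the error log shows what the web server account actually sees, which may differ from what an interactive login sees.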
From torsten.seemann at infotech.monash.edu.au Wed Apr 4 22:57:09 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Thu, 5 Apr 2007 12:57:09 +1000 Subject: [Bioperl-l] blastall problem In-Reply-To: <9842643.post@talk.nabble.com> References: <9842643.post@talk.nabble.com> Message-ID: DeeGee, Please add the following lines to help deduce the problem: > my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', '-format' => > 'Fasta'); die "could not open fasta" if not defined $Seq_in; > my $queryin = $Seq_in->next_seq(); die "could not get seq" if not defined $queryin; Does anything happen now? ... Some other comments: > my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastp', > STACK Bio::Tools::Run::StandAloneBlast::blastpgp I'm not sure why it is in the blastpgp() method when you chose $factory->blastall() ? > _READMETHOD => 'Blast' I don't think this is required anymore in modern Bioperl. Are you using 1.5.x or bioperl-live ? > when i paste the protein sequence into the textarea of my html page and > STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50 So this is a CGI script? Does the script run as user 'apache' or 'httpd', or as yourself via SuEXEC? Does 'apache' have permissions to READ/WRITE the result/ directory? --Torsten From cjfields at uiuc.edu Thu Apr 5 00:14:46 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 4 Apr 2007 23:14:46 -0500 Subject: [Bioperl-l] Help parsing PSI-BLAST XML reports In-Reply-To: References: Message-ID: <8EA4D933-9B99-485E-9CEA-AB39297F90B4@uiuc.edu> On Apr 4, 2007, at 8:34 PM, Torsten Seemann wrote: > Dear all, > > I have been migrating all our BLAST infrastructure to use the XML > output mode, the "blastpgp -m 7" option, referred to 'blastxml' format > in Bioperl. I had never used SearchIO to parse a PSI-BLAST XML report > before, and encountered some issues I hope you can help me with: > > 1. 
When loading with Bio::SearchIO(-format=>'blastxml') I get back a > Bio::Search::Result::GenericResult object. This means I cannot use > the PSI-BLAST functions like iterations() and psiblast() provided by > Bio::Search::Result::BlastResult. I'm guessing this is because the > XML output reports itself as a plain BLASTP output: > <BlastOutput_program>blastp</BlastOutput_program> > > How do I determine if it is a PSI-BLAST report? I don't know if you can very easily, though I haven't tried myself. If I remember correctly there wasn't a substantial difference in the XML output between regular BLAST XML and PSI-BLAST XML. We could add a parameter to the parser to treat the report as PSI-BLAST. > 2. Usually a PSI-BLAST report has multiple Iterations. The XML output > has <Iteration> tags but it took me a while to figure out that these > get mapped to Bio::Search::Result objects accessible via > Bio::SearchIO->next_result(). > > Is this the proper way to process the iterations? The problem is in the way that NCBI now outputs multiple-query BLAST XML reports, which apparently changed sometime in the last year w/o notice. This was also a problem with other Bio* parsers (I remember seeing something about it on the BioPython list). Previously multiquery BLAST requests were output like single XML reports concatenated together, each with their own XML declaration, etc. Now they are treated like iterations (query 1 = iteration 1, query 2 = iteration 2, etc) all in one long BLAST report. There's an example of one in the SearchIO tests which I added to CVS in Jan-Feb, post-1.5.2. The current parser handles both old and new cases. The current behavior of the parser is to parse everything up front, building up the ResultI's and then returning them one-by-one upon next_result(), which is horrible on memory if you have tons of XML to wade through. I will probably change that to carve the data up into report-sized chunks of XML and parse them piecemeal, but I haven't had time to work on it yet. > 3. 
I also notice that only the first result (iteration) has the
> query_name set. Subsequent ones are empty:
> RESULT 1 Bio::Search::Result::GenericResult, algorithm= BLASTP,
> query=MyProtein , db=uniprot_sprot
> RESULT 2 Bio::Search::Result::GenericResult, algorithm= BLASTP, query=
> , db=uniprot_sprot
>
> Is this a bug or expected?

If you are using 1.5.2 then there is a bug related to that which was
fixed in CVS a few months back (related to the multiquery issue
above). If it isn't let me know.

> I'm guessing a lot of these problems are simply due to limitations of
> the NCBI BLAST XML DTD?
>
> --Torsten

To tell the truth I'm not sure. One would think they could add some
designation to the report for PSI-BLAST!

chris

From cjfields at uiuc.edu Thu Apr 5 13:40:41 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Apr 2007 12:40:41 -0500
Subject: [Bioperl-l] Mixed bless-ings with Bio::Seq/Bio::PrimarySeq (Bio::Seq::Meta::Array)
Message-ID: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu>

Roy Chaudhuri has raised an interesting question in a bug report
filed regarding 'bless'-ing objects into another (similar) class. The
bug report on this is here:

http://bugzilla.open-bio.org/show_bug.cgi?id=2262

The following code (from the bug report) illustrates the problem.
Note some of this is taken from the Bio::Seq::Meta::Array POD, though
the example sequence object is a LocatableSeq (PrimarySeqI) and not a
SeqI:

    use Bio::SeqIO;
    use Bio::Seq::Meta::Array;
    # $seq isa Bio::SeqI
    my $seq=Bio::SeqIO->new(-fh=>\*ARGV, -format=>'genbank')->next_seq;
    # $seq is still a Bio::SeqI
    bless $seq, 'Bio::Seq::Meta::Array';
    Bio::SeqIO->new(-format=>'genbank')->write_seq($seq);

This produces sequence output missing sequence data, a definition,
and other odds and ends.

$seq is first a Bio::Seq::RichSeq and is blessed into a
Bio::Seq::Meta::Array; both times $seq remains Bio::SeqI. However,
Bio::Seq::Meta::Array has an odd inheritance tree which also makes it
a Bio::PrimarySeqI and a Bio::Seq::MetaI (ick):

    use base qw(Bio::LocatableSeq Bio::Seq Bio::Seq::MetaI);

Bio::LocatableSeq has a seq() method inherited from Bio::PrimarySeq,
for instance, so using $seq->seq() invokes Bio::PrimarySeq::seq()
instead of Bio::Seq::seq(). No problem in most cases as long as
PrimarySeqI is blessed into another PrimarySeqI, but if one blesses a
Bio::SeqI into a Bio::Seq::Meta::Array (as in the example) then
PrimarySeq::seq() expects a raw sequence and gets none (since the
data is stored internally as a PrimarySeq in a different location)
and no sequence is output. This happens similarly for other stored
object data.

I'm not sure why Bio::Seq::Meta::Array is set up this way. Do we want
to support using 'bless $obj, Class' with Bio::SeqI/PrimarySeqI, or
should Bio::Seq::Meta::Array be changed so that it follows one
interface or the other?

chris

From hlapp at gmx.net Thu Apr 5 14:27:39 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 5 Apr 2007 14:27:39 -0400
Subject: [Bioperl-l] Mixed bless-ings with Bio::Seq/Bio::PrimarySeq (Bio::Seq::Meta::Array)
In-Reply-To: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu>
References: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu>
Message-ID: <421D1A5B-4F4A-46D9-8829-2DCB1D8E7DE5@gmx.net>

On Apr 5, 2007, at 1:40 PM, Chris Fields wrote:

> Do we want to support using 'bless $obj, Class'

This smacks of over-clever programming and looks like a sure way to
obfuscate what you're doing. I'm not sure why we need to support this
construct.
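
The re-blessing pitfall discussed in this thread can be reproduced
without BioPerl. The following is only a minimal sketch using core
Perl: the packages My::PrimarySeq, My::Seq and My::MetaArray and the
helper seq_after_rebless() are hypothetical stand-ins, not the real
Bio::* classes, and only the storage layout and parent ordering are
meant to mirror what is described above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a PrimarySeq-like class:
# the raw sequence string is stored directly in $self->{seq}.
package My::PrimarySeq;
sub new {
    my ($class, %args) = @_;
    return bless { seq => $args{seq} }, $class;
}
sub seq { my ($self) = @_; return $self->{seq}; }

# Hypothetical stand-in for a Seq-like class:
# the sequence lives inside a contained primary-sequence object.
package My::Seq;
sub new {
    my ($class, %args) = @_;
    my $primary = My::PrimarySeq->new(seq => $args{seq});
    return bless { primary_seq => $primary }, $class;
}
sub seq { my ($self) = @_; return $self->{primary_seq}->seq; }

# Hypothetical class with both parents, PrimarySeq-like parent first,
# so method lookup finds My::PrimarySeq::seq before My::Seq::seq.
package My::MetaArray;
our @ISA = ('My::PrimarySeq', 'My::Seq');

package main;

# Build a My::Seq, read its sequence, re-bless it, and read it again.
sub seq_after_rebless {
    my ($raw) = @_;
    my $obj    = My::Seq->new(seq => $raw);
    my $before = $obj->seq;          # goes through the contained object
    bless $obj, 'My::MetaArray';     # data layout is unchanged by bless
    my $after  = $obj->seq;          # now reads $obj->{seq}, never set
    return ($before, $after);
}

my ($before, $after) = seq_after_rebless('ATGC');
print "before re-bless: $before\n";
print "after re-bless:  ", (defined $after ? $after : '(undef)'), "\n";
```

bless only changes which package method calls are looked up in; it
does not convert the underlying hash, so a method written for one
storage layout silently finds nothing in the other.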
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Thu Apr 5 14:44:38 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Apr 2007 13:44:38 -0500 Subject: [Bioperl-l] Mixed bless-ings with Bio::Seq/Bio::PrimarySeq (Bio::Seq::Meta::Array) In-Reply-To: <421D1A5B-4F4A-46D9-8829-2DCB1D8E7DE5@gmx.net> References: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu> <421D1A5B-4F4A-46D9-8829-2DCB1D8E7DE5@gmx.net> Message-ID: I tend to agree on that front as it seems too prone to subtle issues with inheritance (as the bug demonstrates). Related to that, do we want to have Bio::Seq::Meta::Array implement either PrimarySeqI or SeqI? Having it implement both is definitely not working as expected. chris On Apr 5, 2007, at 1:27 PM, Hilmar Lapp wrote: > > On Apr 5, 2007, at 1:40 PM, Chris Fields wrote: > >> Do we want to support using 'bless $obj, Class' > > This smacks of over-clever programming and looks like a sure way to > obfuscate what you're doing. I'm not sure why we need to support > this construct. > > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From mkiwala at watson.wustl.edu Thu Apr 5 15:11:22 2007 From: mkiwala at watson.wustl.edu (Michael Kiwala) Date: Thu, 05 Apr 2007 14:11:22 -0500 Subject: [Bioperl-l] Mixed bless-ings with Bio::Seq/Bio::PrimarySeq (Bio::Seq::Meta::Array) In-Reply-To: References: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu> <421D1A5B-4F4A-46D9-8829-2DCB1D8E7DE5@gmx.net> Message-ID: <461549DA.90709@watson.wustl.edu> My vote is for SeqI. 
I was using the SeqWithQuality class and more recently switched over to Bio::Seq::Quality as we are upgrading from 1.4 to 1.5.2. The sequences I'm working with are destined for GenBank and have features and quality values. I've written a module (that I call GenBank::Tbl2Asn) that accepts a Bio::Seq::Quality with features and runs tbl2asn on it to produce a file that we send to GenBank. I don't know of any other class that suites my needs better than Bio::Seq::Quality inheriting from Bio::SeqI. Chris Fields wrote: > I tend to agree on that front as it seems too prone to subtle issues > with inheritance (as the bug demonstrates). > > Related to that, do we want to have Bio::Seq::Meta::Array implement > either PrimarySeqI or SeqI? Having it implement both is definitely > not working as expected. > > chris > > On Apr 5, 2007, at 1:27 PM, Hilmar Lapp wrote: > > >> On Apr 5, 2007, at 1:40 PM, Chris Fields wrote: >> >> >>> Do we want to support using 'bless $obj, Class' >>> >> This smacks of over-clever programming and looks like a sure way to >> obfuscate what you're doing. I'm not sure why we need to support >> this construct. >> >> -hilmar >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> >> >> > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From gdorjee at hotmail.com Thu Apr 5 17:09:14 2007 From: gdorjee at hotmail.com (DeeGee) Date: Thu, 5 Apr 2007 14:09:14 -0700 (PDT) Subject: [Bioperl-l] blastall problem In-Reply-To: References: <9842643.post@talk.nabble.com> Message-ID: <9864004.post@talk.nabble.com> Thanks again, Torsten. 
I tried (die "could not get seq" if not defined $queryin;) as you suggested, and now I get the following error message: Software error: could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl line 50. Does this mean that next_seq() method in 'my $queryin = $Seq_in->next_seq();' has some problem? How can I fix it? I would appreciate your help. Cheers! Torsten Seemann wrote: > > DeeGee, > > Please add the following lines to help deduce the problem: > >> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', '-format' => >> 'Fasta'); > > die "could not open fasta" if not defined $Seq_in; > >> my $queryin = $Seq_in->next_seq(); > > die "could not get seq" if not defined $queryin; > > Does anything happen now? > > ... > > Some other comments: > >> my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => >> 'blastp', >> STACK Bio::Tools::Run::StandAloneBlast::blastpgp > > I'm not sure why it is in the blastpgp() method when you chose > $factory->blastall() ? > >> _READMETHOD => 'Blast' > > I don't think this is required anymore in modern Bioperl. Are you > using 1.5.x or bioperl-live ? > >> when i paste the protein sequence into the textarea of my html page and >> STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50 > > So this is a CGI script? > Does the script run as user 'apache' or 'httpd', or as yourself via > SuEXEC? > Does 'apache' have permissions to READ/WRITE the result/ directory? > > --Torsten > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9864004 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
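
Before suspecting next_seq() itself, it may help to rule out the
input file. The following is a minimal sketch in core Perl only; the
helper name fasta_input_status() and the sample record are made up
for illustration, and the check is deliberately crude (it only looks
at whether the first non-blank line starts with '>').

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: give a short reason why a FASTA input file
# might yield no sequence when handed to a parser.
sub fasta_input_status {
    my ($path) = @_;
    return 'missing'    unless -e $path;   # wrong (relative?) path
    return 'unreadable' unless -r $path;   # permissions of the web user
    return 'empty'      unless -s $path;   # nothing written (yet)

    open(my $in, '<', $path) or return "unopenable: $!";
    my $first;
    while (my $line = <$in>) {
        next if $line =~ /^\s*$/;          # skip leading blank lines
        $first = $line;
        last;
    }
    close($in);

    return 'blank'     unless defined $first;
    return 'no header' unless $first =~ /^>/;
    return 'ok';
}

# Usage on a throwaway file and on a path that should not exist.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} ">query1 made-up record\nMKVLAAGIVGLL\n";
close($fh) or die "cannot close $path: $!";

print "temp file: ", fasta_input_status($path), "\n";
print "bad path:  ", fasta_input_status('no/such/dir/fasta.faa'), "\n";
```

Running such a check from inside the CGI script, rather than from a
shell, shows what the web server process actually sees.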
From cjfields at uiuc.edu Thu Apr 5 19:32:55 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Apr 2007 18:32:55 -0500 Subject: [Bioperl-l] blastall problem In-Reply-To: <9864004.post@talk.nabble.com> References: <9842643.post@talk.nabble.com> <9864004.post@talk.nabble.com> Message-ID: <3ED7F1E9-FE21-4796-99AC-0CD0EA418563@uiuc.edu> On Apr 5, 2007, at 4:09 PM, DeeGee wrote: > > Thanks again, Torsten. I tried (die "could not get seq" if not defined > $queryin;) as you suggested, and now I get the following error > message: > > Software error: > could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl line 50. > > Does this mean that next_seq() method in 'my $queryin = > $Seq_in->next_seq();' has some problem? How can I fix it? I would > appreciate > your help. > Cheers! This indicates there is likely some problem with your sequence file (either it isn't fasta or something else is wrong), but w/o actually seeing it we can't be sure. I can't be sure but I don't think it is a next_seq() issue. Also, if there are problems accessing the file the stream object should throw an error so I don't think it is that either... chris > > Torsten Seemann wrote: >> >> DeeGee, >> >> Please add the following lines to help deduce the problem: >> >>> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', '- >>> format' => >>> 'Fasta'); >> >> die "could not open fasta" if not defined $Seq_in; >> >>> my $queryin = $Seq_in->next_seq(); >> >> die "could not get seq" if not defined $queryin; >> >> Does anything happen now? >> >> ... >> >> Some other comments: >> >>> my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => >>> 'blastp', >>> STACK Bio::Tools::Run::StandAloneBlast::blastpgp >> >> I'm not sure why it is in the blastpgp() method when you chose >> $factory->blastall() ? >> >>> _READMETHOD => >>> 'Blast' >> >> I don't think this is required anymore in modern Bioperl. Are you >> using 1.5.x or bioperl-live ? 
>> >>> when i paste the protein sequence into the textarea of my html >>> page and >>> STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50 >> >> So this is a CGI script? >> Does the script run as user 'apache' or 'httpd', or as yourself via >> SuEXEC? >> Does 'apache' have permissions to READ/WRITE the result/ directory? >> >> --Torsten >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > -- > View this message in context: http://www.nabble.com/blastall- > problem-tf3527412.html#a9864004 > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From torsten.seemann at infotech.monash.edu.au Thu Apr 5 20:40:32 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 6 Apr 2007 10:40:32 +1000 Subject: [Bioperl-l] blastall problem In-Reply-To: <9864004.post@talk.nabble.com> References: <9842643.post@talk.nabble.com> <9864004.post@talk.nabble.com> Message-ID: Dorjee, > thanks alot for your reply again. as per your suggestion (using 'die "could > not get seq" if not defined $queryin;'), i now get the following error > message: > Software error: > could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl line 50. > i've attached the script. could you plz have a look at it and see where am i > going wrong. > cheers mate! This strongly suggests that your FASTA file is not actually in FASTA format. http://en.wikipedia.org/wiki/Fasta_format Does it work if you pass it to blastall on the command line? eg. blastall -p blastp -i result/fasta.faa -d /export/home/database/nr > Saier Lab. 
> 858-534-2457 Are you working at UCSD? --Torsten From gdorjee at hotmail.com Thu Apr 5 23:26:16 2007 From: gdorjee at hotmail.com (DeeGee) Date: Thu, 5 Apr 2007 20:26:16 -0700 (PDT) Subject: [Bioperl-l] blastall problem In-Reply-To: References: <9842643.post@talk.nabble.com> <9864004.post@talk.nabble.com> Message-ID: <9867402.post@talk.nabble.com> hi Torsten, blastall -p blastp -i result/fasta.faa -d /export/home/database/nr works perfectly fine on the command line, and the 'fasta.faa' is in fasta format: >gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens] HLSAQKASVGPESVSGLGTRTWPRVSCEVTVQCWPGCHLKVGGFKMAPWQGVGRRPWFLTWGPLCGAASVSPSMTVASSQ QGWDCTAGRRWLGEGEIEALAQVSEFKTVLSFQGPAASPDGSSATRVPQDVTQGPGATGGKEDSGMIPLAGTAPGAEGPA PGDSQAVRPYKQEPSSPPLAPGLPAFLAAPGTTSCPECGKTSLKPAHLLRHRQSHSGEKPHACPECGKAFRRKEHLRRHR DTHPGSPGSPGPALRPLPAREKPHACCECGKTFYWREHLVRHRKTHSGARPFACWECGKGFGRREHVLRHQRIHGRAAAS AQGAVAPGPDGGGPFPPWPLG it seems like i'm just one bloody step away from success. ^ ^* can't figure out the prob. thanks for your help. Torsten Seemann wrote: > > Dorjee, > >> thanks alot for your reply again. as per your suggestion (using 'die >> "could >> not get seq" if not defined $queryin;'), i now get the following error >> message: >> Software error: >> could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl line 50. >> i've attached the script. could you plz have a look at it and see where >> am i >> going wrong. >> cheers mate! > > This strongly suggests that your FASTA file is not actually in FASTA > format. > http://en.wikipedia.org/wiki/Fasta_format > > Does it work if you pass it to blastall on the command line? > eg. blastall -p blastp -i result/fasta.faa -d /export/home/database/nr > >> Saier Lab. >> 858-534-2457 > > Are you working at UCSD? 
>
> --Torsten
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

--
View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9867402
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From tuco at pasteur.fr Fri Apr 6 09:33:08 2007
From: tuco at pasteur.fr (Emmanuel Quevillon)
Date: Fri, 06 Apr 2007 15:33:08 +0200
Subject: [Bioperl-l] Bio::Annotation::Collection strange behavior
Message-ID: <46164C14.8040701@pasteur.fr>

Hi folks,

I am seeing strange behavior from Bio::SeqIO::embl. When I read an
EMBL file as input and write it to another one, the tags in the
output file (EMBL format) are not in the same order as in the
original file. Is this a normal and expected result?

If anyone wants to test it as a Perl one-liner, here is the code:

perl -MBio::SeqIO -e '$i = Bio::SeqIO->new(-file => "file.embl", -format => "EMBL"); $o = Bio::SeqIO->new(-file => ">new.embl", -format => "EMBL"); while($e = $i->next_seq()){ $o->write_seq($e); }'

I checked the embl.pm code but was unable to find where this behavior
comes from. If someone has the solution or any clue.

Thanks

Regards

Emmanuel

--
-------------------------
Emmanuel Quevillon
Softwares and data banks
Pasteur Institute
tuco at_ pasteur dot fr
-------------------------

From dmessina at wustl.edu Fri Apr 6 11:09:51 2007
From: dmessina at wustl.edu (David Messina)
Date: Fri, 6 Apr 2007 10:09:51 -0500
Subject: [Bioperl-l] Bio::Annotation::Collection strange behavior
In-Reply-To: <46164C14.8040701@pasteur.fr>
References: <46164C14.8040701@pasteur.fr>
Message-ID: <7C67D287-DE2A-488A-8636-01EFF468368D@wustl.edu>

> Is this a normal and expected result?

Yes, unfortunately. Due to the complexity of the parsing, it is
surprisingly difficult to "round-trip" some sequence file formats.
http://www.bioperl.org/wiki/HOWTO:SeqIO#Caveats Dave From jason at bioperl.org Fri Apr 6 11:42:41 2007 From: jason at bioperl.org (Jason Stajich) Date: Fri, 6 Apr 2007 08:42:41 -0700 Subject: [Bioperl-l] blastall problem In-Reply-To: <9867402.post@talk.nabble.com> References: <9842643.post@talk.nabble.com> <9864004.post@talk.nabble.com> <9867402.post@talk.nabble.com> Message-ID: <9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org> When/How are are you writing your sequences to this file result.faa? are you using seqIO or bioperl to write the sequence to a file? I'm wondering if this is I/O buffering problem. On Apr 5, 2007, at 8:26 PM, DeeGee wrote: > > hi Torsten, > blastall -p blastp -i result/fasta.faa -d /export/home/database/nr > works > perfectly fine on the command line, and the 'fasta.faa' is in fasta > format: > >> gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens] > HLSAQKASVGPESVSGLGTRTWPRVSCEVTVQCWPGCHLKVGGFKMAPWQGVGRRPWFLTWGPLCGAASV > SPSMTVASSQ > QGWDCTAGRRWLGEGEIEALAQVSEFKTVLSFQGPAASPDGSSATRVPQDVTQGPGATGGKEDSGMIPLA > GTAPGAEGPA > PGDSQAVRPYKQEPSSPPLAPGLPAFLAAPGTTSCPECGKTSLKPAHLLRHRQSHSGEKPHACPECGKAF > RRKEHLRRHR > DTHPGSPGSPGPALRPLPAREKPHACCECGKTFYWREHLVRHRKTHSGARPFACWECGKGFGRREHVLRH > QRIHGRAAAS > AQGAVAPGPDGGGPFPPWPLG > > it seems like i'm just one bloody step away from success. ^ ^* > can't figure > out the prob. > thanks for your help. > > > Torsten Seemann wrote: >> >> Dorjee, >> >>> thanks alot for your reply again. as per your suggestion (using 'die >>> "could >>> not get seq" if not defined $queryin;'), i now get the following >>> error >>> message: >>> Software error: >>> could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl >>> line 50. >>> i've attached the script. could you plz have a look at it and see >>> where >>> am i >>> going wrong. >>> cheers mate! >> >> This strongly suggests that your FASTA file is not actually in FASTA >> format. 
>> http://en.wikipedia.org/wiki/Fasta_format >> >> Does it work if you pass it to blastall on the command line? >> eg. blastall -p blastp -i result/fasta.faa -d /export/home/ >> database/nr >> >>> Saier Lab. >>> 858-534-2457 >> >> Are you working at UCSD? >> >> --Torsten >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > -- > View this message in context: http://www.nabble.com/blastall- > problem-tf3527412.html#a9867402 > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070406/0c70723e/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2613 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070406/0c70723e/attachment-0001.bin From bernd.web at gmail.com Fri Apr 6 14:00:18 2007 From: bernd.web at gmail.com (Bernd Web) Date: Fri, 6 Apr 2007 20:00:18 +0200 Subject: [Bioperl-l] blastall problem In-Reply-To: <9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org> References: <9842643.post@talk.nabble.com> <9864004.post@talk.nabble.com> <9867402.post@talk.nabble.com> <9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org> Message-ID: <716af09c0704061100n1555915bw18050639d25cbf89@mail.gmail.com> Hi Dorjee, Do you now use complete file paths everywhere (instead of some relative paths that were in your script). 
Did you check all read and execute permissions (turn r, x on for
group and others)?

And regarding the fasta file, I'd suggest closing the filehandle after
you have printed the fasta sequence to the file.

    open(OUTPUT,">result/fasta.faa"); #don't use this relative path and use the "die" as was suggested earlier.
    .... your other code lines
    print OUTPUT "$desc\n$seqo\n";
    close(OUTPUT); #close the file.

Also check whether your complete script runs from the command line, to
be sure your problems are not related to the webserver environment.

BTW I do think you do not want to parse your fasta file like you do:

    if ($fasta_file =~ /^(\>.+)\s+/){$desc=$1;}
    $fasta_file=~s/[\n\r]//g;
    if ($fasta_file =~ /([A-Z]{10}.+)/){$seqo=$1;}

$seqo will contain the description as well, so your sequence starts
with the description. BioPerl provides code for fasta file parsing
too ;-)

If you really want to stick to your code you can catch the $desc and
$seqo in one RegExp, or replace this line:

    if ($fasta_file =~ /^(\>.+)\s+/){$desc=$1;}

with

    if ($fasta_file =~ s/^(\>.+)\s+//){$desc=$1;}

I hope you will get your script working now.

Regards,
Bernd

On 4/6/07, Jason Stajich wrote:
> When/How are are you writing your sequences to this file result.faa? are
> you using seqIO or bioperl to write the sequence to a file?
> I'm wondering if this is I/O buffering problem.
> > > > On Apr 5, 2007, at 8:26 PM, DeeGee wrote: > > > hi Torsten, > blastall -p blastp -i result/fasta.faa -d /export/home/database/nr works > perfectly fine on the command line, and the 'fasta.faa' is in fasta format: > > > gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens] > HLSAQKASVGPESVSGLGTRTWPRVSCEVTVQCWPGCHLKVGGFKMAPWQGVGRRPWFLTWGPLCGAASVSPSMTVASSQ > QGWDCTAGRRWLGEGEIEALAQVSEFKTVLSFQGPAASPDGSSATRVPQDVTQGPGATGGKEDSGMIPLAGTAPGAEGPA > PGDSQAVRPYKQEPSSPPLAPGLPAFLAAPGTTSCPECGKTSLKPAHLLRHRQSHSGEKPHACPECGKAFRRKEHLRRHR > DTHPGSPGSPGPALRPLPAREKPHACCECGKTFYWREHLVRHRKTHSGARPFACWECGKGFGRREHVLRHQRIHGRAAAS > AQGAVAPGPDGGGPFPPWPLG > > it seems like i'm just one bloody step away from success. ^ ^* can't figure > out the prob. > thanks for your help. > > > Torsten Seemann wrote: > > Dorjee, > > > thanks alot for your reply again. as per your suggestion (using 'die > "could > not get seq" if not defined $queryin;'), i now get the following error > message: > Software error: > could not get seq at > /usr/local/apache2/htdocs/remote_ncbi.pl line 50. > i've attached the script. could you plz have a look at it and see where > am i > going wrong. > cheers mate! > > This strongly suggests that your FASTA file is not actually in FASTA > format. > http://en.wikipedia.org/wiki/Fasta_format > > Does it work if you pass it to blastall on the command line? > eg. blastall -p blastp -i result/fasta.faa -d /export/home/database/nr > > > Saier Lab. > 858-534-2457 > > Are you working at UCSD? > > --Torsten > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > -- > View this message in context: > http://www.nabble.com/blastall-problem-tf3527412.html#a9867402 > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Miller Research Fellow > University of California, Berkeley > lab: 510.642.8441 > http://pmb.berkeley.edu/~taylor/people/js.htmlhttp://fungalgenomes.org/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From gdorjee at hotmail.com Fri Apr 6 13:39:38 2007 From: gdorjee at hotmail.com (DeeGee) Date: Fri, 6 Apr 2007 10:39:38 -0700 (PDT) Subject: [Bioperl-l] blastall problem In-Reply-To: <9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org> References: <9842643.post@talk.nabble.com> <9864004.post@talk.nabble.com> <9867402.post@talk.nabble.com> <9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org> Message-ID: <9875685.post@talk.nabble.com> Following is the part of my script, which is in the 'htdocs' directory: ####### part of my script ############# #generate a new CGI object from the input to the CGI script my $query=new CGI; open(OUTPUT,">/export/home/local/apache2/htdocs/result/fasta.faa"); print STDOUT $query->header(); print STDOUT $query->start_html(-title=>"Response from blast", -BGCOLOR=>"#FFFFFF"); print STDOUT "\n

Results from the BLAST

\n"; #gets the sequence from the html textarea with ?post? method my $fasta_file=$query->param('sequence'); print OUTPUT $fasta_file; #Local blast of the input sequence against nr database my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', -format => 'Fasta'); die "could not open fasta" if not defined $Seq_in; my $queryin = $Seq_in->next_seq(); die "could not get seq" if not defined $queryin; my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastp', 'database' => '/export/home/dorjee/database/nr', _READMETHOD => 'Blast' ); $factory->outfile("result/out.blast"); my $blastreport = $factory->blastall($queryin); ..... Thank you. Jason Stajich-3 wrote: > > When/How are are you writing your sequences to this file result.faa? > are you using seqIO or bioperl to write the sequence to a file? > I'm wondering if this is I/O buffering problem. > > On Apr 5, 2007, at 8:26 PM, DeeGee wrote: > >> >> hi Torsten, >> blastall -p blastp -i result/fasta.faa -d /export/home/database/nr >> works >> perfectly fine on the command line, and the 'fasta.faa' is in fasta >> format: >> >>> gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens] >> HLSAQKASVGPESVSGLGTRTWPRVSCEVTVQCWPGCHLKVGGFKMAPWQGVGRRPWFLTWGPLCGAASV >> SPSMTVASSQ >> QGWDCTAGRRWLGEGEIEALAQVSEFKTVLSFQGPAASPDGSSATRVPQDVTQGPGATGGKEDSGMIPLA >> GTAPGAEGPA >> PGDSQAVRPYKQEPSSPPLAPGLPAFLAAPGTTSCPECGKTSLKPAHLLRHRQSHSGEKPHACPECGKAF >> RRKEHLRRHR >> DTHPGSPGSPGPALRPLPAREKPHACCECGKTFYWREHLVRHRKTHSGARPFACWECGKGFGRREHVLRH >> QRIHGRAAAS >> AQGAVAPGPDGGGPFPPWPLG >> >> it seems like i'm just one bloody step away from success. ^ ^* >> can't figure >> out the prob. >> thanks for your help. >> >> >> Torsten Seemann wrote: >>> >>> Dorjee, >>> >>>> thanks alot for your reply again. 
as per your suggestion (using 'die >>>> "could >>>> not get seq" if not defined $queryin;'), i now get the following >>>> error >>>> message: >>>> Software error: >>>> could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl >>>> line 50. >>>> i've attached the script. could you plz have a look at it and see >>>> where >>>> am i >>>> going wrong. >>>> cheers mate! >>> >>> This strongly suggests that your FASTA file is not actually in FASTA >>> format. >>> http://en.wikipedia.org/wiki/Fasta_format >>> >>> Does it work if you pass it to blastall on the command line? >>> eg. blastall -p blastp -i result/fasta.faa -d /export/home/ >>> database/nr >>> >>>> Saier Lab. >>>> 858-534-2457 >>> >>> Are you working at UCSD? >>> >>> --Torsten >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> -- >> View this message in context: http://www.nabble.com/blastall- >> problem-tf3527412.html#a9867402 >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Miller Research Fellow > University of California, Berkeley > lab: 510.642.8441 > http://pmb.berkeley.edu/~taylor/people/js.html > http://fungalgenomes.org/ > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9875685 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
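
The script above prints to OUTPUT and then reads the same file back
without closing the handle first. A minimal sketch of why that can
matter, using only core Perl: the helper name sizes_around_close()
and the sample record are made up for illustration, and the exact
buffer size depends on the Perl build, so the "before" value is only
what one would typically expect for a short write.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text to a scratch file and report how
# many bytes are visible on disk before and after the handle is closed.
sub sizes_around_close {
    my ($text) = @_;

    # Reserve a scratch file name; it is removed when the program exits.
    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close($tmp_fh) or die "cannot close $path: $!";

    open(my $out, '>', $path) or die "cannot write $path: $!";
    print {$out} $text;               # small writes stay in Perl's buffer

    my $before = (-s $path) || 0;     # what another reader would see now

    close($out) or die "cannot close $path: $!";   # flushes the buffer
    my $after = (-s $path) || 0;

    return ($before, $after);
}

my ($before, $after) = sizes_around_close(">query1 made-up record\nMKVLAAGIVGLL\n");
print "bytes on disk before close: $before\n";
print "bytes on disk after close:  $after\n";
```

A parser that opens the file between the print and the close sees
the "before" state, which for a short sequence is usually an empty
file and therefore no sequence.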
From jason at bioperl.org Fri Apr 6 14:40:42 2007 From: jason at bioperl.org (Jason Stajich) Date: Fri, 6 Apr 2007 11:40:42 -0700 Subject: [Bioperl-l] blastall problem In-Reply-To: <9875685.post@talk.nabble.com> References: <9842643.post@talk.nabble.com> <9864004.post@talk.nabble.com> <9867402.post@talk.nabble.com> <9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org> <9875685.post@talk.nabble.com> Message-ID: Looks like you need to deal with buffering: http://perl.plover.com/FAQs/Buffering.html So you need to add this: close(OUTPUT); Alternatively you can build a sequence object and pass that in to the BLAST factory, then you don't have to mess around with creating temporary files or run into this sort of problem. -jason On Apr 6, 2007, at 10:39 AM, DeeGee wrote: > > Following is the part of my script, which is in the 'htdocs' > directory: > > ####### part of my script ############# > #generate a new CGI object from the input to the CGI script > my $query=new CGI; > > open(OUTPUT,">/export/home/local/apache2/htdocs/result/fasta.faa"); > > print STDOUT $query->header(); > print STDOUT $query->start_html(-title=>"Response from blast", > -BGCOLOR=>"#FFFFFF"); > print STDOUT "\n

Results from the BLAST

\n"; > > #gets the sequence from the html textarea with ?post? method > my $fasta_file=$query->param('sequence'); > print OUTPUT $fasta_file; > close(OUTPUT); > #Local blast of the input sequence against nr database > my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', -format => > 'Fasta'); > die "could not open fasta" if not defined $Seq_in; > my $queryin = $Seq_in->next_seq(); > die "could not get seq" if not defined $queryin; > my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => > 'blastp', > 'database' => > '/export/home/dorjee/database/nr', > _READMETHOD => > 'Blast' > ); > $factory->outfile("result/out.blast"); > my $blastreport = $factory->blastall($queryin); > ..... > > Thank you. > > > > Jason Stajich-3 wrote: >> >> When/How are are you writing your sequences to this file result.faa? >> are you using seqIO or bioperl to write the sequence to a file? >> I'm wondering if this is I/O buffering problem. >> >> On Apr 5, 2007, at 8:26 PM, DeeGee wrote: >> >>> >>> hi Torsten, >>> blastall -p blastp -i result/fasta.faa -d /export/home/database/nr >>> works >>> perfectly fine on the command line, and the 'fasta.faa' is in fasta >>> format: >>> >>>> gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens] >>> HLSAQKASVGPESVSGLGTRTWPRVSCEVTVQCWPGCHLKVGGFKMAPWQGVGRRPWFLTWGPLCGAA >>> SV >>> SPSMTVASSQ >>> QGWDCTAGRRWLGEGEIEALAQVSEFKTVLSFQGPAASPDGSSATRVPQDVTQGPGATGGKEDSGMIP >>> LA >>> GTAPGAEGPA >>> PGDSQAVRPYKQEPSSPPLAPGLPAFLAAPGTTSCPECGKTSLKPAHLLRHRQSHSGEKPHACPECGK >>> AF >>> RRKEHLRRHR >>> DTHPGSPGSPGPALRPLPAREKPHACCECGKTFYWREHLVRHRKTHSGARPFACWECGKGFGRREHVL >>> RH >>> QRIHGRAAAS >>> AQGAVAPGPDGGGPFPPWPLG >>> >>> it seems like i'm just one bloody step away from success. ^ ^* >>> can't figure >>> out the prob. >>> thanks for your help. >>> >>> >>> Torsten Seemann wrote: >>>> >>>> Dorjee, >>>> >>>>> thanks alot for your reply again. 
as per your suggestion (using >>>>> 'die >>>>> "could >>>>> not get seq" if not defined $queryin;'), i now get the following >>>>> error >>>>> message: >>>>> Software error: >>>>> could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl >>>>> line 50. >>>>> i've attached the script. could you plz have a look at it and see >>>>> where >>>>> am i >>>>> going wrong. >>>>> cheers mate! >>>> >>>> This strongly suggests that your FASTA file is not actually in >>>> FASTA >>>> format. >>>> http://en.wikipedia.org/wiki/Fasta_format >>>> >>>> Does it work if you pass it to blastall on the command line? >>>> eg. blastall -p blastp -i result/fasta.faa -d /export/home/ >>>> database/nr >>>> >>>>> Saier Lab. >>>>> 858-534-2457 >>>> >>>> Are you working at UCSD? >>>> >>>> --Torsten >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >>> -- >>> View this message in context: http://www.nabble.com/blastall- >>> problem-tf3527412.html#a9867402 >>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Miller Research Fellow >> University of California, Berkeley >> lab: 510.642.8441 >> http://pmb.berkeley.edu/~taylor/people/js.html >> http://fungalgenomes.org/ >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > View this message in context: http://www.nabble.com/blastall- > problem-tf3527412.html#a9875685 > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Miller Research Fellow University of California, Berkeley lab: 510.642.8441 http://pmb.berkeley.edu/~taylor/people/js.html http://fungalgenomes.org/ -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070406/e9477659/attachment.html -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2613 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070406/e9477659/attachment.bin From MEC at stowers-institute.org Fri Apr 6 16:34:37 2007 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Fri, 6 Apr 2007 15:34:37 -0500 Subject: [Bioperl-l] Bio/DB/SeqFeature/Store/DBI/mysql.pm patched Message-ID: Lincoln, I just commited a patch to Bio/DB/SeqFeature/Store/DBI/mysql.pm which avoids potential problem which, unless fixed, can generates warnings that look like this: prepare_cached(SELECT f.id,f.object FROM feature as f, typelist AS tl WHERE ( tl.id=f.typeid AND (tl.tag LIKE ?) ) ) statement handle DBI::st=HASH(0x16f61c0) still Active at /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 1427 DBD::mysql::st fetchrow_array failed: fetch() without execute() at /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 1416. ... as well as other downstream abberent program behaviour. I encounterd what the DBI manpage suggests might happen: "The results will certainly not be what you expect" This can happen, for example, when you open an iterator using Bio::DB::SeqFeature::Store->get_seq_stream, and then while iterating, perform other queries against the store. 
My understanding of the DBI doc is that this should only occur if the
2nd iterator is for the same sql statement identically parameterized as
the 1st, but I have not proven beyond a doubt that this is what
Bio::DB::SeqFeature::Store is doing the way I am using it.

Nonetheless, the patch fixes my pipeline.

Cheers,

Malcolm

From gdorjee at hotmail.com Fri Apr 6 18:27:54 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Fri, 6 Apr 2007 15:27:54 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To:
References: <9842643.post@talk.nabble.com> <9864004.post@talk.nabble.com> <9867402.post@talk.nabble.com> <9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org> <9875685.post@talk.nabble.com>
Message-ID: <9879110.post@talk.nabble.com>

I added the line:
close(OUTPUT);
and now the following error comes up, where 'out.blast' is supposed to
be the blast result file, but it is not being created.

Software error:
------------- EXCEPTION -------------
MSG: Could not open /export/home/dorjee/result/out.blast: No such file or directory
STACK Bio::Root::IO::_initialize_io /usr/perl5/5.6.1/lib/Bio/Root/IO.pm:273
STACK Bio::Root::IO::new /usr/perl5/5.6.1/lib/Bio/Root/IO.pm:213
STACK Bio::SearchIO::new /usr/perl5/5.6.1/lib/Bio/SearchIO.pm:135
STACK Bio::SearchIO::new /usr/perl5/5.6.1/lib/Bio/SearchIO.pm:167
STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:53
--------------------------------------

Jason Stajich-3 wrote:
>
> Looks like you need to deal with buffering:
>
> http://perl.plover.com/FAQs/Buffering.html
>
> So you need to add this:
> close(OUTPUT);
>
> Alternatively you can build a sequence object and pass that in to the
> BLAST factory, then you don't have to mess around with creating
> temporary files or run into this sort of problem.
>
> -jason
> On Apr 6, 2007, at 10:39 AM, DeeGee wrote:
>
>>
>> Following is the part of my script, which is in the 'htdocs' directory:
>>
>> ####### part of my script #############
>> #generate a new CGI object from the input to the CGI script
>> my $query=new CGI;
>>
>> open(OUTPUT,">/export/home/local/apache2/htdocs/result/fasta.faa");
>>
>> print STDOUT $query->header();
>> print STDOUT $query->start_html(-title=>"Response from blast",
>> -BGCOLOR=>"#FFFFFF");
>> print STDOUT "\nResults from the BLAST\n";
>>
>> #gets the sequence from the html textarea with 'post' method
>> my $fasta_file=$query->param('sequence');
>> print OUTPUT $fasta_file;
>>
> close(OUTPUT);
>> #Local blast of the input sequence against nr database
>> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', -format =>
>> 'Fasta');
>> die "could not open fasta" if not defined $Seq_in;
>> my $queryin = $Seq_in->next_seq();
>> die "could not get seq" if not defined $queryin;
>> my $factory = Bio::Tools::Run::StandAloneBlast->new('program' =>
>> 'blastp',
>> 'database' =>
>> '/export/home/dorjee/database/nr',
>> _READMETHOD =>
>> 'Blast'
>> );
>> $factory->outfile("result/out.blast");
>> my $blastreport = $factory->blastall($queryin);
>> .....
>>
>> Thank you.
>>
>>
>>
>> Jason Stajich-3 wrote:
>>>
>>> When/How are are you writing your sequences to this file result.faa?
>>> are you using seqIO or bioperl to write the sequence to a file?
>>> I'm wondering if this is I/O buffering problem.
>>>
>>> On Apr 5, 2007, at 8:26 PM, DeeGee wrote:
>>>
>>>>
>>>> hi Torsten,
>>>> blastall -p blastp -i result/fasta.faa -d /export/home/database/nr
>>>> works perfectly fine on the command line, and the 'fasta.faa' is in
>>>> fasta format:
>>>>
>>>> >gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens]
>>>> HLSAQKASVGPESVSGLGTRTWPRVSCEVTVQCWPGCHLKVGGFKMAPWQGVGRRPWFLTWGPLCGAA
>>>> SV
>>>> SPSMTVASSQ
>>>> QGWDCTAGRRWLGEGEIEALAQVSEFKTVLSFQGPAASPDGSSATRVPQDVTQGPGATGGKEDSGMIP
>>>> LA
>>>> GTAPGAEGPA
>>>> PGDSQAVRPYKQEPSSPPLAPGLPAFLAAPGTTSCPECGKTSLKPAHLLRHRQSHSGEKPHACPECGK
>>>> AF
>>>> RRKEHLRRHR
>>>> DTHPGSPGSPGPALRPLPAREKPHACCECGKTFYWREHLVRHRKTHSGARPFACWECGKGFGRREHVL
>>>> RH
>>>> QRIHGRAAAS
>>>> AQGAVAPGPDGGGPFPPWPLG
>>>>
>>>> it seems like i'm just one bloody step away from success. ^ ^*
>>>> can't figure out the prob.
>>>> thanks for your help.
>>>>
>>>>
>>>> Torsten Seemann wrote:
>>>>>
>>>>> Dorjee,
>>>>>
>>>>>> thanks alot for your reply again.
>>>>>> as per your suggestion (using 'die "could not get seq" if not
>>>>>> defined $queryin;'), i now get the following error message:
>>>>>> Software error:
>>>>>> could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl line 50.
>>>>>> i've attached the script. could you plz have a look at it and see
>>>>>> where am i going wrong.
>>>>>> cheers mate!
>>>>>
>>>>> This strongly suggests that your FASTA file is not actually in FASTA
>>>>> format.
>>>>> http://en.wikipedia.org/wiki/Fasta_format
>>>>>
>>>>> Does it work if you pass it to blastall on the command line?
>>>>> eg. blastall -p blastp -i result/fasta.faa -d /export/home/database/nr
>>>>>
>>>>>> Saier Lab.
>>>>>> 858-534-2457
>>>>>
>>>>> Are you working at UCSD?
>>>>>
>>>>> --Torsten
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>>
>>>> --
>>>> View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9867402
>>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> --
>>> Jason Stajich
>>> Miller Research Fellow
>>> University of California, Berkeley
>>> lab: 510.642.8441
>>> http://pmb.berkeley.edu/~taylor/people/js.html
>>> http://fungalgenomes.org/
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> --
>> View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9875685
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> Miller Research Fellow
> University of California, Berkeley
> lab: 510.642.8441
> http://pmb.berkeley.edu/~taylor/people/js.html
> http://fungalgenomes.org/
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9879110
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From gilbertd at cricket.bio.indiana.edu Fri Apr 6 23:31:29 2007
From: gilbertd at cricket.bio.indiana.edu (Don Gilbert)
Date: Fri, 6 Apr 2007 22:31:29 -0500 (EST)
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
Message-ID: <200704070331.l373VTI22000@cricket.bio.indiana.edu>

Dear Bioperlers,

There is a hidden issue with Bio::DB::Fasta in that it assumes Fasta
files have fixed line widths, but that isn't a requirement of Fasta
format. The documentation notes this package requirement, but I was
bitten by this, and I'd guess not many people check their data (esp.
if from someone else) to see it meets this requirement.

Simple tools can easily produce fasta with ragged line formatting:
e.g. genome assemblers that paste together contig fasta with spacers
to make assemblies.

It would be nice if B:D:Fasta would check and die when it can't handle
this ragged input. Here is a suggested addition:

package Bio::DB::Fasta;

=head1 DESCRIPTION

Entries may have any line length up to 65,536 characters, and
different line lengths are allowed in the same file. However, within
a sequence entry, all lines must be the same length except for the
last.
+ An error will be thrown if this is not the case.
=cut

use constant DIE_ON_MISSMATCHED_LINES => 1; # if you want

sub calculate_offsets {

  my ($offset,$id,$linelength,$type,$firstline,$count,$termination_length,%offsets);
+ my ($l3_len,$l2_len,$l_len)=(0,0,0);

  $self->_check_linelength($linelength);
+ ($l3_len,$l2_len,$l_len)=(0,0,0);
  } else {
+ $l3_len= $l2_len; $l2_len= $l_len; $l_len= length($_); # need to check every line :(
+ if(DIE_ON_MISSMATCHED_LINES &&
+ $l3_len>0 && $l2_len>0 && $l3_len!=$l2_len) {
+ my $fap= substr($_,0,20)."..";
+ $self->throw("Each line of the fasta entry must be the same length except the last.
+ Line above #$. '$fap' is $l2_len != $l3_len chars.");
+ }

  $linelength ||= length($_);

-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/

From hlapp at gmx.net Sat Apr 7 12:42:13 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 7 Apr 2007 12:42:13 -0400
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
In-Reply-To: <200704070331.l373VTI22000@cricket.bio.indiana.edu>
References: <200704070331.l373VTI22000@cricket.bio.indiana.edu>
Message-ID: <05D43C56-8B30-41C9-8C35-2CD77419DE7F@gmx.net>

Wouldn't it be easier (and more robust) to just reformat the file to
meet the constant line width requirement? The code required to do that
should be fewer lines than your addition below, I think.

For example, one could do a fast first-pass through the file simply
checking that all sequence lines not followed by a description line or
eof have the same length, stopping at the first line that fails the
test. If unequal lengths, use Bio::SeqIO to read and write back out the
fasta file, then continue as for well-formatted files.

-hilmar

On Apr 6, 2007, at 11:31 PM, Don Gilbert wrote:

>
> Dear Bioperlers,
>
> There is a hidden issue with Bio::DB::Fasta in that it assumes Fasta
> files have fixed line widths, but that isn't a requirement of Fasta
> format.
> The documentation notes this package requirement, but I was
> bitten by this, and I'd guess not many people check their data (esp.
> if from someone else) to see it meets this requirement.
>
> Simple tools can easily produce fasta with ragged line formatting:
> e.g. genome assemblers that paste together contig fasta with spacers
> to make assemblies.
>
> It would be nice if B:D:Fasta would check and die when it can't handle
> this ragged input. Here is a suggested addition:
>
> package Bio::DB::Fasta;
>
> =head1 DESCRIPTION
>
> Entries may have any line length up to 65,536 characters, and
> different line lengths are allowed in the same file. However, within
> a sequence entry, all lines must be the same length except for the
> last.
> + An error will be thrown if this is not the case.
>
> =cut
>
> use constant DIE_ON_MISSMATCHED_LINES => 1; # if you want
>
> sub calculate_offsets {
>
> my ($offset,$id,$linelength,$type,$firstline,$count,
> $termination_length,%offsets);
> + my ($l3_len,$l2_len,$l_len)=(0,0,0);
>
> $self->_check_linelength($linelength);
> + ($l3_len,$l2_len,$l_len)=(0,0,0);
> } else {
> + $l3_len= $l2_len; $l2_len= $l_len; $l_len= length($_); # need to check every line :(
> + if(DIE_ON_MISSMATCHED_LINES &&
> + $l3_len>0 && $l2_len>0 && $l3_len!=$l2_len) {
> + my $fap= substr($_,0,20)."..";
> + $self->throw("Each line of the fasta entry must be the same length except the last.
> + Line above #$. '$fap' is $l2_len != $l3_len chars.");
> + }
>
> $linelength ||= length($_);
>
> -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
> -- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Sat Apr 7 17:13:24 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 7 Apr 2007 17:13:24 -0400
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
In-Reply-To: <200704071711.l37HBB823983@cricket.bio.indiana.edu>
References: <200704071711.l37HBB823983@cricket.bio.indiana.edu>
Message-ID: <8177CF47-558F-4891-97B5-69F327EF8A4A@gmx.net>

What I was suggesting was that the indexer automatically does the
reformatting, i.e., to have it touch/change the input data if necessary
(and obviously one would be able to turn this feature off when the
correctness of the input formatting is known).

Are you suggesting that this automatic reformatting isn't possible?

-hilmar

On Apr 7, 2007, at 1:11 PM, Don Gilbert wrote:

>
>
> Hilmar,
>
> I have added reformatting where appropriate (in code that installs the
> files for indexing by Bio::DB::Fasta). What I'm suggesting is a patch
> to Bio::DB::Fasta to warn and die when the documented fixed width
> that Bio::DB::Fasta requires isn't met. I.e., keep other folks from
> being bitten by this hard to identify requirement. Then when they
> see that this indexer is failing on inappropriate inputs, they also
> can reformat their Fasta to meet this requirement, and not continue
> to use the software with bad results.
> The operation of Bio::DB::Fasta is reading a sequence stream
> and it doesn't touch/change the input data, so it would be hard to
> patch it to re-format the input data.
>
> - Don
>
> -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
> -- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Sat Apr 7 21:00:51 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 7 Apr 2007 21:00:51 -0400
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
In-Reply-To: <200704080006.l3806Yt25235@cricket.bio.indiana.edu>
References: <200704080006.l3806Yt25235@cricket.bio.indiana.edu>
Message-ID:

Since you'd have to reformat it though, how would you do it then
(presumably offline)?

-hilmar

On Apr 7, 2007, at 8:06 PM, Don Gilbert wrote:

>
>
> Hilmar,
>
> Yes, basically automatic reformatting isn't possible. If you are
> indexing a large genome of fasta data, I'd not want a bioperl script
> to rewrite that data, or create a new version, automatically.
>
> - Don

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From gilbertd at cricket.bio.indiana.edu Sat Apr 7 13:11:11 2007
From: gilbertd at cricket.bio.indiana.edu (Don Gilbert)
Date: Sat, 7 Apr 2007 12:11:11 -0500 (EST)
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
Message-ID: <200704071711.l37HBB823983@cricket.bio.indiana.edu>

Hilmar,

I have added reformatting where appropriate (in code that installs the
files for indexing by Bio::DB::Fasta). What I'm suggesting is a patch
to Bio::DB::Fasta to warn and die when the documented fixed width
that Bio::DB::Fasta requires isn't met. I.e., keep other folks from
being bitten by this hard to identify requirement.
Then when they see that this indexer is failing on inappropriate
inputs, they also can reformat their Fasta to meet this requirement,
and not continue to use the software with bad results. The operation of
Bio::DB::Fasta is reading a sequence stream and it doesn't touch/change
the input data, so it would be hard to patch it to re-format the input
data.

- Don

-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/

From gilbertd at cricket.bio.indiana.edu Sat Apr 7 20:06:34 2007
From: gilbertd at cricket.bio.indiana.edu (Don Gilbert)
Date: Sat, 7 Apr 2007 19:06:34 -0500 (EST)
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
Message-ID: <200704080006.l3806Yt25235@cricket.bio.indiana.edu>

Hilmar,

Yes, basically automatic reformatting isn't possible. If you are
indexing a large genome of fasta data, I'd not want a bioperl script
to rewrite that data, or create a new version, automatically.

- Don

From gdorjee at hotmail.com Mon Apr 9 00:18:39 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Sun, 8 Apr 2007 21:18:39 -0700 (PDT)
Subject: [Bioperl-l] parse blast report for the best evalue
Message-ID: <9898358.post@talk.nabble.com>

hi all,
i'm trying to parse a blast report using Bio::SearchIO as follows, but
since this blast report is generated with many against many (database)
fasta sequences, there're many individual blast reports (one for each
of the sequence from the query file). i was wondering if there is a way
to get only the best hit (with best evalue) from each one of them.

##### part of my script ######
my $in = new Bio::SearchIO(-format => 'blast', -file => $blast_report);
while( my $result = $in->next_result ) {
  while( my $hit = $result->next_hit ) {
  ...........

thanks.

--
View this message in context: http://www.nabble.com/parse-blast-report-for-the-best-evalue-tf3545784.html#a9898358
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
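
[Editorial sketch, not part of the original thread.] One possible way to keep only the best hit per query is to remember the hit with the smallest E-value while looping over each result. The code below is a minimal sketch under some assumptions: the helper names best_hit and report_best_hits are made up for illustration, E-values returned by significance() are assumed to compare correctly as numbers, and BioPerl is assumed to be installed for the part that reads the report. BLAST normally lists hits best-first, so the first next_hit is often already the answer; the sketch scans every hit rather than rely on that ordering.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the hit with the smallest E-value from a list of hit objects.
# Each object only needs a significance() method, which BioPerl hit
# objects provide. Returns undef for an empty list.
sub best_hit {
    my @hits = @_;
    my $best;
    for my $hit (@hits) {
        $best = $hit
            if !defined $best
            || $hit->significance < $best->significance;
    }
    return $best;
}

# Print one tab-separated line per query: query name, best hit, E-value.
# (Hypothetical helper; the report file name comes from the caller.)
sub report_best_hits {
    my ($blast_report) = @_;
    require Bio::SearchIO;    # loaded lazily; needs BioPerl installed
    my $in = Bio::SearchIO->new(-format => 'blast',
                                -file   => $blast_report);
    while ( my $result = $in->next_result ) {
        my @hits;
        while ( my $hit = $result->next_hit ) {
            push @hits, $hit;
        }
        my $best = best_hit(@hits);
        next unless defined $best;    # this query had no hits
        print join( "\t", $result->query_name, $best->name,
                    $best->significance ), "\n";
    }
}

# Run only when a report file is given on the command line.
report_best_hits( $ARGV[0] ) if @ARGV;
```

Usage would be along the lines of "perl best_hits.pl report.blast", where both file names are placeholders. Keeping the comparison in its own sub means the selection rule can be swapped (for example, highest bit score via $hit->bits) without touching the parsing loop.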
From staffa at niehs.nih.gov Mon Apr 9 11:43:19 2007
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Mon, 09 Apr 2007 11:43:19 -0400
Subject: [Bioperl-l] Retrieve mRNA from Genome
Message-ID:

I have been retrieving sub-sequence from Genbank genomic records
by use of Bio::SeqIO and ->get_SeqFeatures, ->start ->end ,
but now I'm looking for a quick way to extract CDS or mRNA from a
multi-segmented annotation, e.g.

mRNA join(72458..72791,84573..84613,93279..94419,94481..94656,
     94719..94992,95056..95350,95438..95553,95614..96056)

Is there such a method?
Please point me to appropriate documentation.

Nick Staffa
Telephone: 919-316-4569 (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Informati