From dmessina at wustl.edu Sun Apr 1 22:54:58 2007
From: dmessina at wustl.edu (David Messina)
Date: Sun, 1 Apr 2007 21:54:58 -0500
Subject: [Bioperl-l] installation bioperl
Message-ID: <6EFFF13A-66E7-418F-8B8E-A8AA8826DE83@wustl.edu>

We need more information to be able to help you. Could you please show us the actual output you see when trying to install Bioperl?

Also, we need to know:
- what operating system you have
- what version of Bioperl you are trying to install

See http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance and please read the rest of the document, too.

Dave

From aharry2001 at yahoo.com Mon Apr 2 06:09:25 2007
From: aharry2001 at yahoo.com (Ambrose)
Date: Mon, 2 Apr 2007 03:09:25 -0700 (PDT)
Subject: [Bioperl-l] bioperl and kegg(out of memory problem )
In-Reply-To:
Message-ID: <20070402100925.40498.qmail@web52001.mail.re2.yahoo.com>

Hello All,

I have some problems parsing KEGG using BioPerl: I get an out-of-memory error. I currently have 1 GB of RAM. Can someone tell me why this is happening and how it can be solved? Is it because the objects passed to BioPerl are so big, or what?

Best regards,
Ambrose

From cjfields at uiuc.edu Mon Apr 2 08:43:18 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Apr 2007 07:43:18 -0500
Subject: [Bioperl-l] bioperl and kegg(out of memory problem )
In-Reply-To: <20070402100925.40498.qmail@web52001.mail.re2.yahoo.com>
References: <20070402100925.40498.qmail@web52001.mail.re2.yahoo.com>
Message-ID: <7259B658-A58D-4F97-B90B-E23D3C924D3F@uiuc.edu>

This doesn't really explain much beyond stating you are having problems. You need to post some code (to the mail list!) and let us know what version of BioPerl you are using.

chris

From aharry2001 at yahoo.com Mon Apr 2 09:56:33 2007
From: aharry2001 at yahoo.com (Ambrose)
Date: Mon, 2 Apr 2007 06:56:33 -0700 (PDT)
Subject: [Bioperl-l] bioperl and kegg(out of memory problem )
In-Reply-To: <7259B658-A58D-4F97-B90B-E23D3C924D3F@uiuc.edu>
Message-ID: <20070402135633.85882.qmail@web52002.mail.re2.yahoo.com>

Hello ALL,

I have the code below, which parses my KEGG files. Many of the files are parsed and the information is inserted into my databases, but unfortunately after the program runs for some hours it stops with the message "out of memory". I assume that this happens because the BioPerl object is too big. Please check the code below.

Best regards,
Ambrose

#!/usr/local/ActivePerl/bin/perl
#
#

use strict;
use Bio::SeqIO;
use Bio::FASTASequence;
use DBI;
use Benchmark qw(:all) ;

my($ko,$prosite,$ncbigi,$ncbigeneid,$pfam,$uniprot,$ecn1,$pathway_id1,$pathway_name1,$ec_num);
my(%dblink_KO,%dblink_Pfam,%dblink_PROSITE,%dblink_NCBIGI,%dblink_NCBIGENEID,%dblink_UniProt);
my(%pathway_name,%pathway_id,%ecnumbers,%crc64,%ntseq,%aaseq);
my( @kg_id);
my $db="gbdb";
my $host="localhost";
my $userid="root";
my $passwd="ubuntu";
my $connectionInfo="dbi:mysql:$db;"."mysql_socket=/var/run/mysqld/mysqld.sock";
my ($t1,$t2);
my $dbh = DBI->connect($connectionInfo,$userid,$passwd);
my $time_used;

eval { $dbh->do("DROP TABLE kegginfo") };
print "Dropping kegginfo failed: $@\n" if $@;
$dbh->do("CREATE TABLE kegginfo (kg_id BIGINT NOT NULL AUTO_INCREMENT,
          up_id INT UNSIGNED REFERENCES
          uniprotinfo(up_id),
          filename VARCHAR(50),
          kegg_id VARCHAR(50),
          keggaccn VARCHAR(50),
          description VARCHAR(250),
          ec_numbers VARCHAR(250),
          pathway_id VARCHAR(250),
          pathway_name VARCHAR(250),
          crc64 VARCHAR(50),
          ko_id VARCHAR(50),
          pfam_id VARCHAR(50),
          ncbigi_id VARCHAR(50),
          ncbigeneid_id VARCHAR(50),
          uniprot_id VARCHAR(50),
          prosite_id VARCHAR(50),
          PRIMARY KEY (kg_id)
          )");

eval { $dbh->do("DROP TABLE keggntsequence") };
print "Dropping keggntsequence failed: $@\n" if $@;
$dbh->do("CREATE TABLE keggntsequence (kg_id BIGINT(15) UNSIGNED REFERENCES uniprotinfo(kg_id),
          keggaccn VARCHAR(50),
          nucleotidesequence text
          )");

eval { $dbh->do("DROP TABLE keggaasequence") };
print "Dropping keggaasequence failed: $@\n" if $@;
$dbh->do("CREATE TABLE keggaasequence (kg_id BIGINT(15) UNSIGNED REFERENCES uniprotinfo(kg_id),
          keggaccn VARCHAR(50),
          crc64 VARCHAR(50),
          aminoacidsequence text
          )");

eval { $dbh->do("DROP TABLE timestable") };
print "Dropping timestable failed: $@\n" if $@;
$dbh->do("CREATE TABLE timestable (aut_id BIGINT(15) UNSIGNED NOT NULL AUTO_INCREMENT,
          genome VARCHAR(100),
          totaltime_seconds int(100),
          PRIMARY KEY(aut_id))");

open (LIST, "genomes.list") || die "Cannot open input kegg genomes file genomes.list\n $! \n";
$t1=new Benchmark;
my @genomelist = ();
while (my $line=<LIST>) {
    #ignore comment lines
    if ($line !~ /^#/) {
        chomp $line;
        push (@genomelist, $line); #store the filename
    }
}

close LIST;
my $count=0;
foreach my $genomefile (@genomelist) {

    #in case the user fails to remove some strange files from
    #the genomes.list file.. check for the KEGG format
    my $check=checkKeggFormat($genomefile);
    if ($check==0) {
        #if file is not kegg, start with the next one...
        print "ERROR: $genomefile doesn't look like a KEGG file to me! \n";
        #;
        next;
    }
    #print $genomefile,"\n";
    my $stream = Bio::SeqIO->new(-file => $genomefile, -format => 'KEGG');

    while ( my $seq = $stream->next_seq() ) {

        my $primary_id = $seq->primary_id;
        my $display_id = $seq->display_id; #name
        my $keggaccn = $seq->accession; #accn
        my @description = $seq->annotation->get_Annotations('description');

        my @dblinks = $seq->annotation->get_Annotations('dblink');
        my @orthologs = $seq->annotation->get_Annotations('ortholog');
        my @orthologs = grep {$_->database eq 'KO'} $seq->annotation->get_Annotations('dblink');
        my @class = $seq->annotation->get_Annotations('pathway');
        $ntseq{$keggaccn} = $seq->seq;
        $aaseq{$keggaccn} = $seq->translate->seq;
        $aaseq{$keggaccn} =~s /\*$//;
        my $fasta = ">".$count."\n".$aaseq{$keggaccn};
        my $newseq = Bio::FASTASequence->new($fasta);
        $crc64{$keggaccn}=$newseq->getCrc64();
        #print $keggaccn,"crc64:$crc64{$keggaccn}\n";

        $count++;
        if ($keggaccn eq "") { print "PRIMARY KEY NOT FOUND no keggaccn\n";
                               next;}

        if(@dblinks)
        {
            my @dblink_KO=();
            my @dblink_Pfam=();
            my @dblink_PROSITE=();
            my @dblink_NCBIGI=();
            my @dblink_NCBIGENEID=();
            my @dblink_UniProt=();

            foreach my $ele (@dblinks) {
                if ($ele =~ /^KO:/){
                    $ele=~s/KO://;
                    push (@dblink_KO,$ele);
                    $dblink_KO{$keggaccn}=$ele;
                    next;
                }
                #parse Pfam: dblink
                if ($ele =~ /^Pfam:/){
                    $ele=~s/Pfam://;
                    push (@dblink_Pfam,$ele);
                    $dblink_Pfam{$keggaccn}=$ele;
                    next;
                }
                #parse PROSITE: dblink
                if ($ele =~ /^PROSITE:/){
                    $ele=~s/PROSITE://;
                    push (@dblink_PROSITE,$ele);
                    $dblink_PROSITE{$keggaccn}=$ele;
                    next;
                }
                #parse NCBI-GI: dblink
                if ($ele =~ /^NCBI-GI:/){
                    $ele=~s/NCBI-GI://;
                    push (@dblink_NCBIGI,$ele);
                    $dblink_NCBIGI{$keggaccn}=$ele;
                    next;
                }
                #parse NCBI-GeneID: dblink
                if ($ele =~ /^NCBI-GeneID:/){
                    $ele=~s/NCBI-GeneID://;
                    push (@dblink_NCBIGENEID,$ele);
                    $dblink_NCBIGENEID{$keggaccn}=$ele;
                    next;
                }
                #parse UniProt: dblink
                if ($ele =~ /^UniProt:/){
                    $ele=~s/UniProt://;
                    push (@dblink_UniProt,$ele);
                    $dblink_UniProt{$keggaccn}=$ele;
                    next;
                }

            }#end foreach #finished parsing all dblinks
        }#end if @dblinks
        if(@class)
        {
            foreach my $pathway (@class) {

                $pathway=~s/^\s+|\s+$//;
                my @arr = split (/\s+/,$pathway);
                my $pathway_id = $arr[0];
                shift @arr;
                my $pathway_name = join(" ",@arr);
                $pathway_name{$keggaccn}=$pathway_name;
                $pathway_id{$keggaccn}=$pathway_id;
                #print $pathway_id{$keggaccn},"\t",$pathway_name{$keggaccn},"\n";

            }

        }

        my @ecnumbers=();
        @ecnumbers = extractECnumbers(@description);
        if(@ecnumbers)
        {
            if (@ecnumbers!=0)
            {
                foreach my $ecn (@ecnumbers)
                {
                    $ecnumbers{$keggaccn}=$ecn;
                }#end foreach
            }
            else {
                #print "ECnumbers:\n";
            }
        }

        # print $keggaccn,"\t",$dblink_UniProt{$keggaccn},"\t",$dblink_NCBIGENEID{$keggaccn},
        # "\t",$dblink_NCBIGI{$keggaccn},"\t","ec:$ecnumbers{$keggaccn}","\t",
        # "p1:$pathway_id{$keggaccn}","\t","p2:$pathway_name{$keggaccn}","\n";
        #
        $dbh->do("INSERT INTO kegginfo VALUES (?,?, ?, ?, ?, ?,?,?,?,?,?,?,?,?,?,?)",
                 undef,"NULL","NULL",$genomefile,$display_id,$keggaccn,@description,$ecnumbers{$keggaccn},
                 $pathway_id{$keggaccn},$pathway_name{$keggaccn},$crc64{$keggaccn},$dblink_KO{$keggaccn},
                 $dblink_Pfam{$keggaccn},$dblink_NCBIGI{$keggaccn},$dblink_NCBIGENEID{$keggaccn},
                 $dblink_UniProt{$keggaccn},$dblink_PROSITE{$keggaccn});

        $dbh->do("INSERT INTO keggaasequence VALUES (?,?,?,?)",
                 undef,"",$keggaccn,$crc64{$keggaccn},$aaseq{$keggaccn});

        $dbh->do("INSERT INTO keggntsequence VALUES (?,?,?)",
                 undef,"",$keggaccn,$ntseq{$keggaccn});

    }
    $t2=new Benchmark;
    $time_used=timeThis($t1,$t2,"Finished parsing file $genomefile");
    $dbh->do("INSERT INTO timestable VALUES (?,?,?)",
             undef,"NULL",$genomefile,$time_used);

}

$dbh->do("CREATE INDEX keggIindex ON kegginfo (kg_id,keggaccn)");
print "Index created on kegginfo\n";

$dbh->do("CREATE INDEX keggaasequence1 ON keggaasequence (kg_id,keggaccn)");
print "Index created on keggaasequence\n";

$dbh->do("CREATE INDEX keggntsequence1 ON keggntsequence (kg_id,keggaccn)");
print "Index created on keggntsequence\n";

print"Updating the tables................\n";

$dbh->do("update kegginfo,keggaasequence set
          keggaasequence.kg_id=kegginfo.kg_id
          where kegginfo.keggaccn=keggaasequence.keggaccn");
print " keggaasequence kg_id\n";

$dbh->do("update kegginfo,keggntsequence set
          keggntsequence.kg_id=kegginfo.kg_id
          where kegginfo.keggaccn=keggntsequence.keggaccn");
print " keggaasequence kg_id\n";

sub extractECnumbers ($) {
    #sample description lines
    #riboflavin kinase / FMN adenylyltransferase [EC:2.7.1.26 2.7.7.2]
    #ATP synthase F0 subunit c [EC:3.6.3.14]

    my @desc=shift;
    my $description = join ("",@desc);
    my @ecnumbers=();
    #print "parsing ec for $description..\n";
    #check if EC number exists
    if ($description=~/\[EC:/) {

        my @array = split (/\[EC:/,$description);
        $array[1]=~s/]//g;
        shift @array; #remove the annotation , only EC numbers remain
        foreach my $ele (@array) {
            $ele=~s/^\s+|\s+$//g;
            $ele= "EC:".$ele;
            push (@ecnumbers,$ele);
        }
        return @ecnumbers;
    }
    else {
        #return an empty value
        return ;

    }

}

sub checkKeggFormat ($) {
=head2

checkKeggFormat

make sure that the file is a valid KEGG file
function checks the first two lines,
1st must start with ENTRY
2nd must start with DEFINITION

returns 0 or 1

=cut
    my $genomefile=shift;

    open (TEST,$genomefile) || die "Cannot open file $genomefile for reading \n";
    my $testline=<TEST>;
    #print "$testline\n";
    if ($testline=~/^ENTRY/) {
        #continue
        #$testline=<TEST>;#double check
        #if ($testline=~/^NAME/) {
        #this looks like a valid kegg file
        return 1;
        #}
        #else {
        # close TEST;
        # return 0;
        #}
    }
    else {
        close TEST;
        return 0;
    }

}

sub timeThis ($$$)
{
    my ($start,$end,$message) = @_;
    my $td = timediff($end, $start);
    my $t = timestr($td);
    print "$message : ",$t,"\n";
    my @array = split (/\s+/,$t);
    #20 wallclock secs (14.23 usr + 0.84 sys = 15.07 CPU)
    return $array[0]; #return the no. of seconds.
}

From e-just at northwestern.edu Mon Apr 2 10:12:33 2007
From: e-just at northwestern.edu (Eric Just)
Date: Mon, 2 Apr 2007 09:12:33 -0500
Subject: [Bioperl-l] Can't locate object method "seq_start" via package "Bio::DB::GenBank"
Message-ID:

Hello,

I am getting this error while running a BioPerl script that I had been using with BioPerl 1.4. On upgrading to BioPerl 1.5.2 I get the following fatal error:

Can't locate object method "seq_start" via package "Bio::DB::GenBank"

My script is as follows:

use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

my $gb = new Bio::DB::GenBank();

my $query = Bio::DB::Query::GenBank->new(
    -query   => 'txid44689[Organism:noexp]',
    -reldate => 60,
    -db      => 'nucleotide'
);

my $in = $gb->get_Stream_by_query($query);

while ( my $seq = $in->next_seq()) {
    print "do something";
    #....
}

I noticed that seq_start is created in the BEGIN block of Bio::DB::NCBIHelper (inherited by Bio::DB::GenBank), but I do not have experience troubleshooting this kind of autoloaded method. Any idea where to start?

Thanks

Eric

From e-just at northwestern.edu Mon Apr 2 10:15:28 2007
From: e-just at northwestern.edu (Eric Just)
Date: Mon, 2 Apr 2007 09:15:28 -0500
Subject: [Bioperl-l] Can't locate object method "seq_start" via package "Bio::DB::GenBank"
In-Reply-To:
References:
Message-ID:

Sorry about that. As soon as I sent the email I found my problem (an old NCBIHelper in my inheritance path). There is no bug here.

Eric

From cjfields at uiuc.edu Mon Apr 2 11:32:59 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Apr 2007 10:32:59 -0500
Subject: [Bioperl-l] bioperl and kegg(out of memory problem )
In-Reply-To: <20070402135633.85882.qmail@web52002.mail.re2.yahoo.com>
References: <20070402135633.85882.qmail@web52002.mail.re2.yahoo.com>
Message-ID: <38475C93-FB21-4BC4-BF5D-7F48493E8EE2@uiuc.edu>

Ambrose,

Data is persisting in your hashes (in particular DBLink objects), which is eating away at your memory. If I take a sample KEGG gene file and simply parse it:

while (my $seq = $io->next_seq) {
    print $seq->accession,"\n";
}

there are no memory issues, but if I store the data in hashes declared outside the loop:

my(%dblink_KO,%dblink_Pfam,%dblink_PROSITE,%dblink_NCBIGI,%dblink_NCBIGENEID,%dblink_UniProt);
my(%pathway_name,%pathway_id,%ecnumbers,%crc64,%ntseq,%aaseq);

while (my $seq = $io->next_seq) {
    # store Bio::Seq data in hashes
}

I see problems with only one genome file with KEGG records.
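
A minimal sketch of the difference between the two patterns, using made-up records in place of a real Bio::SeqIO stream so that it runs without BioPerl. The names make_stream, parse_with_outer_hash and parse_with_inner_hash are hypothetical helpers for illustration only, not BioPerl API, and the record contents are invented:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for Bio::SeqIO->next_seq (assumption: fake records, not Bio::Seq objects).
# Returns a closure that yields $n records and then undef.
sub make_stream {
    my ($n) = @_;
    my $i = 0;
    return sub {
        return if $i >= $n;
        $i++;
        return { accession => "gene$i", seq => 'ACGT' x 250 };
    };
}

# Pattern 1: the hash is declared outside the loop, so every record
# stays referenced until the hash itself goes away.
sub parse_with_outer_hash {
    my ($next) = @_;
    my %ntseq;
    while ( my $rec = $next->() ) {
        $ntseq{ $rec->{accession} } = $rec->{seq};
    }
    return scalar keys %ntseq;    # number of entries still held after the loop
}

# Pattern 2: the hash is declared inside the loop, so it is released
# at the end of each iteration.
sub parse_with_inner_hash {
    my ($next) = @_;
    my $max_held = 0;
    while ( my $rec = $next->() ) {
        my %ntseq = ( $rec->{accession} => $rec->{seq} );
        # ... the database insert for this one record would go here ...
        my $held = scalar keys %ntseq;
        $max_held = $held if $held > $max_held;
    }
    return $max_held;             # largest number of entries held at any time
}

print "outer hash entries: ", parse_with_outer_hash( make_stream(1000) ), "\n";
print "inner hash entries: ", parse_with_inner_hash( make_stream(1000) ), "\n";
```

With the outer hash the entry count grows with the number of records read; with the inner hash it never exceeds one, which is all the posted script needs since each record is written to the database straight away.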
You'll definitely run into memory issues if you are parsing many genome files, which you appear to be:

my(%dblink_KO,%dblink_Pfam,%dblink_PROSITE,%dblink_NCBIGI,%dblink_NCBIGENEID,%dblink_UniProt);
my(%pathway_name,%pathway_id,%ecnumbers,%crc64,%ntseq,%aaseq);

for my $genomefile (@genomelist) {
    while (my $seq = $io->next_seq) {
        # store Bio::Seq data in hashes
    }
}

Localizing the hashes to the genome or sequence loops should prevent the memory problem. Note that the DBLink Annotation objects are overloaded so they act like a string ($ele =~ /^KO:/) but are actually Bio::Annotation::DBLink objects, something we will likely get rid of in the near future.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From dmessina at wustl.edu Mon Apr 2 12:19:51 2007
From: dmessina at wustl.edu (David Messina)
Date: Mon, 2 Apr 2007 11:19:51 -0500
Subject: [Bioperl-l] installation bioperl
Message-ID: <4CF82AFF-CB24-4939-9F80-9AB907BE5822@wustl.edu>

Hi Fahmi,

Please include the list on the reply so that others can comment, too.

Yes, it appears the machine you are installing on does not have an internet connection. You probably will want to resolve that problem before dealing with Bioperl. Alternatively, you could simply install and use Bioperl on the machine which does have an internet connection.

If you really need to get Bioperl installed on that machine, however, probably the easiest way would be to find a machine that does have an internet connection, install CPAN::Mini, and use it to make a local mirror of CPAN. You could then copy that local mirror over to the machine without the internet connection and point that machine's cpan at the local mirror (read the CPAN documentation to find out how to do this).

Also, the BioPerl install instructions list several external packages that you will need to use some parts of Bioperl (e.g. GD). Again, you can download those distributions using the machine with the internet connection and copy them over.

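
As a rough sketch of the mirroring step, assuming CPAN::Mini is already installed on the connected machine: the remote URL and the local directory below are placeholders chosen for illustration, and mirror_options is a hypothetical helper, not part of CPAN::Mini.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the options passed to CPAN::Mini->update_mirror.
# Both values are assumptions: pick whichever CPAN mirror and local
# directory suit your site.
sub mirror_options {
    my ($local_dir) = @_;
    return (
        remote => 'http://www.cpan.org/',    # placeholder mirror URL
        local  => $local_dir,                # where the mini mirror is written
    );
}

# Run on the machine that has an internet connection, for example:
#   perl make_mirror.pl /path/to/minicpan
if (@ARGV) {
    require CPAN::Mini;    # loaded only when a mirror is actually requested
    CPAN::Mini->update_mirror( mirror_options( $ARGV[0] ) );
}
```

After copying the mirror directory to the offline machine, the cpan shell there can usually be pointed at it with "o conf urllist unshift file:///path/to/minicpan" followed by "o conf commit"; check the CPAN documentation for the version you have.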
Dave

On Apr 2, 2007, at 9:22 AM, fahmi derbali wrote:

> Thank you for your answer. I will give you as much information as I can
> so that you can diagnose the problem:
>
> I use Linux Mandriva 2006.
> I'm trying to install bioperl-1.5.2_102.tar.gz, which I obtained
> from the URL:
> http://www.bioperl.org/wiki/Release_1.5.2
> After that I ran these commands, which I found at
> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix (paragraph
> INSTALLING BIOPERL THE EASY WAY USING 'Build.PL'):
>
> >gunzip bioperl-1.5.2_102.tar.gz
> >tar xvf bioperl-1.5.2_102.tar
> >cd bioperl-1.5.2_102
> After that I ran the command
> >perl Build.PL
> and obtained the text
> this package requires Module::Build v0.2805 or greater to install
> itself
> install Module::Build now from CPAN?[y]
> I pressed Enter and obtained many lines such as
> System call"/usr/bin/wget -0-"ftp://.perl.org/pub/CPAN/modules/
> modlist.data.gz">home/fahmi/.cpan/sources/modules/03modlist.data
> Not connected
> cant access URL ftp://ftp.perl.org/CPAN/modules/modlist.data.gz
> ...
> I'm trying to install BioPerl without an internet connection
> because I don't know why Linux didn't detect my ethernet card.
> Please tell me what I should do.
> Thank you for your collaboration.

From cjfields at uiuc.edu Mon Apr 2 14:10:30 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Apr 2007 13:10:30 -0500
Subject: [Bioperl-l] Fwd: BLAST beta, URLAPI, and BioPerl (RemoteBlast users)
References:
Message-ID: <002E7937-10DF-43CE-96F6-71DC743C1314@uiuc.edu>

This may be of interest to anyone using RemoteBlast. The new changes to NCBI's BLAST interface shouldn't affect anything (Scott tested it out). If there are any abnormalities with RemoteBlast queries over the next few weeks let us know.

chris

Begin forwarded message:

> From: "Mcginnis, Scott \(NIH/NLM/NCBI\) [E]"
> Date: April 2, 2007 12:53:33 PM CDT
> To: "Chris Fields"
> Subject: RE: BLAST beta, URLAPI, and BioPerl
>
> Hi Chris:
>
> We are ready to make the new pages the defaults come April 16th. An
> announcement is going out shortly. There are some very minor
> changes to the URL API and I have listed them below. They will be
> part of the announcements. Please note we actually tested BioPerl
> and it seems fine to me with the new pages. If you have a news section on
> your site or a mailing list you might want to pass this on.
>
> A Note About URLAPI
>
> The new BLAST pages support URLAPI, a protocol that scripts and
> programs use to run BLAST searches and retrieve results over
> HTTP. (For more on URLAPI, see
> http://www.ncbi.nlm.nih.gov/blast/Doc/urlapi.html). The following
> information only applies to you if you develop or are responsible
> for software that uses URLAPI.
>
> The new pages have been tested and produce correct results with
> the following URLAPI client programs:
>
> * the BioPERL RemoteBlast module
> * the NCBI demo script http://ncbi.nlm.nih.gov/blast/docs/web_blast.pl
> * various scripts used in-house at NCBI
>
> Users of URLAPI should be aware of the following minor
> changes. In the new interface:
>
> 1. The Request ID (RID) format will be shorter. The new format
> is 11 alphanumeric characters (e.g. RDEFEA5012) and will have no
> internal structure. The previous RID format was 36 or more
> characters long, including punctuation (e.g.,
> 1175172712-21345-42512597310.BLASTQ3).
>
> 2. BLAST reports will show masked regions as lower-case letters
> by default (see
> http://nar.oxfordjournals.org/cgi/content/full/34/suppl_2/W6,
> figure 2). The current default behavior is to show masked
> regions as N's or X's. Users may recover the current behavior
> by adding &MASK_CHAR=0 to the query string for a URLAPI
> request.
>
> 3. BLAST reports will show alignments for 100 database sequences
> by default. The current reports show only 50 alignments by
> default.
>
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> Sent: Mon 3/5/2007 11:50 AM
> To: Mcginnis, Scott (NIH/NLM/NCBI) [E]
> Subject: BLAST beta, URLAPI, and BioPerl
>
> The BioPerl project has several modules and parsers
> which currently parse XML/text/tabular BLAST output, as well as a
> module which is capable of posting BLAST queries via the URLAPI
> interface. Will any of the BLAST changes affect these (particularly
> URLAPI)?
>
> Thanks!
>
> chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From steletch at jouy.inra.fr Tue Apr 3 08:28:39 2007
From: steletch at jouy.inra.fr (Stéphane Téletchéa)
Date: Tue, 03 Apr 2007 14:28:39 +0200
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To:
References:
Message-ID: <46124877.4020605@jouy.inra.fr>

Alex Lancaster wrote:
> Hello bioperl,
>
> I'm new to the bioperl world, having just started a research position
> in which I need to manage a large bioperl-based codebase. To this
> end, I'm working on packaging bioperl as an official Fedora Package
> (formerly "Fedora Extras") and I'm currently wading through and
> packaging the long laundry list of Perl dependencies (then I'm going
> to try and do the same for biopython). You can see some of my
> progress (including links to the reviews) here:
>
> http://fedoraproject.org/wiki/AlexLancaster
>
> Several issues have arisen during the packaging that I hope the
>

Nice, I was on my way to do it :-)
I'm a Mandriva packager and have been kindly "spushed" for maintaining the bioperl package for Mandriva.

You can have a look at the work already done by Mandriva at the
addresses:
http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl/current/
http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl-run/current/

(Happy users of Mandriva do 'urpmi perl-bioperl', et voilà :-).

Feel free to contact me if you need more input for dependencies, since
they are quite a lot.

Cheers,
Stéphane

--
Stéphane Téletchéa, PhD. http://www.steletch.org
Unité Mathématique Informatique et Génome http://migale.jouy.inra.fr/mig
INRA, Domaine de Vilvert Tél : (33) 134 652 891
78352 Jouy-en-Josas cedex, France Fax : (33) 134 652 901

From cjfields at uiuc.edu Tue Apr 3 10:58:44 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 3 Apr 2007 09:58:44 -0500
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <46124877.4020605@jouy.inra.fr>
References: <46124877.4020605@jouy.inra.fr>
Message-ID: <67AD2CBC-C1F6-4C04-B9B3-BEAB93A2A4A3@uiuc.edu>

Once these are set up we should add a page to the bioperl wiki to
describe them in more detail (along with Allen's Biopackages).

chris

On Apr 3, 2007, at 7:28 AM, Stéphane Téletchéa wrote:

> Alex Lancaster wrote:
>> Hello bioperl,
>>
>> I'm new to the bioperl world, having just started a research position
>> in which I need to manage a large bioperl-based codebase. To this
>> end, I'm working on packaging bioperl as an official Fedora Package
>> (formerly "Fedora Extras") and I'm currently wading through and
>> packaging the long laundry list of Perl dependencies (then I'm going
>> to try and do the same for biopython). You can see some of my
>> progress (including links to the reviews) here:
>>
>> http://fedoraproject.org/wiki/AlexLancaster
>>
>> Several issues have arisen during the packaging that I hope the
>>
>
> Nice, i was on my way to do it :-)
> I'm a Mandriva packager and have been kindly "spushed" for maintaining
> the bioperl package for Mandriva.
>
> You can have a look at the work already done by Mandriva at the
> addresses:
> http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl/current/
> http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl-run/current/
>
> (Happy users of Mandriva do 'urpmi perl-bioperl', et voilà :-).
>
> Feel free to contact me if you need more input for dependencies, since
> they are quite a lot.
>
> Cheers,
> Stéphane
>
> --
> Stéphane Téletchéa, PhD. http://www.steletch.org
> Unité Mathématique Informatique et Génome http://migale.jouy.inra.fr/mig
> INRA, Domaine de Vilvert Tél : (33) 134 652 891
> 78352 Jouy-en-Josas cedex, France Fax : (33) 134 652 901
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From allenday at gmail.com Tue Apr 3 13:54:51 2007
From: allenday at gmail.com (Allen Day)
Date: Tue, 3 Apr 2007 10:54:51 -0700
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <67AD2CBC-C1F6-4C04-B9B3-BEAB93A2A4A3@uiuc.edu>
References: <46124877.4020605@jouy.inra.fr>
	<67AD2CBC-C1F6-4C04-B9B3-BEAB93A2A4A3@uiuc.edu>
Message-ID: <5c24dcc30704031054p756bd974ucab98c7283ef7a61@mail.gmail.com>

You can link Biopackages now, it's been done for nearly 2 years.

-Allen

On 4/3/07, Chris Fields wrote:
> Once these are set up we should add a page to the bioperl wiki to
> describe them in more detail (along with Allen's Biopackages).
>
> chris
>
> On Apr 3, 2007, at 7:28 AM, Stéphane Téletchéa wrote:
>
> > Alex Lancaster wrote:
> >> Hello bioperl,
> >>
> >> I'm new to the bioperl world, having just started a research position
> >> in which I need to manage a large bioperl-based codebase.
> >> To this end, I'm working on packaging bioperl as an official
> >> Fedora Package (formerly "Fedora Extras") and I'm currently wading
> >> through and packaging the long laundry list of Perl dependencies
> >> (then I'm going to try and do the same for biopython). You can see
> >> some of my progress (including links to the reviews) here:
> >>
> >> http://fedoraproject.org/wiki/AlexLancaster
> >>
> >> Several issues have arisen during the packaging that I hope the
> >>
> >
> > Nice, i was on my way to do it :-)
> > I'm a Mandriva packager and have been kindly "spushed" for maintaining
> > the bioperl package for Mandriva.
> >
> > You can have a look at the work already done by Mandriva at the
> > addresses:
> > http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl/current/
> > http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl-run/current/
> >
> > (Happy users of Mandriva do 'urpmi perl-bioperl', et voilà :-).
> >
> > Feel free to contact me if you need more input for dependencies, since
> > they are quite a lot.
> >
> > Cheers,
> > Stéphane
> >
> > --
> > Stéphane Téletchéa, PhD. http://www.steletch.org
> > Unité Mathématique Informatique et Génome http://migale.jouy.inra.fr/mig
> > INRA, Domaine de Vilvert Tél : (33) 134 652 891
> > 78352 Jouy-en-Josas cedex, France Fax : (33) 134 652 901
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From cjfields at uiuc.edu Tue Apr 3 14:11:19 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 3 Apr 2007 13:11:19 -0500
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <5c24dcc30704031054p756bd974ucab98c7283ef7a61@mail.gmail.com>
References: <46124877.4020605@jouy.inra.fr>
	<67AD2CBC-C1F6-4C04-B9B3-BEAB93A2A4A3@uiuc.edu>
	<5c24dcc30704031054p756bd974ucab98c7283ef7a61@mail.gmail.com>
Message-ID: <0802E2EB-5E94-42D2-9CE1-B82DC103A5D1@uiuc.edu>

I added a small piece on Biopackages to the wiki installation page:

http://www.bioperl.org/wiki/Installing_BioPerl

We can move links to RPM (or similar) installations to their own page
or section in the INSTALL docs when we have time.

chris

On Apr 3, 2007, at 12:54 PM, Allen Day wrote:

> You can link Biopackages now, it's been done for nearly 2 years.
>
> -Allen
>
> On 4/3/07, Chris Fields wrote:
>> Once these are set up we should add a page to the bioperl wiki to
>> describe them in more detail (along with Allen's Biopackages).
>>
>> chris
>>
>> On Apr 3, 2007, at 7:28 AM, Stéphane Téletchéa wrote:
>>
>>> Alex Lancaster wrote:
>>>> Hello bioperl,
>>>>
>>>> I'm new to the bioperl world, having just started a research
>>>> position in which I need to manage a large bioperl-based codebase.
>>>> To this end, I'm working on packaging bioperl as an official Fedora
>>>> Package (formerly "Fedora Extras") and I'm currently wading through
>>>> and packaging the long laundry list of Perl dependencies (then I'm
>>>> going to try and do the same for biopython).
>>>> You can see some of my progress (including links to the reviews)
>>>> here:
>>>>
>>>> http://fedoraproject.org/wiki/AlexLancaster
>>>>
>>>> Several issues have arisen during the packaging that I hope the
>>>>
>>>
>>> Nice, i was on my way to do it :-)
>>> I'm a Mandriva packager and have been kindly "spushed" for
>>> maintaining the bioperl package for Mandriva.
>>>
>>> You can have a look at the work already done by Mandriva at the
>>> addresses:
>>> http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl/current/
>>> http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl-run/current/
>>>
>>> (Happy users of Mandriva do 'urpmi perl-bioperl', et voilà :-).
>>>
>>> Feel free to contact me if you need more input for dependencies,
>>> since they are quite a lot.
>>>
>>> Cheers,
>>> Stéphane
>>>
>>> --
>>> Stéphane Téletchéa, PhD. http://www.steletch.org
>>> Unité Mathématique Informatique et Génome http://migale.jouy.inra.fr/mig
>>> INRA, Domaine de Vilvert Tél : (33) 134 652 891
>>> 78352 Jouy-en-Josas cedex, France Fax : (33) 134 652 901
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From bix at sendu.me.uk Tue Apr 3 18:18:56 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 03 Apr 2007 23:18:56 +0100
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To:
References: <1175258897.2668.21.camel@localhost.localdomain>
	<6d648ierkz.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com>
	<1p8xdeb87r.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>
	<16153593-5B2A-43B4-9366-282C654E40E7@gmx.net>
	<5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com>
Message-ID: <4612D2D0.7030202@sendu.me.uk>

Chris Fields wrote:
> On Mar 30, 2007, at 11:02 PM, Allen Day wrote:
>
>> The majority of the Bioperl classes are file parsers, or manipulate
>> data that comes from the file parsers. Yes there are exceptions like
>> the Eutils and Ensembl-interfacing classes, but they are the minority.
>> The types of files that are worked with are generally either A)
>> primary data sets such as genome data, or B) derivative data, such as
>> sequence alignments that are derived from primary data using an
>> algorithm.
>>
>> If we're in agreement that the primary data sets and
>> libraries/applications for producing derivative data should not be
>> present in Fedora Extras, then it follows that the Bioperl classes for
>> manipulating these primary and derivative data should also not be
>> present in Fedora Extras as they are of little use without data to
>> manipulate.
>
> I respectfully disagree.

Likewise, but in a slightly different way: for myself and surely many
others the primary data used either isn't publicly released or isn't in
some major database and therefore won't be in any kind of repository.
That doesn't mean I wouldn't want the parser for my files to be
somewhere convenient.

From bix at sendu.me.uk Tue Apr 3 18:09:27 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 03 Apr 2007 23:09:27 +0100
Subject: [Bioperl-l] installation bioperl
In-Reply-To: <4CF82AFF-CB24-4939-9F80-9AB907BE5822@wustl.edu>
References: <4CF82AFF-CB24-4939-9F80-9AB907BE5822@wustl.edu>
Message-ID: <4612D097.9060400@sendu.me.uk>

> On Apr 2, 2007, at 9:22 AM, fahmi derbali wrote:
>
>> thank you for the answer. I will give you as much information as I
>> can in order to help diagnose the problem:
>>
>> i use linux mandriva 2006
>> i'm trying to install bioperl-1.5.2_102.tar.gz which i obtained
>> from the url:
>> http://www.bioperl.org/wiki/Release_1.5.2
>> after that i ran these commands which i found at the url
>> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix (paragraph
>> INSTALLING BIOPERL THE EASY WAY USING 'Build.PL ')
[snip]
>> i'm trying to install bioperl without having an internet connection
>> because i don't know why linux didn't detect my ethernet card.
>> please tell me what should i do.
>> thank you for your collaboration.

David's suggestion was a good one, but quite a lot (and possibly all
you need) of BioPerl is usable just with the bioperl-1.5.2_102.tar.gz
file you already have. Just follow the 'hard way' instructions:
http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_MODULES_THE_HARD_WAY

Actually, it's not that hard. Just extract the files from the .tar.gz
and have your perl lib path (PERL5LIB or 'use lib') point at the
directory that contains the resulting Bio directory.

From t.r-a_ckright1 at tiscali.co.uk Wed Apr 4 08:00:12 2007
From: t.r-a_ckright1 at tiscali.co.uk (Michael Pain)
Date: Wed, 4 Apr 2007 13:00:12 +0100
Subject: [Bioperl-l] Re: read it immediately
Message-ID: <000501c776b0$cd5dd9b0$a7d42d54@122882420315>

I have received three discs but I can not access the files as no ID or
password was included in the package. I have paid for all this! Can you
sort it out.

Regards Michael Pain

From thiago.venancio at gmail.com Wed Apr 4 14:14:04 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Wed, 4 Apr 2007 15:14:04 -0300
Subject: [Bioperl-l] read it immediately
In-Reply-To: <000501c776b0$cd5dd9b0$a7d42d54@122882420315>
References: <000501c776b0$cd5dd9b0$a7d42d54@122882420315>
Message-ID: <44255ea80704041114pc284522tef2d3a3944763b90@mail.gmail.com>

I think you emailed the wrong list...

On 4/4/07, Michael Pain wrote:
>
> I have received three dics but i can not access the files as no ID or
> pasword was included in the package,I have paid for all this! Can you sort
> it out.
>
> Regards Michael Pain
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From gdorjee at hotmail.com Wed Apr 4 14:17:57 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Wed, 4 Apr 2007 11:17:57 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
Message-ID: <9842643.post@talk.nabble.com>

hi all,

can anyone plz help me out with this problem that i've been dealing
with for quite a while now. following is a part of my script that's not
working for some reason. it is supposed to get the sequence from
'result/fasta.faa' and do the blast.

###my script ###########
......
my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa',
                              '-format' => 'Fasta');
my $queryin = $Seq_in->next_seq();
my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastp',
                              'database' => '/export/home/database/nr',
                              _READMETHOD => 'Blast'
                              );
$factory->outfile("result/out.blast");
my $blastreport = $factory->blastall($queryin);
.....

when i paste the protein sequence into the textarea of my html page and
save the same as 'result/fasta.faa', so that the above script would do
the blast, i get the following error:

Software error:
------------- EXCEPTION -------------
MSG: not Bio::Seq object or array of Bio::Seq objects or file name!
STACK Bio::Tools::Run::StandAloneBlast::blastpgp /usr/perl5/5.6.1/lib/Bio/Tools/Run/StandAloneBlast.pm:611
STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50
--------------------------------------

i would appreciate your help. i would also like to add that the
'result/fasta.faa' has the sequence saved in it.

--
View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9842643
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From gowthaman.ramasamy at sbri.org Wed Apr 4 14:57:09 2007
From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy)
Date: Wed, 4 Apr 2007 11:57:09 -0700
Subject: [Bioperl-l] How to patch something in installed bioperl module
Message-ID:

Hi List,

I am advised to patch (comment out some lines and add some) the GFF.pm
bioperl module. How do i go about it? I have the latest Bioperl 1.5.2
version installed, via CPAN. I find GFF.pm in the following location:

/root/.cpan/build/bioperl-1.5.2_102/Bio/Tools/GFF.pm

Do i have to recompile it after editing? I am completely clueless; I
have not done this earlier. Can anyone help me to do this?

Many thanks in advance,
gowthaman

From dmessina at wustl.edu Wed Apr 4 15:42:43 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 4 Apr 2007 14:42:43 -0500
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9842643.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
Message-ID: <35EE39CF-4A25-4453-8073-48CA0E9317EB@wustl.edu>

The code snippet worked fine for me. I believe the problem is that
'result/fasta.faa' is not getting passed to your code properly. You
might try specifying a complete path to your input and output file --
relative paths, especially through a web app, can be tricky.
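
A minimal sketch of the kind of check suggested above, using only core Perl modules: it resolves a relative path against the current working directory (which, for a CGI script, is often not the directory you expect) and reports whether the file exists, is readable, and is non-empty. The helper name check_input_file is made up for illustration and is not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Spec;

# Hypothetical helper (not part of BioPerl): turn a possibly relative
# path into an absolute one and say why it might be unusable.
sub check_input_file {
    my ($path) = @_;
    # rel2abs resolves against the current working directory
    my $abs = File::Spec->rel2abs($path);
    return ($abs, "does not exist")  unless -e $abs;
    return ($abs, "is not readable") unless -r $abs;
    return ($abs, "is empty")        unless -s $abs;
    return ($abs, undef);            # undef means no problem was found
}

# The relative path from the thread; adjust as needed.
my ($abs, $problem) = check_input_file('result/fasta.faa');
print "cwd:   ", getcwd(), "\n";
print "input: $abs\n";
print defined $problem ? "problem: file $problem\n" : "file looks usable\n";
```

Running this from the same place the web server runs the script (rather than from an interactive shell) shows which directory 'result/' is actually being resolved against.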

> when i paste the protein sequence into the textarea of my html page
> and save
> the same as 'result/fasta.faa', so that the above script would do
> the blast,

I'm not sure from what you wrote -- did you try running your script
on the command line (having created 'result/fasta.faa' manually
first)? If that is working for you, then the problem is with getting
the data from the webpage into the script, not with the blasting part.

Dave

This is what I did:

% ls test.pl testp*
test.pl testp.fa

% formatdb -i testp.fa

% ls test.pl testp*
test.pl testp.fa testp.fa.phr testp.fa.pin testp.fa.psq

% perl test.pl testp.fa
% head -10 out.blast
BLASTP 2.2.10 [Oct-19-2004]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= gi|64654269|gb|AAH96193.1| HOXB1 protein [Homo sapiens]
(235 letters)


Your code: I changed only the input filename and the input database
name, and saved the script as test.pl
-----------------------
#!/usr/bin/perl

use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::Run::StandAloneBlast;

my $Seq_in = Bio::SeqIO->new (-file => $ARGV[0], '-format' => 'Fasta');
my $queryin = $Seq_in->next_seq();
my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastp',
                                                    'database' => 'testp.fa',
                                                    _READMETHOD => 'Blast'
                                                    );
$factory->outfile("out.blast");
my $blastreport = $factory->blastall($queryin);
-----------------------------------------------------------------------------------

From gdorjee at hotmail.com Wed Apr 4 17:44:27 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Wed, 4 Apr 2007 14:44:27 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To: <35EE39CF-4A25-4453-8073-48CA0E9317EB@wustl.edu>
References: <9842643.post@talk.nabble.com>
	<35EE39CF-4A25-4453-8073-48CA0E9317EB@wustl.edu>
Message-ID: <9846257.post@talk.nabble.com>

Thanks for your reply Dave. I don't think that there's anything wrong
with the open(OUTPUT,">result/fasta.faa"); line as I could get the
'fasta.faa' file with the sequence in it. I see it. It looks like the
blast is not able to read from result/fasta.faa. ^ ^*

Dave Messina-2 wrote:
>
> The code snippet worked fine for me. I believe the problem is that
> 'result/fasta.faa' is not getting passed to your code properly. You
> might try specifying a complete path to your input and output file --
> relative paths, especially through a web app, can be tricky.
>
>> when i paste the protein sequence into the textarea of my html page
>> and save
>> the same as 'result/fasta.faa', so that the above script would do
>> the blast,
>
> I'm not sure from what you wrote -- did you try running your script
> on the command line (having created 'result/fasta.faa' manually
> first)? If that is working for you, then the problem is with getting
> the data from the webpage into the script, not with the blasting part.
>
> Dave
>
> This is what I did:
>
> % ls test.pl testp*
> test.pl testp.fa
>
> % formatdb -i testp.fa
>
> % ls test.pl testp*
> test.pl testp.fa testp.fa.phr testp.fa.pin testp.fa.psq
>
> % perl test.pl testp.fa
> % head -10 out.blast
> BLASTP 2.2.10 [Oct-19-2004]
>
>
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs", Nucleic Acids Res. 25:3389-3402.
>
> Query= gi|64654269|gb|AAH96193.1| HOXB1 protein [Homo sapiens]
> (235 letters)
>
>
> Your code: I changed only the input filename and the input database
> name, and saved the script as test.pl
> -----------------------
> #!/usr/bin/perl
>
> use strict;
> use warnings;
> use Bio::SeqIO;
> use Bio::Tools::Run::StandAloneBlast;
>
> my $Seq_in = Bio::SeqIO->new (-file => $ARGV[0], '-format' => 'Fasta');
> my $queryin = $Seq_in->next_seq();
> my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastp',
>                                                     'database' => 'testp.fa',
>                                                     _READMETHOD => 'Blast'
>                                                     );
> $factory->outfile("out.blast");
> my $blastreport = $factory->blastall($queryin);
> -----------------------------------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

--
View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9846257
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From torsten.seemann at infotech.monash.edu.au Wed Apr 4 20:17:10 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 5 Apr 2007 10:17:10 +1000
Subject: [Bioperl-l] How to patch something in installed bioperl module
In-Reply-To:
References:
Message-ID:

> I am advised to patch (comment out some lines and add some) GFF.pm bioperl module.
> How do i go about it?.

First, make a backup of the original file.
Then just edit the original (add/remove lines).

> I have the latest Bioperl 1.5.2 version installed....via CPAN
> I find GFF.pm in the following location...
> /root/.cpan/build/bioperl-1.5.2_102/Bio/Tools/GFF.pm

This is not where it is installed. That is where the CPAN program
uncompressed it to before installing. It is more likely in a directory
like this:

/usr/lib/perl5/site_perl/5.8.5/Bio/Tools/GFF.pm

But it depends on how your Perl setup arranges things!
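
One way to find out which copy Perl actually loads is to ask Perl itself. The sketch below is only an illustration: the helper name module_file is hypothetical, and it relies on the standard %INC hash, which records the file each loaded module came from. It defaults to File::Spec (which ships with Perl) so it runs anywhere; on a machine with BioPerl installed, pass Bio::Tools::GFF on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: load a module by name and return the file Perl
# actually read it from, as recorded in %INC.
sub module_file {
    my ($module) = @_;
    (my $key = "$module.pm") =~ s{::}{/}g;   # Bio::Tools::GFF -> Bio/Tools/GFF.pm
    return unless eval { require $key; 1 };  # not installed, or failed to load
    return $INC{$key};
}

# Usage: perl whereis.pl Bio::Tools::GFF
for my $module (@ARGV ? @ARGV : ('File::Spec')) {
    my $file = module_file($module);
    print "$module => ", (defined $file ? $file : 'not found'), "\n";
}
```

The path printed is the file to back up and edit. The command 'perldoc -l Bio::Tools::GFF' should report the same location.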

> Do i have to recompile it after editing........

No.

--Torsten

From torsten.seemann at infotech.monash.edu.au Wed Apr 4 20:22:37 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 5 Apr 2007 10:22:37 +1000
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9842643.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
Message-ID:

> Software error:
> ------------- EXCEPTION -------------
> MSG: not Bio::Seq object or array of Bio::Seq objects or file name!
> STACK Bio::Tools::Run::StandAloneBlast::blastpgp
> /usr/perl5/5.6.1/lib/Bio/Tools/Run/StandAloneBlast.pm:611
> STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50

> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', '-format' => 'Fasta');

Does this still happen if you give the full path to the FASTA file?
e.g. -file => '/usr/local/apache2/htdocs/result/fasta.faa'
(I'm guessing what the full path is here)

--Torsten

From gilbertd at cricket.bio.indiana.edu Wed Apr 4 20:59:23 2007
From: gilbertd at cricket.bio.indiana.edu (Don Gilbert)
Date: Wed, 4 Apr 2007 19:59:23 -0500 (EST)
Subject: [Bioperl-l] Small bug in Bio::Tools::GFF.pm - Target output
Message-ID: <200704050059.l350xNF07452@cricket.bio.indiana.edu>

Dear Bioperl list,

There is a small bug in what I think is the current Bio::Tools::GFF.pm,
that blocks output of Target attributes (in gff3 at least).
See a patch here
http://wiki.gmod.org/index.php/Load_BLAST_Into_Chado#Convert_BLAST_analysis_to_GFF

-- Don Gilbert
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/

From torsten.seemann at infotech.monash.edu.au Wed Apr 4 21:34:17 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 5 Apr 2007 11:34:17 +1000
Subject: [Bioperl-l] Help parsing PSI-BLAST XML reports
Message-ID:

Dear all,

I have been migrating all our BLAST infrastructure to use the XML
output mode, the "blastpgp -m 7" option, referred to as 'blastxml'
format in Bioperl. I had never used SearchIO to parse a PSI-BLAST XML
report before, and encountered some issues I hope you can help me with:

1. When loading with Bio::SearchIO(-format=>'blastxml') I get back a
Bio::Search::Result::GenericResult object. This means I can not use
the PSI-BLAST functions like iterations() and psiblast() provided by
Bio::Search::Result::BlastResult. I'm guessing this is because the
XML output reports itself as a plain BLASTP output:
blastp

How do I determine if it is a PSI-BLAST report?

2. Usually a PSI-BLAST report has multiple Iterations. The XML output
has tags but it took me a while to figure out that these
get mapped to Bio::SearchIO::Result objects accessible via
Bio::SearchIO->next_result().

Is this the proper way to process the iterations?

3. I also notice that only the first result (iteration) has the
query_name set. Subsequent ones are empty:
RESULT 1 Bio::Search::Result::GenericResult, algorithm= BLASTP, query=MyProtein , db=uniprot_sprot
RESULT 2 Bio::Search::Result::GenericResult, algorithm= BLASTP, query= , db=uniprot_sprot

Is this a bug or expected?

I'm guessing a lot of these problems are simply due to limitations of
the NCBI BLAST XML DTD?

--Torsten

From gdorjee at hotmail.com Wed Apr 4 20:59:08 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Wed, 4 Apr 2007 17:59:08 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To:
References: <9842643.post@talk.nabble.com>
Message-ID: <9848412.post@talk.nabble.com>

hi Torsten,

Yes, it still gives me the same error even if I give the full path to
the fasta file. Following is how I did:

####### part of my script #######
my $Seq_in = Bio::SeqIO->new (-file => '/export/home/local/apache2/htdocs/result/fasta.faa',
                              -format => 'Fasta');
my $queryin = $Seq_in->next_seq();
my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastp',
                              'database' => '/export/home/dorjee/database/nrpart',
                              _READMETHOD => 'Blast'
                              );
$factory->outfile("/export/home/local/apache2/htdocs/result/out.blast");
my $blastreport = $factory->blastall($queryin);
....

thanks man.

Torsten Seemann wrote:
>
>> Software error:
>> ------------- EXCEPTION -------------
>> MSG: not Bio::Seq object or array of Bio::Seq objects or file name!
>> STACK Bio::Tools::Run::StandAloneBlast::blastpgp
>> /usr/perl5/5.6.1/lib/Bio/Tools/Run/StandAloneBlast.pm:611
>> STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50
>
>> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', '-format' =>
>> 'Fasta');
>
> Does this still happen if you give the full path to the FASTA file?
> eg. -file => /usr/local/apache2/htdocs/result/fasta.faa
> (I'm guessing what the full path is here)
>
> --Torsten
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

--
View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9848412
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From torsten.seemann at infotech.monash.edu.au Wed Apr 4 22:57:09 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 5 Apr 2007 12:57:09 +1000
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9842643.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
Message-ID:

DeeGee,

Please add the following lines to help deduce the problem:

> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', '-format' =>
> 'Fasta');

die "could not open fasta" if not defined $Seq_in;

> my $queryin = $Seq_in->next_seq();

die "could not get seq" if not defined $queryin;

Does anything happen now?

...

Some other comments:

> my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastp',
> STACK Bio::Tools::Run::StandAloneBlast::blastpgp

I'm not sure why it is in the blastpgp() method when you chose
$factory->blastall() ?

> _READMETHOD => 'Blast'

I don't think this is required anymore in modern Bioperl.
Are you using 1.5.x or bioperl-live ?

> when i paste the protein sequence into the textarea of my html page and
> STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50

So this is a CGI script?
Does the script run as user 'apache' or 'httpd', or as yourself via SuEXEC?
Does 'apache' have permissions to READ/WRITE the result/ directory?

--Torsten

From cjfields at uiuc.edu Thu Apr 5 00:14:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 4 Apr 2007 23:14:46 -0500
Subject: [Bioperl-l] Help parsing PSI-BLAST XML reports
In-Reply-To:
References:
Message-ID: <8EA4D933-9B99-485E-9CEA-AB39297F90B4@uiuc.edu>

On Apr 4, 2007, at 8:34 PM, Torsten Seemann wrote:

> Dear all,
>
> I have been migrating all our BLAST infrastructure to use the XML
> output mode, the "blastpgp -m 7" option, referred to 'blastxml' format
> in Bioperl. I had never used SearchIO to parse a PSI-BLAST XML report
> before, and encountered some issues I hope you can help me with:
>
> 1. When loading with Bio::SearchIO(-format=>'blastxml') I get back a
> Bio::Search::Result::GenericResult object. This means I can not use
> the PSI-BLAST functions like iterations() and psiblast() provided by
> Bio::Search::Result::BlastResult. I'm guessing this is because the the
> XML output reports itself as a plain BLASTP output:
> blastp
>
> How do I determine if it is a PSI-BLAST report?

I don't know if you can very easily, though I haven't tried myself.
If I remember correctly there wasn't a substantial difference in the
XML output between regular BLAST XML and PSI-BLAST XML. We could add
a parameter to the parser to treat the report as PSI-BLAST.

> 2. Usually a PSI-BLAST report has multiple Iterations. The XML output
> has tags but it took me a while to figure out that these
> get mapped to Bio::SearchIO::Result objects accessible via
> Bio::SearchIO->next_result().
>
> Is this the proper way to process the iterations?

The problem is in the way that NCBI now outputs multiple-query BLAST
XML reports, which apparently changed sometime in the last year w/o
notice. This was also a problem with other Bio* parsers (I remember
seeing something about it on the BioPython list).

Previously multiquery BLAST requests were output like single XML
reports concatenated together, each with their own XML declaration,
etc. Now they are treated like iterations (query 1 = iteration 1,
query 2 = iteration 2, etc) all in one long BLAST report. There's an
example of one in the SearchIO tests which I added to CVS in Jan-Feb,
post-1.5.2. The current parser handles both old and new cases.

The current behavior of the parser is to parse everything up front,
building up the ResultI's and then returning them one-by-one upon
next_result(), which is horrible on memory if you have tons of XML to
wade through. I will probably change that to carve the data up into
report-sized chunks of XML and parse them piecemeal, but I haven't
had time to work on it yet.

> 3. I also notice that only the first result (iteration) has the
> query_name set. Subsequent ones are empty:
> RESULT 1 Bio::Search::Result::GenericResult, algorithm= BLASTP,
> query=MyProtein , db=uniprot_sprot
> RESULT 2 Bio::Search::Result::GenericResult, algorithm= BLASTP, query=
> , db=uniprot_sprot
>
> Is this a bug or expected?

If you are using 1.5.2 then there is a bug related to that which was
fixed in CVS a few months back (related to the multiquery issue
above). If it isn't let me know.

> I'm guessing a lot of these problems are simply due to limitations of
> the NCBI BLAST XML DTD?
>
> --Torsten

To tell the truth I'm not sure. One would think they could add some
designation to the report for PSI-BLAST!

chris

From cjfields at uiuc.edu Thu Apr 5 13:40:41 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Apr 2007 12:40:41 -0500
Subject: [Bioperl-l] Mixed bless-ings with Bio::Seq/Bio::PrimarySeq
	(Bio::Seq::Meta::Array)
Message-ID: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu>

Roy Chaudhuri has raised an interesting question in a bug report
filed regarding 'bless'-ing objects into another (similar) class.
The bug report on this is here:

http://bugzilla.open-bio.org/show_bug.cgi?id=2262

The following code (from the bug report) illustrates the problem.
Note some of this is taken from the Bio::Seq::Meta::Array POD, though
the example sequence object is a LocatableSeq (PrimarySeqI) and not a
SeqI:

use Bio::SeqIO;
use Bio::Seq::Meta::Array;

# $seq isa Bio::SeqI
my $seq=Bio::SeqIO->new(-fh=>\*ARGV, -format=>'genbank')->next_seq;

# $seq is still a Bio::SeqI
bless $seq, 'Bio::Seq::Meta::Array';
Bio::SeqIO->new(-format=>'genbank')->write_seq($seq);

This produces sequence output missing sequence data, a definition,
and other odds and ends. $seq is first a Bio::Seq::RichSeq and is
blessed into a Bio::Seq::Meta::Array; both times $seq remains
Bio::SeqI.
However, Bio::Seq::Meta::Array has an odd inheritance tree which also makes it a Bio::PrimarySeqI and a Bio::Seq::MetaI (ick):

    use base qw(Bio::LocatableSeq Bio::Seq Bio::Seq::MetaI);

Bio::LocatableSeq has a seq() method inherited from Bio::PrimarySeq, for instance, so using $seq->seq() invokes Bio::PrimarySeq::seq() instead of Bio::Seq::seq(). No problem in most cases as long as PrimarySeqI is blessed into another PrimarySeqI, but if one blesses a Bio::SeqI into a Bio::Seq::Meta::Array (as in the example) then PrimarySeq::seq() expects a raw sequence and gets none (since the data is stored internally as a PrimarySeq in a different location) and no sequence is output. This happens similarly for other stored object data.

I'm not sure why Bio::Seq::Meta::Array is set up this way. Do we want to support using 'bless $obj, Class' with Bio::SeqI/PrimarySeqI, or should Bio::Seq::Meta::Array be changed so that it follows one interface or the other?

chris

From hlapp at gmx.net Thu Apr 5 14:27:39 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 5 Apr 2007 14:27:39 -0400
Subject: [Bioperl-l] Mixed bless-ings with Bio::Seq/Bio::PrimarySeq (Bio::Seq::Meta::Array)
In-Reply-To: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu>
References: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu>
Message-ID: <421D1A5B-4F4A-46D9-8829-2DCB1D8E7DE5@gmx.net>

On Apr 5, 2007, at 1:40 PM, Chris Fields wrote:

> Do we want to support using 'bless $obj, Class'

This smacks of over-clever programming and looks like a sure way to obfuscate what you're doing. I'm not sure why we need to support this construct.
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at uiuc.edu Thu Apr 5 14:44:38 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Apr 2007 13:44:38 -0500
Subject: [Bioperl-l] Mixed bless-ings with Bio::Seq/Bio::PrimarySeq (Bio::Seq::Meta::Array)
In-Reply-To: <421D1A5B-4F4A-46D9-8829-2DCB1D8E7DE5@gmx.net>
References: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu> <421D1A5B-4F4A-46D9-8829-2DCB1D8E7DE5@gmx.net>
Message-ID:

I tend to agree on that front as it seems too prone to subtle issues with inheritance (as the bug demonstrates).

Related to that, do we want to have Bio::Seq::Meta::Array implement either PrimarySeqI or SeqI? Having it implement both is definitely not working as expected.

chris

On Apr 5, 2007, at 1:27 PM, Hilmar Lapp wrote:

>
> On Apr 5, 2007, at 1:40 PM, Chris Fields wrote:
>
>> Do we want to support using 'bless $obj, Class'
>
> This smacks of over-clever programming and looks like a sure way to
> obfuscate what you're doing. I'm not sure why we need to support
> this construct.
>
> -hilmar
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From mkiwala at watson.wustl.edu Thu Apr 5 15:11:22 2007
From: mkiwala at watson.wustl.edu (Michael Kiwala)
Date: Thu, 05 Apr 2007 14:11:22 -0500
Subject: [Bioperl-l] Mixed bless-ings with Bio::Seq/Bio::PrimarySeq (Bio::Seq::Meta::Array)
In-Reply-To:
References: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu> <421D1A5B-4F4A-46D9-8829-2DCB1D8E7DE5@gmx.net>
Message-ID: <461549DA.90709@watson.wustl.edu>

My vote is for SeqI.
I was using the SeqWithQuality class and more recently switched over to Bio::Seq::Quality as we are upgrading from 1.4 to 1.5.2. The sequences I'm working with are destined for GenBank and have features and quality values. I've written a module (that I call GenBank::Tbl2Asn) that accepts a Bio::Seq::Quality with features and runs tbl2asn on it to produce a file that we send to GenBank. I don't know of any other class that suits my needs better than Bio::Seq::Quality inheriting from Bio::SeqI.

Chris Fields wrote:
> I tend to agree on that front as it seems too prone to subtle issues
> with inheritance (as the bug demonstrates).
>
> Related to that, do we want to have Bio::Seq::Meta::Array implement
> either PrimarySeqI or SeqI? Having it implement both is definitely
> not working as expected.
>
> chris

From gdorjee at hotmail.com Thu Apr 5 17:09:14 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Thu, 5 Apr 2007 14:09:14 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To:
References: <9842643.post@talk.nabble.com>
Message-ID: <9864004.post@talk.nabble.com>

Thanks again, Torsten.
I tried (die "could not get seq" if not defined $queryin;) as you suggested, and now I get the following error message:

Software error:
could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl line 50.

Does this mean that next_seq() method in 'my $queryin = $Seq_in->next_seq();' has some problem? How can I fix it? I would appreciate your help.
Cheers!

Torsten Seemann wrote:
>
> DeeGee,
>
> Please add the following lines to help deduce the problem:
>
>> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', '-format' =>
>> 'Fasta');
>
> die "could not open fasta" if not defined $Seq_in;
>
>> my $queryin = $Seq_in->next_seq();
>
> die "could not get seq" if not defined $queryin;
>
> Does anything happen now?
>
> ...
>
> Some other comments:
>
>> my $factory = Bio::Tools::Run::StandAloneBlast->new('program' =>
>> 'blastp',
>> STACK Bio::Tools::Run::StandAloneBlast::blastpgp
>
> I'm not sure why it is in the blastpgp() method when you chose
> $factory->blastall() ?
>
>> _READMETHOD => 'Blast'
>
> I don't think this is required anymore in modern Bioperl. Are you
> using 1.5.x or bioperl-live ?
>
>> when i paste the protein sequence into the textarea of my html page and
>> STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50
>
> So this is a CGI script?
> Does the script run as user 'apache' or 'httpd', or as yourself via
> SuEXEC?
> Does 'apache' have permissions to READ/WRITE the result/ directory?
>
> --Torsten
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9864004
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
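
The checks suggested in this exchange can be sketched in core Perl, without BioPerl: before suspecting next_seq(), confirm that the file exists, is readable by the user the script runs as, is non-empty, and starts with a '>' header. This is only a rough sketch; the helper name check_fasta and the default path are illustrative assumptions, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): say why a FASTA file
# might yield no sequence before it is handed to a parser.
sub check_fasta {
    my ($path) = @_;
    return "missing"    unless -e $path;
    return "unreadable" unless -r $path;
    return "empty"      unless -s $path;
    open(my $fh, '<', $path) or return "cannot open: $!";
    while (my $line = <$fh>) {
        next if $line =~ /^\s*$/;    # skip leading blank lines
        close($fh);
        return $line =~ /^>/ ? "ok" : "no '>' header on first non-blank line";
    }
    close($fh);
    return "only blank lines";
}

# The default mirrors the relative path used in the thread; an absolute
# path is the safer choice when the script runs under a web server.
my $path = shift(@ARGV) || 'result/fasta.faa';
print "$path: ", check_fasta($path), "\n";
```

The permission checks only mean something when the script is run as the same user the web server uses, rather than from a login shell.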
From cjfields at uiuc.edu Thu Apr 5 19:32:55 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Apr 2007 18:32:55 -0500
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9864004.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com> <9864004.post@talk.nabble.com>
Message-ID: <3ED7F1E9-FE21-4796-99AC-0CD0EA418563@uiuc.edu>

On Apr 5, 2007, at 4:09 PM, DeeGee wrote:

>
> Thanks again, Torsten. I tried (die "could not get seq" if not defined
> $queryin;) as you suggested, and now I get the following error
> message:
>
> Software error:
> could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl line 50.
>
> Does this mean that next_seq() method in 'my $queryin =
> $Seq_in->next_seq();' has some problem? How can I fix it? I would
> appreciate
> your help.
> Cheers!

This indicates there is likely some problem with your sequence file (either it isn't fasta or something else is wrong), but w/o actually seeing it we can't be sure. I can't be sure but I don't think it is a next_seq() issue. Also, if there are problems accessing the file the stream object should throw an error so I don't think it is that either...

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From torsten.seemann at infotech.monash.edu.au Thu Apr 5 20:40:32 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 6 Apr 2007 10:40:32 +1000
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9864004.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com> <9864004.post@talk.nabble.com>
Message-ID:

Dorjee,

> thanks alot for your reply again. as per your suggestion (using 'die "could
> not get seq" if not defined $queryin;'), i now get the following error
> message:
> Software error:
> could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl line 50.
> i've attached the script. could you plz have a look at it and see where am i
> going wrong.
> cheers mate!

This strongly suggests that your FASTA file is not actually in FASTA format.
http://en.wikipedia.org/wiki/Fasta_format

Does it work if you pass it to blastall on the command line?
eg. blastall -p blastp -i result/fasta.faa -d /export/home/database/nr

> Saier Lab.
> 858-534-2457

Are you working at UCSD?

--Torsten

From gdorjee at hotmail.com Thu Apr 5 23:26:16 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Thu, 5 Apr 2007 20:26:16 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To:
References: <9842643.post@talk.nabble.com> <9864004.post@talk.nabble.com>
Message-ID: <9867402.post@talk.nabble.com>

hi Torsten,
blastall -p blastp -i result/fasta.faa -d /export/home/database/nr works perfectly fine on the command line, and the 'fasta.faa' is in fasta format:

>gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens]
HLSAQKASVGPESVSGLGTRTWPRVSCEVTVQCWPGCHLKVGGFKMAPWQGVGRRPWFLTWGPLCGAASVSPSMTVASSQ
QGWDCTAGRRWLGEGEIEALAQVSEFKTVLSFQGPAASPDGSSATRVPQDVTQGPGATGGKEDSGMIPLAGTAPGAEGPA
PGDSQAVRPYKQEPSSPPLAPGLPAFLAAPGTTSCPECGKTSLKPAHLLRHRQSHSGEKPHACPECGKAFRRKEHLRRHR
DTHPGSPGSPGPALRPLPAREKPHACCECGKTFYWREHLVRHRKTHSGARPFACWECGKGFGRREHVLRHQRIHGRAAAS
AQGAVAPGPDGGGPFPPWPLG

it seems like i'm just one bloody step away from success. ^ ^* can't figure out the prob.
thanks for your help.

--
View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9867402
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From tuco at pasteur.fr Fri Apr 6 09:33:08 2007
From: tuco at pasteur.fr (Emmanuel Quevillon)
Date: Fri, 06 Apr 2007 15:33:08 +0200
Subject: [Bioperl-l] Bio::Annotation::Collection strange behavior
Message-ID: <46164C14.8040701@pasteur.fr>

Hi folks,

I am seeing strange behavior from Bio::SeqIO::embl. When I read an EMBL file as input and write it to another one, the tags in the output file (EMBL format) are not in the same order as in the original file.
Is this a normal and expected result?
If anyone wants to test it as a Perl one-liner, here is the code:

perl -MBio::SeqIO -e '$i = Bio::SeqIO->new(-file => "file.embl", -format => "EMBL"); $o = Bio::SeqIO->new(-file => ">new.embl", -format => "EMBL"); while($e = $i->next_seq()){ $o->write_seq($e); }'

I checked the embl.pm code but was unable to find where this behavior comes from.
If someone has the solution or any clue, I would be grateful.

Thanks

Regards

Emmanuel
--
-------------------------
Emmanuel Quevillon
Softwares and data banks
Pasteur Insititue
tuco at_ pasteur dot fr
-------------------------

From dmessina at wustl.edu Fri Apr 6 11:09:51 2007
From: dmessina at wustl.edu (David Messina)
Date: Fri, 6 Apr 2007 10:09:51 -0500
Subject: [Bioperl-l] Bio::Annotation::Collection strange behavior
In-Reply-To: <46164C14.8040701@pasteur.fr>
References: <46164C14.8040701@pasteur.fr>
Message-ID: <7C67D287-DE2A-488A-8636-01EFF468368D@wustl.edu>

> Is it a normal and expecting result ?

Yes, unfortunately. Due to the complexity of the parsing, it is surprisingly difficult to "round-trip" some sequence file formats.
http://www.bioperl.org/wiki/HOWTO:SeqIO#Caveats

Dave

From jason at bioperl.org Fri Apr 6 11:42:41 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 6 Apr 2007 08:42:41 -0700
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9867402.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com> <9864004.post@talk.nabble.com> <9867402.post@talk.nabble.com>
Message-ID: <9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>

When/how are you writing your sequences to this file result.faa? Are you using SeqIO or bioperl to write the sequence to a file?
I'm wondering if this is an I/O buffering problem.

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/

From bernd.web at gmail.com Fri Apr 6 14:00:18 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 6 Apr 2007 20:00:18 +0200
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
References: <9842643.post@talk.nabble.com> <9864004.post@talk.nabble.com> <9867402.post@talk.nabble.com> <9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
Message-ID: <716af09c0704061100n1555915bw18050639d25cbf89@mail.gmail.com>

Hi Dorjee,

Do you now use complete file paths everywhere (instead of the relative paths that were in your script)?
Did you check all read and execute permissions (turn r, x on for group and others)?

And regarding the fasta file, I'd suggest closing the filehandle after you have printed the fasta sequence to the file.

    open(OUTPUT,">result/fasta.faa"); #don't use this relative path and use the "die" as was suggested earlier.
    .... your other code lines
    print OUTPUT "$desc\n$seqo\n";
    close(OUTPUT); #close the file.

Also check whether your complete script runs from the command line, to be sure your problems are not related to the webserver environment.

BTW I do think you do not want to parse your fasta file like you do:

    if ($fasta_file =~ /^(\>.+)\s+/){$desc=$1;}
    $fasta_file=~s/[\n\r]//g;
    if ($fasta_file =~ /([A-Z]{10}.+)/){$seqo=$1;}

$seqo will contain the description as well, so your sequence starts with the description. BioPerl provides code for fasta file parsing too ;-)
If you really want to stick to your code you can catch the $desc and $seqo in one RegExp, or replace this line:

    if ($fasta_file =~ /^(\>.+)\s+/){$desc=$1;}

with

    if ($fasta_file =~ s/^(\>.+)\s+//){$desc=$1;}

I hope you will get your script working now.

Regards,
Bernd

On 4/6/07, Jason Stajich wrote:
> When/How are are you writing your sequences to this file result.faa? are
> you using seqIO or bioperl to write the sequence to a file?
> I'm wondering if this is I/O buffering problem.

From gdorjee at hotmail.com Fri Apr 6 13:39:38 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Fri, 6 Apr 2007 10:39:38 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
References: <9842643.post@talk.nabble.com> <9864004.post@talk.nabble.com> <9867402.post@talk.nabble.com> <9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
Message-ID: <9875685.post@talk.nabble.com>

Following is the part of my script, which is in the 'htdocs' directory:

####### part of my script #############
#generate a new CGI object from the input to the CGI script
my $query=new CGI;

open(OUTPUT,">/export/home/local/apache2/htdocs/result/fasta.faa");

print STDOUT $query->header();
print STDOUT $query->start_html(-title=>"Response from blast",
-BGCOLOR=>"#FFFFFF");
print STDOUT "\n

Results from the BLAST
\n";

#gets the sequence from the html textarea with 'post' method
my $fasta_file=$query->param('sequence');
print OUTPUT $fasta_file;
#Local blast of the input sequence against nr database
my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', -format => 'Fasta');
die "could not open fasta" if not defined $Seq_in;
my $queryin = $Seq_in->next_seq();
die "could not get seq" if not defined $queryin;
my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => 'blastp',
                                                    'database' => '/export/home/dorjee/database/nr',
                                                    _READMETHOD => 'Blast'
                                                    );
$factory->outfile("result/out.blast");
my $blastreport = $factory->blastall($queryin);
.....

Thank you.

Jason Stajich-3 wrote:
>
> When/How are are you writing your sequences to this file result.faa?
> are you using seqIO or bioperl to write the sequence to a file?
> I'm wondering if this is I/O buffering problem.

--
View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9875685
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
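
The script above prints to OUTPUT and then re-opens the same path for reading without closing the handle first. A minimal core-Perl sketch of why the reader can come back empty: with default buffering, a small write usually stays in the handle's buffer until the handle is closed or flushed. The temporary directory, file name, and sequence below are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/fasta.faa";    # stand-in for result/fasta.faa

open(my $out, '>', $path) or die "cannot write $path: $!";
print {$out} ">seq1 example\nMKVLAAGIVG\n";

# The data is normally still in Perl's output buffer at this point, so
# anything else opening the same path may see an empty file.
my $size_before = (stat $path)[7];

close($out) or die "cannot close $path: $!";    # flushes the buffer to disk
my $size_after = (stat $path)[7];

print "before close: $size_before bytes, after close: $size_after bytes\n";
```

Closing the handle (or enabling autoflush on it) before the file is read again avoids the problem; passing a sequence object straight to the next step avoids the temporary file altogether.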
From jason at bioperl.org Fri Apr 6 14:40:42 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 6 Apr 2007 11:40:42 -0700
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9875685.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com> <9864004.post@talk.nabble.com> <9867402.post@talk.nabble.com> <9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org> <9875685.post@talk.nabble.com>
Message-ID:

Looks like you need to deal with buffering:

http://perl.plover.com/FAQs/Buffering.html

So you need to add this:
close(OUTPUT);

Alternatively you can build a sequence object and pass that in to the BLAST factory, then you don't have to mess around with creating temporary files or run into this sort of problem.

-jason

On Apr 6, 2007, at 10:39 AM, DeeGee wrote:

>
> Following is the part of my script, which is in the 'htdocs'
> directory:
>
> ####### part of my script #############
> #generate a new CGI object from the input to the CGI script
> my $query=new CGI;
>
> open(OUTPUT,">/export/home/local/apache2/htdocs/result/fasta.faa");
>
> print STDOUT $query->header();
> print STDOUT $query->start_html(-title=>"Response from blast",
> -BGCOLOR=>"#FFFFFF");
> print STDOUT "\n

Results from the BLAST
\n";
>
> #gets the sequence from the html textarea with 'post' method
> my $fasta_file=$query->param('sequence');
> print OUTPUT $fasta_file;
> close(OUTPUT);
> #Local blast of the input sequence against nr database
> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', -format =>
> 'Fasta');
> die "could not open fasta" if not defined $Seq_in;
> my $queryin = $Seq_in->next_seq();
> die "could not get seq" if not defined $queryin;
> my $factory = Bio::Tools::Run::StandAloneBlast->new('program' =>
> 'blastp',
> 'database' =>
> '/export/home/dorjee/database/nr',
> _READMETHOD =>
> 'Blast'
> );
> $factory->outfile("result/out.blast");
> my $blastreport = $factory->blastall($queryin);
> .....
>
> Thank you.

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/

From MEC at stowers-institute.org Fri Apr 6 16:34:37 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 6 Apr 2007 15:34:37 -0500
Subject: [Bioperl-l] Bio/DB/SeqFeature/Store/DBI/mysql.pm patched
Message-ID:

Lincoln,

I just committed a patch to Bio/DB/SeqFeature/Store/DBI/mysql.pm which avoids a potential problem that, unless fixed, can generate warnings that look like this:

prepare_cached(SELECT f.id,f.object FROM feature as f, typelist AS tl WHERE ( tl.id=f.typeid AND (tl.tag LIKE ?) ) ) statement handle DBI::st=HASH(0x16f61c0) still Active at /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 1427
DBD::mysql::st fetchrow_array failed: fetch() without execute() at /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 1416.

... as well as other downstream aberrant program behaviour. I encountered what the DBI manpage suggests might happen: "The results will certainly not be what you expect"

This can happen, for example, when you open an iterator using Bio::DB::SeqFeature::Store->get_seq_stream, and then while iterating, perform other queries against the store.
My understanding of the DBI doc is that this should only occur if the 2nd iterator is for the same sql statement identically parameterized as the 1st, but I have not proven beyond a doubt that this is what Bio::DB::SeqFeature::Store is doing the way I am using it. Nonetheless, the patch fixes my pipeline. Cheers, Malcolm From gdorjee at hotmail.com Fri Apr 6 18:27:54 2007 From: gdorjee at hotmail.com (DeeGee) Date: Fri, 6 Apr 2007 15:27:54 -0700 (PDT) Subject: [Bioperl-l] blastall problem In-Reply-To: References: <9842643.post@talk.nabble.com> <9864004.post@talk.nabble.com> <9867402.post@talk.nabble.com> <9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org> <9875685.post@talk.nabble.com> Message-ID: <9879110.post@talk.nabble.com> I added the line: close(OUTPUT); and now following error comes up, where 'out.blast' is supposed to be the blast result file, but it not being created. Software error: ------------- EXCEPTION ------------- MSG: Could not open /export/home/dorjee/result/out.blast: No such file or directory STACK Bio::Root::IO::_initialize_io /usr/perl5/5.6.1/lib/Bio/Root/IO.pm:273 STACK Bio::Root::IO::new /usr/perl5/5.6.1/lib/Bio/Root/IO.pm:213 STACK Bio::SearchIO::new /usr/perl5/5.6.1/lib/Bio/SearchIO.pm:135 STACK Bio::SearchIO::new /usr/perl5/5.6.1/lib/Bio/SearchIO.pm:167 STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:53 -------------------------------------- Jason Stajich-3 wrote: > > Looks like you need to deal with buffering: > > http://perl.plover.com/FAQs/Buffering.html > > So you need to add this: > close(OUTPUT); > > Alternatively you can build a sequence object and pass that in to the > BLAST factory, then you don't have to mess around with creating > temporary files or run into this sort of problem. 
> > -jason > On Apr 6, 2007, at 10:39 AM, DeeGee wrote: > >> >> Following is the part of my script, which is in the 'htdocs' >> directory: >> >> ####### part of my script ############# >> #generate a new CGI object from the input to the CGI script >> my $query=new CGI; >> >> open(OUTPUT,">/export/home/local/apache2/htdocs/result/fasta.faa"); >> >> print STDOUT $query->header(); >> print STDOUT $query->start_html(-title=>"Response from blast", >> -BGCOLOR=>"#FFFFFF"); >> print STDOUT "\n

Results from the BLAST

\n"; >> >> #gets the sequence from the html textarea with ?post? method >> my $fasta_file=$query->param('sequence'); >> print OUTPUT $fasta_file; >> > close(OUTPUT); >> #Local blast of the input sequence against nr database >> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', -format => >> 'Fasta'); >> die "could not open fasta" if not defined $Seq_in; >> my $queryin = $Seq_in->next_seq(); >> die "could not get seq" if not defined $queryin; >> my $factory = Bio::Tools::Run::StandAloneBlast->new('program' => >> 'blastp', >> 'database' => >> '/export/home/dorjee/database/nr', >> _READMETHOD => >> 'Blast' >> ); >> $factory->outfile("result/out.blast"); >> my $blastreport = $factory->blastall($queryin); >> ..... >> >> Thank you. >> >> >> >> Jason Stajich-3 wrote: >>> >>> When/How are are you writing your sequences to this file result.faa? >>> are you using seqIO or bioperl to write the sequence to a file? >>> I'm wondering if this is I/O buffering problem. >>> >>> On Apr 5, 2007, at 8:26 PM, DeeGee wrote: >>> >>>> >>>> hi Torsten, >>>> blastall -p blastp -i result/fasta.faa -d /export/home/database/nr >>>> works >>>> perfectly fine on the command line, and the 'fasta.faa' is in fasta >>>> format: >>>> >>>>> gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens] >>>> HLSAQKASVGPESVSGLGTRTWPRVSCEVTVQCWPGCHLKVGGFKMAPWQGVGRRPWFLTWGPLCGAA >>>> SV >>>> SPSMTVASSQ >>>> QGWDCTAGRRWLGEGEIEALAQVSEFKTVLSFQGPAASPDGSSATRVPQDVTQGPGATGGKEDSGMIP >>>> LA >>>> GTAPGAEGPA >>>> PGDSQAVRPYKQEPSSPPLAPGLPAFLAAPGTTSCPECGKTSLKPAHLLRHRQSHSGEKPHACPECGK >>>> AF >>>> RRKEHLRRHR >>>> DTHPGSPGSPGPALRPLPAREKPHACCECGKTFYWREHLVRHRKTHSGARPFACWECGKGFGRREHVL >>>> RH >>>> QRIHGRAAAS >>>> AQGAVAPGPDGGGPFPPWPLG >>>> >>>> it seems like i'm just one bloody step away from success. ^ ^* >>>> can't figure >>>> out the prob. >>>> thanks for your help. >>>> >>>> >>>> Torsten Seemann wrote: >>>>> >>>>> Dorjee, >>>>> >>>>>> thanks alot for your reply again. 
as per your suggestion (using >>>>>> 'die >>>>>> "could >>>>>> not get seq" if not defined $queryin;'), i now get the following >>>>>> error >>>>>> message: >>>>>> Software error: >>>>>> could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl >>>>>> line 50. >>>>>> i've attached the script. could you plz have a look at it and see >>>>>> where >>>>>> am i >>>>>> going wrong. >>>>>> cheers mate! >>>>> >>>>> This strongly suggests that your FASTA file is not actually in >>>>> FASTA >>>>> format. >>>>> http://en.wikipedia.org/wiki/Fasta_format >>>>> >>>>> Does it work if you pass it to blastall on the command line? >>>>> eg. blastall -p blastp -i result/fasta.faa -d /export/home/ >>>>> database/nr >>>>> >>>>>> Saier Lab. >>>>>> 858-534-2457 >>>>> >>>>> Are you working at UCSD? >>>>> >>>>> --Torsten >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>> >>>> -- >>>> View this message in context: http://www.nabble.com/blastall- >>>> problem-tf3527412.html#a9867402 >>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> Jason Stajich >>> Miller Research Fellow >>> University of California, Berkeley >>> lab: 510.642.8441 >>> http://pmb.berkeley.edu/~taylor/people/js.html >>> http://fungalgenomes.org/ >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> -- >> View this message in context: http://www.nabble.com/blastall- >> problem-tf3527412.html#a9875685 >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> Miller Research Fellow
> University of California, Berkeley
> lab: 510.642.8441
> http://pmb.berkeley.edu/~taylor/people/js.html
> http://fungalgenomes.org/
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9879110
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From gilbertd at cricket.bio.indiana.edu Fri Apr 6 23:31:29 2007
From: gilbertd at cricket.bio.indiana.edu (Don Gilbert)
Date: Fri, 6 Apr 2007 22:31:29 -0500 (EST)
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
Message-ID: <200704070331.l373VTI22000@cricket.bio.indiana.edu>

Dear Bioperlers,

There is a hidden issue with Bio::DB::Fasta in that it assumes Fasta files have fixed line widths, but that isn't a requirement of Fasta format. The documentation notes this package requirement, but I was bitten by this, and I'd guess not many people check their data (esp. if from someone else) to see it meets this requirement.

Simple tools can easily produce fasta with ragged line formatting: e.g. genome assemblers that paste together contig fasta with spacers to make assemblies.

It would be nice if B:D:Fasta would check and die when it can't handle this ragged input. Here is a suggested addition:

package Bio::DB::Fasta;

=head1 DESCRIPTION

 Entries may have any line length up to 65,536 characters, and
 different line lengths are allowed in the same file. However, within
 a sequence entry, all lines must be the same length except for the
 last.
+ An error will be thrown if this is not the case.

=cut

use constant DIE_ON_MISSMATCHED_LINES => 1; # if you want

sub calculate_offsets {

  my ($offset,$id,$linelength,$type,$firstline,$count,$termination_length,%offsets);
+ my ($l3_len,$l2_len,$l_len)=(0,0,0);

  $self->_check_linelength($linelength);
+ ($l3_len,$l2_len,$l_len)=(0,0,0);
  } else {
+ $l3_len= $l2_len; $l2_len= $l_len; $l_len= length($_); # need to check every line :(
+ if(DIE_ON_MISSMATCHED_LINES &&
+   $l3_len>0 && $l2_len>0 && $l3_len!=$l2_len) {
+   my $fap= substr($_,0,20)."..";
+   $self->throw("Each line of the fasta entry must be the same length except the last.
+   Line above #$. '$fap' is $l2_len != $l3_len chars.");
+ }

  $linelength ||= length($_);

-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/

From hlapp at gmx.net Sat Apr 7 12:42:13 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 7 Apr 2007 12:42:13 -0400
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
In-Reply-To: <200704070331.l373VTI22000@cricket.bio.indiana.edu>
References: <200704070331.l373VTI22000@cricket.bio.indiana.edu>
Message-ID: <05D43C56-8B30-41C9-8C35-2CD77419DE7F@gmx.net>

Wouldn't it be easier (and more robust) to just reformat the file to meet the constant line width requirement? The code required to do that should be fewer lines than your addition below, I think.

For example, one could do a fast first-pass through the file simply checking that all sequence lines not followed by a description line or eof have the same length, stopping at the first line that fails the test. If unequal lengths, use Bio::SeqIO to read and write back out the fasta file, then continue as for well-formatted files.

-hilmar

On Apr 6, 2007, at 11:31 PM, Don Gilbert wrote:

>
> Dear Bioperlers,
>
> There is a hidden issue with Bio::DB::Fasta in that it assumes Fasta
> files have fixed line widths, but that isn't a requirement of Fasta
> format.
The documentation notes this package requirement, but I was > bitten by this, and I'd guess not many people check their data (esp. > if from someone else) to see it meets this requirement. > > Simple tools can easily produce fasta with ragged line formatting: > e.g. genome assemblers that paste together contig fasta with spacers > to make assemblies. > > It would be nice if B:D:Fasta would check and die when it can't handle > this ragged input. Here is a suggested addition: > > package Bio::DB::Fasta; > > =head1 DESCRIPTION > > Entries may have any line length up to 65,536 characters, and > different line lengths are allowed in the same file. However, > within > a sequence entry, all lines must be the same length except for the > last. > + An error will be thrown if this is not the case. > > =cut > > use constant DIE_ON_MISSMATCHED_LINES => 1; # if you want > > sub calculate_offsets { > > my ($offset,$id,$linelength,$type,$firstline,$count, > $termination_length,%offsets); > + my ($l3_len,$l2_len,$l_len)=(0,0,0); > > $self->_check_linelength($linelength); > + ($l3_len,$l2_len,$l_len)=(0,0,0); > } else { > + $l3_len= $l2_len; $l2_len= $l_len; $l_len= length($_); # > need to check every line :( > + if(DIE_ON_MISSMATCHED_LINES && > + $l3_len>0 && $l2_len>0 && $l3_len!=$l2_len) { > + my $fap= substr($_,0,20).".."; > + $self->throw("Each line of the fasta entry must be the > same length except the last. > + Line above #$. 
'$fap' is $l2_len != $l3_len chars."); > + } > > $linelength ||= length($_); > > -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405 > -- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sat Apr 7 17:13:24 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 7 Apr 2007 17:13:24 -0400 Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths In-Reply-To: <200704071711.l37HBB823983@cricket.bio.indiana.edu> References: <200704071711.l37HBB823983@cricket.bio.indiana.edu> Message-ID: <8177CF47-558F-4891-97B5-69F327EF8A4A@gmx.net> What I was suggesting was the indexer automatically does the reformatting, i.e., to have touch/change the input data if necessary (and obviously one would be able to turn this feature off when the correctness of the input formatting is known). Are you suggesting that this automatic reformatting isn't possible? -hilmar On Apr 7, 2007, at 1:11 PM, Don Gilbert wrote: > > > Hilmar, > > I have added reformatting where appropriate (in code that installs the > files for indexing by Bio::DB::Fasta). What I'm suggesting is a patch > to Bio::DB::Fasta to warn and die when the documented fixed width > that Bio::DB::Fasta requires isn't met. I.e., keep other folks from > being bitten by this hard to identify requirement. Then when they > see that this indexer is failing on inappropriate inputs, they also > can reformat > their Fasta to meet this requirement, and not continue to use the > software with > bad results. 
The operation of Bio::DB::Fasta is reading a sequence > stream > and it doesn't touch/change the input data, so it would be hard to > patch it > to re-format the input data. > > - Don > > -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405 > -- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/ -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sat Apr 7 21:00:51 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 7 Apr 2007 21:00:51 -0400 Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths In-Reply-To: <200704080006.l3806Yt25235@cricket.bio.indiana.edu> References: <200704080006.l3806Yt25235@cricket.bio.indiana.edu> Message-ID: Since you'd have to reformat it though, how would you do it then (presumably offline)? -hilmar On Apr 7, 2007, at 8:06 PM, Don Gilbert wrote: > > > Hilmar, > > Yes, basically automatic reformatting isn't possible. If you are > indexing a large genome of fasta data, I'd not want a bioperl script > to rewrite that data, or create a new version, automatically. > > - Don -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From gilbertd at cricket.bio.indiana.edu Sat Apr 7 13:11:11 2007 From: gilbertd at cricket.bio.indiana.edu (Don Gilbert) Date: Sat, 7 Apr 2007 12:11:11 -0500 (EST) Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths Message-ID: <200704071711.l37HBB823983@cricket.bio.indiana.edu> Hilmar, I have added reformatting where appropriate (in code that installs the files for indexing by Bio::DB::Fasta). What I'm suggesting is a patch to Bio::DB::Fasta to warn and die when the documented fixed width that Bio::DB::Fasta requires isn't met. I.e., keep other folks from being bitten by this hard to identify requirement. 
Then when they see that this indexer is failing on inappropriate inputs, they also can reformat their Fasta to meet this requirement, and not continue to use the software with bad results. The operation of Bio::DB::Fasta is reading a sequence stream and it doesn't touch/change the input data, so it would be hard to patch it to re-format the input data. - Don -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405 -- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/ From gilbertd at cricket.bio.indiana.edu Sat Apr 7 20:06:34 2007 From: gilbertd at cricket.bio.indiana.edu (Don Gilbert) Date: Sat, 7 Apr 2007 19:06:34 -0500 (EST) Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths Message-ID: <200704080006.l3806Yt25235@cricket.bio.indiana.edu> Hilmar, Yes, basically automatic reformatting isn't possible. If you are indexing a large genome of fasta data, I'd not want a bioperl script to rewrite that data, or create a new version, automatically. - Don From gdorjee at hotmail.com Mon Apr 9 00:18:39 2007 From: gdorjee at hotmail.com (DeeGee) Date: Sun, 8 Apr 2007 21:18:39 -0700 (PDT) Subject: [Bioperl-l] parse blast report for the best evalue Message-ID: <9898358.post@talk.nabble.com> hi all, i'm trying to parse a blast report using Bio::SearchIO as follows, but since this blast report is generated with many against many (database) fasta sequences, there're many individual blast reports (one for each of the sequence from the query file). i was wondering if there is a way to get only the best hit (with best evalue) from each one of them. ##### part of my script ###### my $in = new Bio::SearchIO(-format => 'blast', -file => $blast_report); while( my $result = $in->next_result ) { while( my $hit = $result->next_hit ) { ........... thanks. -- View this message in context: http://www.nabble.com/parse-blast-report-for-the-best-evalue-tf3545784.html#a9898358 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
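
A rough sketch of one way to approach the question above, keeping only the hit with the smallest E-value from each result in a multi-query report. It assumes that hit E-values are available through significance() and compare numerically; the helper name best_hit and the tab-separated output are illustrative choices, not part of BioPerl. BLAST normally lists hits in order of significance already, so taking the first hit is often enough; the scan below just avoids relying on that.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): return the hit with the
# smallest E-value in one result, or undef if the result has no hits.
# It only needs an object with next_hit() whose hits offer significance().
sub best_hit {
    my ($result) = @_;
    my $best;
    while ( my $hit = $result->next_hit ) {
        $best = $hit
            if !defined $best || $hit->significance < $best->significance;
    }
    return $best;
}

# BioPerl is loaded only when a report file is given on the command line.
if ( my $blast_report = shift @ARGV ) {
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new( -format => 'blast', -file => $blast_report );

    # One result per query sequence in the report.
    while ( my $result = $in->next_result ) {
        my $hit = best_hit($result);
        if ($hit) {
            print join( "\t", $result->query_name, $hit->name,
                $hit->significance ), "\n";
        }
        else {
            print $result->query_name, "\tno hits\n";
        }
    }
}
```

Only the query name, hit name and E-value are printed here, so no Hit objects need to be kept around once a result has been processed.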
From staffa at niehs.nih.gov Mon Apr 9 11:43:19 2007 From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS)) Date: Mon, 09 Apr 2007 11:43:19 -0400 Subject: [Bioperl-l] Retrieve mRNA from Genome Message-ID: I have been retrieving sub-sequence from Genbank genomic records by use of Bio::SeqIO and ->get_SeqFeatures, ->start ->end , but now I'm looking for a quick way to extract CDS or mRNA from a multi-segmented annotation, e.g. mRNA join(72458..72791,84573..84613,93279..94419,94481..94656, 94719..94992,95056..95350,95438..95553,95614..96056) Is there such a method? Please point me to appropriate documentation. Nick Staffa Telephone: 919-316-4569 (NIEHS: 6-4569) Scientific Computing Support Group NIEHS Information Technology Support Services Contract (Science Task Monitor: John D. Grovenstein (grovens1 at niehs.nih.gov) National Institute of Environmental Health Sciences National Institutes of Health Research Triangle Park, North Carolina From Kevin.M.Brown at asu.edu Mon Apr 9 12:19:19 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 9 Apr 2007 09:19:19 -0700 Subject: [Bioperl-l] Retrieve mRNA from Genome In-Reply-To: References: Message-ID: <1A4207F8295607498283FE9E93B775B402FCAED7@EX02.asurite.ad.asu.edu> I believe that is what the spliced_seq method is for $feat->spliced_seq # the "joined" sequence, when there are # multiple sub-locations http://www.bioperl.org/wiki/Bptutorial.pl > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Staffa, Nick (NIH/NIEHS) > Sent: Monday, April 09, 2007 8:43 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Retrieve mRNA from Genome > > I have been retrieving sub-sequence from Genbank genomic > records by use of Bio::SeqIO and ->get_SeqFeatures, ->start > ->end , but now I'm looking for a quick way to extract CDS or > mRNA from a multi-segmented annotation, e.g. 
> mRNA > join(72458..72791,84573..84613,93279..94419,94481..94656, > > 94719..94992,95056..95350,95438..95553,95614..96056) > > Is there such a method? > Please point me to appropriate documentation. From cjfields at uiuc.edu Mon Apr 9 12:50:05 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 9 Apr 2007 11:50:05 -0500 Subject: [Bioperl-l] parse blast report for the best evalue In-Reply-To: <9898358.post@talk.nabble.com> References: <9898358.post@talk.nabble.com> Message-ID: You should probably use sort_hits() with a coderef that sorts by evalue to ensure that you retrieve the best evalue (significance() for hits) (see POD for Bio::Search::Result::ResultI). You could then do something like: my $hit; unless ($result->no_hits_found) { # pass coderef to sort by evalue $result->sort_hits(\&sort_by_evalue); # retrieve first (best) hit $hit = $result->next_hit; } # do whatever you want with the best Hit If you plan on retaining data from hits over a ton of different reports it may be best (memory-wise) to only retain the data you want for each hit instead of retaining the actual object. For instance, if you only care about the description and evalue set up a simple data structure to house what you want by the query data instead of retaining all the extra stuff in the Hit object you don't need (all the HSP data, etc). chris On Apr 8, 2007, at 11:18 PM, DeeGee wrote: > > hi all, > i'm trying to parse a blast report using Bio::SearchIO as follows, > but since > this blast report is generated with many against many (database) fasta > sequences, there're many individual blast reports (one for each of the > sequence from the query file). i was wondering if there is a way to > get only > the best hit (with best evalue) from each one of them. > > ##### part of my script ###### > my $in = new Bio::SearchIO(-format => 'blast', -file => > $blast_report); > while( my $result = $in->next_result ) { > while( my $hit = $result->next_hit ) { > ........... > > thanks. 
> > > -- > View this message in context: http://www.nabble.com/parse-blast- > report-for-the-best-evalue-tf3545784.html#a9898358 > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From gdorjee at hotmail.com Mon Apr 9 15:40:02 2007 From: gdorjee at hotmail.com (DeeGee) Date: Mon, 9 Apr 2007 12:40:02 -0700 (PDT) Subject: [Bioperl-l] parse blast report for the best evalue In-Reply-To: References: <9898358.post@talk.nabble.com> Message-ID: <9907757.post@talk.nabble.com> thank you, Chris. ^ ^* Chris Fields wrote: > > You should probably use sort_hits() with a coderef that sorts by > evalue to ensure that you retrieve the best evalue (significance() > for hits) (see POD for Bio::Search::Result::ResultI). You could then > do something like: > > my $hit; > > unless ($result->no_hits_found) { > # pass coderef to sort by evalue > $result->sort_hits(\&sort_by_evalue); > # retrieve first (best) hit > $hit = $result->next_hit; > } > > # do whatever you want with the best Hit > > If you plan on retaining data from hits over a ton of different > reports it may be best (memory-wise) to only retain the data you want > for each hit instead of retaining the actual object. For instance, > if you only care about the description and evalue set up a simple > data structure to house what you want by the query data instead of > retaining all the extra stuff in the Hit object you don't need (all > the HSP data, etc). 
> > chris > > On Apr 8, 2007, at 11:18 PM, DeeGee wrote: > >> >> hi all, >> i'm trying to parse a blast report using Bio::SearchIO as follows, >> but since >> this blast report is generated with many against many (database) fasta >> sequences, there're many individual blast reports (one for each of the >> sequence from the query file). i was wondering if there is a way to >> get only >> the best hit (with best evalue) from each one of them. >> >> ##### part of my script ###### >> my $in = new Bio::SearchIO(-format => 'blast', -file => >> $blast_report); >> while( my $result = $in->next_result ) { >> while( my $hit = $result->next_hit ) { >> ........... >> >> thanks. >> >> >> -- >> View this message in context: http://www.nabble.com/parse-blast- >> report-for-the-best-evalue-tf3545784.html#a9898358 >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/parse-blast-report-for-the-best-evalue-tf3545784.html#a9907757 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From bosborne11 at verizon.net Tue Apr 10 09:55:37 2007 From: bosborne11 at verizon.net (Brian Osborne) Date: Tue, 10 Apr 2007 09:55:37 -0400 Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths In-Reply-To: <200704070331.l373VTI22000@cricket.bio.indiana.edu> Message-ID: OK, applied. 
On 4/6/07 11:31 PM, "Don Gilbert" wrote: > + $l3_len= $l2_len; $l2_len= $l_len; $l_len= length($_); # need to > check every line :( > + if(DIE_ON_MISSMATCHED_LINES && > + $l3_len>0 && $l2_len>0 && $l3_len!=$l2_len) { > + my $fap= substr($_,0,20).".."; > + $self->throw("Each line of the fasta entry must be the same length > except the last. > + Line above #$. '$fap' is $l2_len != $l3_len chars."); > + } From MEC at stowers-institute.org Tue Apr 10 12:21:45 2007 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Tue, 10 Apr 2007 11:21:45 -0500 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store -cache option Message-ID: Lincoln, In `perldoc Bio::DB::SeqFeature::Store` I read: "Caching requires the Tie::Cacher module to be installed. If the module is not installed, then caching will silently be disabled." I am wondering about the design motivation for silently disabling caching when Tie::Cacher is not installed. Perhaps at least emitting a warning when -cache is requested and Tie::Cacher is not present is a good idea? I am writing a code that depends upon caching (i.e. upon the equality of in-memory objects). Do you advise that I don't depend upon Tie::Cacher working? I understand that it will NOT work as hoped if the cache is too small for my application. Thanks, Malcolm Cook Stowers Institute for Medical Research - Kansas City, Missouri From cjfields at uiuc.edu Tue Apr 10 12:31:43 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 10 Apr 2007 11:31:43 -0500 Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More In-Reply-To: References: Message-ID: <5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu> At the moment we do not have a comprehensive list up on the wiki. I have been slowly working (alphabetically!) to switch them over, so any help would be appreciated. I have CC'd this to the main mail list for anyone else interested. 
chris On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote: > Hey guys, > > I noticed there's an open task regarding moving testing code to use > Test::More etc and that Chris and Nathan are already on to it. Is > there any kind of wiki page that you keep track of which modules you > are already working on? I am new to this and want to contribute, > having a fair amount of unit testing from work, but don't want to step > over other people's work and avoid duplication as well. > Any pointers where i could get started would be much appreciated :-) > > Thanks, > Spiros > > ps. apologies if this is not the correct list to post this, just > seemed the most intuitive choice. > _______________________________________________ > Bioperl-guts-l mailing list > Bioperl-guts-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From spiros at lokku.com Tue Apr 10 12:34:49 2007 From: spiros at lokku.com (Spiros Denaxas) Date: Tue, 10 Apr 2007 17:34:49 +0100 Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More In-Reply-To: <5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu> References: <5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu> Message-ID: Okay, awesome, thank you for the info. I'll get started and see how it goes! Spiros On 4/10/07, Chris Fields wrote: > At the moment we do not have a comprehensive list up on the wiki. I > have been slowly working (alphabetically!) to switch them over, so > any help would be appreciated. > > I have CC'd this to the main mail list for anyone else interested. > > chris > > On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote: > > > Hey guys, > > > > I noticed there's an open task regarding moving testing code to use > > Test::More etc and that Chris and Nathan are already on to it. 
Is > > there any kind of wiki page that you keep track of which modules you > > are already working on? I am new to this and want to contribute, > > having a fair amount of unit testing from work, but don't want to step > > over other people's work and avoid duplication as well. > > Any pointers where i could get started would be much appreciated :-) > > > > Thanks, > > Spiros > > > > ps. apologies if this is not the correct list to post this, just > > seemed the most intuitive choice. > > _______________________________________________ > > Bioperl-guts-l mailing list > > Bioperl-guts-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > From cjfields at uiuc.edu Tue Apr 10 12:34:12 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 10 Apr 2007 11:34:12 -0500 Subject: [Bioperl-l] Bio::DB::SeqFeature::Store -cache option In-Reply-To: References: Message-ID: <0D396A53-9911-4304-88FE-CCD6884A2699@uiuc.edu> On Apr 10, 2007, at 11:21 AM, Cook, Malcolm wrote: > Lincoln, > > In `perldoc Bio::DB::SeqFeature::Store` I read: > > "Caching requires the Tie::Cacher module to be installed. If the > module > is not installed, then caching will silently be disabled." > > I am wondering about the design motivation for silently disabling > caching when Tie::Cacher is not installed. Perhaps at least > emitting a > warning when -cache is requested and Tie::Cacher is not present is a > good idea? ... Maybe this should be added to the optional BioPerl dependencies? It's not listed in Build.PL in CVS... 
chris

From cjfields at uiuc.edu Tue Apr 10 13:22:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Apr 2007 12:22:33 -0500
Subject: [Bioperl-l] ] moving tests to use Test::More
In-Reply-To:
References: <5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
Message-ID:

When moving tests over be particularly careful of 'ok' tests which should be 'is'; a few older tests have display messages which make things tricky. Use 'isa_ok', 'use_ok', 'require_ok', 'like', etc. where appropriate.

Also, we are not supporting TODO blocks at this time due to the upgrade needed for Test::Harness (which isn't necessary for BioPerl functionality). Just use a skip block with a message if you run into something, like this (from RNA_SearchIO.t):

SKIP: {
    skip('Working on meta string building; TODO', 3);
    is($hsp->meta, 'blahblahblah', "HSP meta");
    # two more tests...
}

Thanks for helping out!

chris

On Apr 10, 2007, at 11:34 AM, Spiros Denaxas wrote:

> Okay, awesome, thank you for the info. I'll get started and see how
> it goes!
>
> Spiros

...

From gopu_36 at yahoo.com Tue Apr 10 03:42:26 2007
From: gopu_36 at yahoo.com (gopu_36)
Date: Tue, 10 Apr 2007 00:42:26 -0700 (PDT)
Subject: [Bioperl-l] extract nonoverlapping subsequences from a whole genome
Message-ID: <9915265.post@talk.nabble.com>

Hi,
I am a newbie venturing into BioPerl for my research. I have the whole genome sequence of a pathogen and am trying to break it into non-overlapping 1000 bp subsequences. For example, if the genome is 400000 bp long, I should break it into 400 subsequences of 1000 bp each, non-overlapping but contiguous. To be precise, my first substring would run from 1 to 1000 bp, the second from 1001 to 2000, etc. Could anyone help me? I tried the following code, but it gives me only the first substring and none of the rest! I would appreciate it very much if someone could help me!
.........
.
.
my $start =1; my $finish =100; my $inseq = Bio::SeqIO->new(-file => "$in_file"); while( my $seq = $inseq->next_seq ) { my $cleseq = $seq->seq(); $seqlength = $seq->length(); if ($finish<$seqlength){ print "The length of the sequence is $seqlength\n"; my $ordseq = $cleseq->subseq($start,$finish); push(@seq_array,$ordseq); $start=+100; $finish=+100; $counter++; next; } } -- View this message in context: http://www.nabble.com/extract-nonoverlapping-subsequences-from-a-whole-genome-tf3551560.html#a9915265 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
From bix at sendu.me.uk Tue Apr 10 16:10:35 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 10 Apr 2007 21:10:35 +0100 Subject: [Bioperl-l] extract nonoverlapping subsequences from a whole genome In-Reply-To: <9915265.post@talk.nabble.com> References: <9915265.post@talk.nabble.com> Message-ID: <461BEF3B.3080708@sendu.me.uk> gopu_36 wrote: > Hi, > I am one of the newbee venturingout bioperl for my research purposes. I have > a whole genome sequence of a pathogen. I am trying to break them into > non-overlapping 1000bps subsequences. [snip] > I tried with the following code but it gives me only the first substring and > rest are not! I would appreciate very much if someone could help me! [snip] > my $start =1; > my $finish =100; > my $inseq = Bio::SeqIO->new(-file => "$in_file"); > while( my $seq = $inseq->next_seq ) { > > my $cleseq = $seq->seq(); > > $seqlength = $seq->length(); > if ($finish<$seqlength){ > print "The length of the sequence is $seqlength\n"; > my $ordseq = $cleseq->subseq($start,$finish); > push(@seq_array,$ordseq); > $start=+100; > $finish=+100; > $counter++; > next; > } > } Unless I've misunderstood, there are a few problems here. I'm guessing $in_file is a file containing the entire genome sequence as a single sequence. This means your while loop will only loop once. To do what you want you then need another loop that acts on the single $seq object you're going to get.
You don't need $cleseq, and in fact your script ought to crash on the $cleseq->subseq line because $cleseq is a string which has no subseq() method. $seq->subseq is what you want. I didn't look at the remaining code. Hope that helps, Sendu. From cjfields at uiuc.edu Tue Apr 10 16:22:15 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 10 Apr 2007 15:22:15 -0500 Subject: [Bioperl-l] extract nonoverlapping subsequences from a whole genome In-Reply-To: <9915265.post@talk.nabble.com> References: <9915265.post@talk.nabble.com> Message-ID: <88E9CC63-48FD-444B-877D-12BB1D944214@uiuc.edu> There is a script in the BioPerl scripts directory which does this, with optional overlaps (split_seq.PLS). It's in /scripts/seq. chris On Apr 10, 2007, at 2:42 AM, gopu_36 wrote: > > Hi, > I am one of the newbee venturingout bioperl for my research > purposes. I have > a whole genome sequence of a pathogen. I am trying to break them into > non-overlapping 1000bps subsequences. For example if my whole genome > sequence is 400000 bps length, then I should be beak them into 4000 > subsequences of each 1000 bps and they should be non-overlapping > but at the > same time continous. To be precise, my first substring would be > from 1 to > 1000 bps, second substing would be from 1001 to 2000 etcc.. Could > anyone > help me. > I tried with the following code but it gives me only the first > substring and > rest are not! I would appreciate very much if someone could help me! > ......... > . > . 
> my $start =1; > my $finish =100; > my $inseq = Bio::SeqIO->new(-file => "$in_file"); > while( my $seq = $inseq->next_seq ) { > > my $cleseq = $seq->seq(); > > $seqlength = $seq->length(); > if ($finish<$seqlength){ > print "The length of the sequence is $seqlength\n"; > my $ordseq = $cleseq->subseq($start,$finish); > push(@seq_array,$ordseq); > $start=+100; > $finish=+100; > $counter++; > next; > } > } > -- > View this message in context: http://www.nabble.com/extract- > nonoverlapping-subsequences-from-a-whole-genome- > tf3551560.html#a9915265 > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Tue Apr 10 16:57:20 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 10 Apr 2007 15:57:20 -0500 Subject: [Bioperl-l] extract nonoverlapping subsequences from a whole genome In-Reply-To: <9915265.post@talk.nabble.com> References: <9915265.post@talk.nabble.com> Message-ID: <18529D36-C772-474A-9CE6-A29FA0C59ABA@uiuc.edu> Okay, I was bored! This is a little shorter than that script: my $seqin = Bio::SeqIO->new(-format => 'fasta', -file => shift); my $seqout = Bio::SeqIO->new(-format => 'fasta', -file => '>split.fas'); while( my $seq = $seqin->next_seq ) { my $seqlength = $seq->length(); print STDERR "Length is $seqlength\n"; my $start = 1; my $end = 100; my $desc = $seq->description; CHUNK: while ($end <= $seqlength){ my $ordseq = $seq->trunc($start,$end); $ordseq->description("$start-$end $desc"); $seqout->write_seq($ordseq); last CHUNK if $end >= $seqlength; $start += 100; $end = ($end + 100 > $seqlength) ? 
$seqlength : $end + 100; } } chris On Apr 10, 2007, at 2:42 AM, gopu_36 wrote: > > Hi, > I am one of the newbee venturingout bioperl for my research > purposes. I have > a whole genome sequence of a pathogen. I am trying to break them into > non-overlapping 1000bps subsequences. For example if my whole genome > sequence is 400000 bps length, then I should be beak them into 4000 > subsequences of each 1000 bps and they should be non-overlapping > but at the > same time continous. To be precise, my first substring would be > from 1 to > 1000 bps, second substing would be from 1001 to 2000 etcc.. Could > anyone > help me. > I tried with the following code but it gives me only the first > substring and > rest are not! I would appreciate very much if someone could help me! > ......... > . > . > my $start =1; > my $finish =100; > my $inseq = Bio::SeqIO->new(-file => "$in_file"); > while( my $seq = $inseq->next_seq ) { > > my $cleseq = $seq->seq(); > > $seqlength = $seq->length(); > if ($finish<$seqlength){ > print "The length of the sequence is $seqlength\n"; > my $ordseq = $cleseq->subseq($start,$finish); > push(@seq_array,$ordseq); > $start=+100; > $finish=+100; > $counter++; > next; > } > } > -- > View this message in context: http://www.nabble.com/extract- > nonoverlapping-subsequences-from-a-whole-genome- > tf3551560.html#a9915265 > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From lstein at cshl.edu Tue Apr 10 18:01:37 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 10 Apr 2007 18:01:37 -0400 Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths In-Reply-To: References: <200704070331.l373VTI22000@cricket.bio.indiana.edu> Message-ID: <6dce9a0b0704101501y15b96e20w89c4b9ef4abc1b48@mail.gmail.com> I'm happy I didn't catch this thread until just now, but my preferred course of action was to do exactly what Brian did and accept the patch. Lincoln On 4/10/07, Brian Osborne wrote: > > OK, applied. > > > On 4/6/07 11:31 PM, "Don Gilbert" > wrote: > > > + $l3_len= $l2_len; $l2_len= $l_len; $l_len= length($_); # need > to > > check every line :( > > + if(DIE_ON_MISSMATCHED_LINES && > > + $l3_len>0 && $l2_len>0 && $l3_len!=$l2_len) { > > + my $fap= substr($_,0,20).".."; > > + $self->throw("Each line of the fasta entry must be the same > length > > except the last. > > + Line above #$. '$fap' is $l2_len != $l3_len chars."); > > + } > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From heikki at sanbi.ac.za Wed Apr 11 05:14:27 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Wed, 11 Apr 2007 11:14:27 +0200 Subject: [Bioperl-l] Fwd: SimpleAlign bug? Message-ID: <200704111114.27839.heikki@sanbi.ac.za> What is going on here? Can anyone remember doing this? -Heikki Please can I ask what is the purpose of the line @pos = sort @pos; in the select_noncont subroutine of SimpleAlign.pm. 
In previous versions this line was not present and I could use the function to reorder the alignment e.g in an alignment with 5 sequences I could reorder it to put the second sequence last using $aln->select_noncont(1,3,4,5,2). The sort function stops this, but even if the idea is to sort numerically this dos not work since the sort function as is will put 10 before 2, so that ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 . Many thanks Anthony From cjfields at uiuc.edu Wed Apr 11 08:33:42 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 11 Apr 2007 07:33:42 -0500 Subject: [Bioperl-l] Fwd: SimpleAlign bug? In-Reply-To: <200704111114.27839.heikki@sanbi.ac.za> References: <200704111114.27839.heikki@sanbi.ac.za> Message-ID: Don't know when this was added. Maybe we should make the sorting optional? In other words, pass an optional 'nosort' string as the first arg, defaulting to numerical sort. Either way the sort needs to be changed by the looks of it. I'll verify the bug and commit today. chris On Apr 11, 2007, at 4:14 AM, Heikki Lehvaslaiho wrote: > What is going on here? Can anyone remember doing this? > > -Heikki > > Please can I ask what is the purpose of the line @pos = sort @pos; in > the select_noncont subroutine of SimpleAlign.pm. > > > > In previous versions this line was not present and I could use the > function to reorder the alignment e.g in an alignment with 5 > sequences I > could reorder it to put the second sequence last using > $aln->select_noncont(1,3,4,5,2). The sort function stops this, but > even > if the idea is to sort numerically this dos not work since the sort > function as is will put 10 before 2, so that > ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in > the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 . 
> > > > Many thanks > > > > Anthony > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From lzlgboy at gmail.com Wed Apr 11 08:48:30 2007 From: lzlgboy at gmail.com (kenzy ken) Date: Wed, 11 Apr 2007 20:48:30 +0800 Subject: [Bioperl-l] How to Remove root node from a tree, ??? Message-ID: Hi all: I write a script which used the Bio::Tree module. I want to remove some nodes from the tree, so I used "$tree->remove_Node($node_object);method . It works ok, but when I remove root node, problem happened. It seens that this method can not remove root node, so ,if you guys have any idea about how to remove the root ,it will be very appreciated. -- ?????? Chen,Kenian =========================== School of Life Science, Sun Yat-Sen University =========================== Xingang Xilu 135 Guangzhou, Guangdong 510275 P. R. China =========================== Phone: (86) 20-84113677; (86) 20-34474683; Fax: (86) 20-34022356 =========================== Email:lzlgboy at gmail.com; chenkn at mail2.sysu.edu.cn From cjfields at uiuc.edu Wed Apr 11 09:13:40 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 11 Apr 2007 08:13:40 -0500 Subject: [Bioperl-l] Fwd: SimpleAlign bug? In-Reply-To: References: <200704111114.27839.heikki@sanbi.ac.za> Message-ID: <9DE1A554-4F33-45D1-9043-732FEB86ECD5@uiuc.edu> I confirmed this; it is now fixed in CVS. I have also added the option to prevent sorting if needed: $aln2 = $aln->select_noncont(6,7,8,9,10,1,2,3,4,5); sorts numerically by default. $aln2 = $aln->select_noncont('nosort',6,7,8,9,10,1,2,3,4,5); prevents sorting. I have added a few tests to SimpleAlign.t for these. It doesn't change the default behavior so shouldn't break anything. 
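[Archive note, not part of the original message.] For readers following the thread, the two sort behaviours under discussion, and one way a leading 'nosort' flag could be handled, can be sketched in plain Perl. The helper name order_positions is hypothetical and this is not the code that was committed to SimpleAlign.pm; it is only an illustration under those assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT the SimpleAlign.pm implementation): returns the
# positions sorted numerically by default, or in the caller's order when the
# first argument is the string 'nosort'.
sub order_positions {
    my @args = @_;
    if (@args && $args[0] eq 'nosort') {
        shift @args;                      # drop the flag, keep caller's order
        return @args;
    }
    return sort { $a <=> $b } @args;      # explicit numeric comparison
}

my @pos = (6, 7, 8, 9, 10, 1, 2, 3, 4, 5);

# Perl's default sort compares as strings, so "10" lands before "2".
print join(',', sort @pos), "\n";                       # 1,10,2,3,4,5,6,7,8,9

# Numeric sort gives the order most people expect.
print join(',', sort { $a <=> $b } @pos), "\n";         # 1,2,3,4,5,6,7,8,9,10

print join(',', order_positions(@pos)), "\n";           # 1,2,3,4,5,6,7,8,9,10
print join(',', order_positions('nosort', @pos)), "\n"; # 6,7,8,9,10,1,2,3,4,5
```

Taking the flag as an optional first argument keeps existing calls that pass only positions working unchanged, which matches the point above about not altering the default behaviour.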
chris On Apr 11, 2007, at 7:33 AM, Chris Fields wrote: > Don't know when this was added. Maybe we should make the sorting > optional? In other words, pass an optional 'nosort' string as the > first arg, defaulting to numerical sort. > > Either way the sort needs to be changed by the looks of it. I'll > verify the bug and commit today. > > chris > > On Apr 11, 2007, at 4:14 AM, Heikki Lehvaslaiho wrote: > >> What is going on here? Can anyone remember doing this? >> >> -Heikki >> >> Please can I ask what is the purpose of the line @pos = sort @pos; in >> the select_noncont subroutine of SimpleAlign.pm. >> >> >> >> In previous versions this line was not present and I could use the >> function to reorder the alignment e.g in an alignment with 5 >> sequences I >> could reorder it to put the second sequence last using >> $aln->select_noncont(1,3,4,5,2). The sort function stops this, but >> even >> if the idea is to sort numerically this dos not work since the sort >> function as is will put 10 before 2, so that >> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in >> the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 . >> >> >> >> Many thanks >> >> >> >> Anthony >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Wed Apr 11 09:21:25 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 11 Apr 2007 14:21:25 +0100 Subject: [Bioperl-l] How to Remove root node from a tree, ??? 
In-Reply-To: References: Message-ID: <461CE0D5.9040001@sendu.me.uk> kenzy ken wrote: > Hi all: > I write a script which used the Bio::Tree module. I want to remove some > nodes from the tree, so I used "$tree->remove_Node($node_object);method > . It > works ok, but when I remove root node, problem happened. It seens that this > method can not remove root node, so ,if you guys have any idea about how to > remove the root ,it will be very appreciated. You'll have to re-root the tree to some other node in the tree. See the reroot() method. (I don't think Bio::Tree::Tree objects can be unrooted.) From emeric.sevin at univ-rennes1.fr Wed Apr 11 09:32:38 2007 From: emeric.sevin at univ-rennes1.fr (Emeric Sevin) Date: Wed, 11 Apr 2007 15:32:38 +0200 Subject: [Bioperl-l] rpsblast results unsupported by Bio::SearchIO::Writer In-Reply-To: <8015924160e6b1f3af747fe2a906503a@univ-rennes1.fr> References: <46028EA0.7070901@crs4.it> <8015924160e6b1f3af747fe2a906503a@univ-rennes1.fr> Message-ID: <60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr> Hi everybody, I'm sorry to bug, but either I missed something so obvious nobody bothered to answer, either I'm being a little boycotted here... A little help would be very much appreciated On 22 March 07, at 16:07, Emeric Sevin wrote: > Hello, > > I am new to this community, and apologize if this subject has been > posted before. > > I want to print out only selected results from a multiple > blast-alignments results file. Problem is, the algorithm used is > rpsblast. The parsing (with Bio::SearchIO) goes fine, but the actual > writing task yields "unclean" warnings. Although an ouput is actually > written, the writer (Bio::SearchIO::Writer::TextResultWriter) seems to > be disturbed by the fact rpsblast DBs are not labeled with > "protein"/"nucleic"/"translated". > Does anybody know of an easy fix to that bug, or of another way to > come around it? > > Thank you very much > > Emeric SEVIN > Université
de Rennes 1 _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Wed Apr 11 10:44:27 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 11 Apr 2007 09:44:27 -0500 Subject: [Bioperl-l] rpsblast results unsupported by Bio::SearchIO::Writer In-Reply-To: <60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr> References: <46028EA0.7070901@crs4.it> <8015924160e6b1f3af747fe2a906503a@univ-rennes1.fr> <60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr> Message-ID: We could ignore this post... oh the irony! ;> It has nothing to do with ignoring you. Read this: http://en.wikipedia.org/wiki/Warnock's_Dilemma Basically your question probably fell on deaf ears b/c no one has time to look into it and post a fix. Realize that BioPerl is, for the large part, a volunteer effort and we all have $jobs to worry about. If you want you are more than welcome to file a bug on this (if it isn't already filed), which is the best way to make sure something is done: http://www.bioperl.org/wiki/Bugs http://bugzilla.open-bio.org/ chris On Apr 11, 2007, at 8:32 AM, Emeric Sevin wrote: > Hi everybody, > > I'm sorry to bug, but either I missed something so obvious nobody > bothered to answer, either I'm being a little boycotted here... > A little help would be very much appreciated > > On 22 March 07, at 16:07, Emeric Sevin wrote: > >> Hello, >> >> I am new to this community, and apologize if this subject has been >> posted before. >> >> I want to print out only selected results from a multiple blast- >> alignments results file. Problem is, the algorithm used is >> rpsblast.
The parsing (with Bio::SearchIO) goes fine, but the >> actual writing task yields "unclean" warnings. Although an ouput >> is actually written, the writer >> (Bio::SearchIO::Writer::TextResultWriter) seems to be disturbed by >> the fact rpsblast DBs are not labeled with >> "protein"/"nucleic"/"translated". >> Does anybody know of an easy fix to that bug, or of another way to >> come around it? >> >> Thank you very much >> >> Emeric SEVIN >> Université de Rennes 1 _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From n.haigh at sheffield.ac.uk Wed Apr 11 10:30:11 2007 From: n.haigh at sheffield.ac.uk (Nathan Haigh) Date: Wed, 11 Apr 2007 15:30:11 +0100 Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More In-Reply-To: <5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu> References: <5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu> Message-ID: <461CF0F3.1010708@sheffield.ac.uk> It should be easy enough to find those t/*.t files that have "use Test;" or "require Test;" This should provide a list of files still needing to be converted over to Test::More. As discussed previously, it may also be useful to use Test::Exception to test for situations where exceptions/warnings are thrown. If you add additional tests using this module, you should add the Test::Exception module to t/lib/ Good luck, and feel free to mail the list with questions/comments etc. Nath Chris Fields wrote: > At the moment we do not have a comprehensive list up on the wiki. I > have been slowly working (alphabetically!) to switch them over, so > any help would be appreciated.
> > I have CC'd this to the main mail list for anyone else interested. > > chris > > On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote: > > >> Hey guys, >> >> I noticed there's an open task regarding moving testing code to use >> Test::More etc and that Chris and Nathan are already on to it. Is >> there any kind of wiki page that you keep track of which modules you >> are already working on? I am new to this and want to contribute, >> having a fair amount of unit testing from work, but don't want to step >> over other people's work and avoid duplication as well. >> Any pointers where i could get started would be much appreciated :-) >> >> Thanks, >> Spiros >> >> ps. apologies if this is not the correct list to post this, just >> seemed the most intuitive choice. >> _______________________________________________ >> Bioperl-guts-l mailing list >> Bioperl-guts-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l >> > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From spiros at lokku.com Wed Apr 11 10:56:22 2007 From: spiros at lokku.com (Spiros Denaxas) Date: Wed, 11 Apr 2007 15:56:22 +0100 Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More In-Reply-To: <461CF0F3.1010708@sheffield.ac.uk> References: <5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu> <461CF0F3.1010708@sheffield.ac.uk> Message-ID: Yep! I have some rough stats I have at home, I will post them later on tonight. Roughly, if i remember correctly, 50% of the tests were still using Test, all the others were using Test::More. 
More to follow later on, Spiros On 4/11/07, Nathan Haigh wrote: > It should be easy enough to find those t/*.t files that have "use Test;" > or "require Test;" This should provide a list of files still needing to > be converted over to Test::More. As discussed previously, it may also be > useful to use Test::Exception to test for situations where > exceptions/warnings are thrown. If you add additional tests using this > module, you should add the Test::Exception module to t/lib/ > > Good luck, and feel free to mail the list with questions/comments etc. > > Nath > > > Chris Fields wrote: > > At the moment we do not have a comprehensive list up on the wiki. I > > have been slowly working (alphabetically!) to switch them over, so > > any help would be appreciated. > > > > I have CC'd this to the main mail list for anyone else interested. > > > > chris > > > > On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote: > > > > > >> Hey guys, > >> > >> I noticed there's an open task regarding moving testing code to use > >> Test::More etc and that Chris and Nathan are already on to it. Is > >> there any kind of wiki page that you keep track of which modules you > >> are already working on? I am new to this and want to contribute, > >> having a fair amount of unit testing from work, but don't want to step > >> over other people's work and avoid duplication as well. > >> Any pointers where i could get started would be much appreciated :-) > >> > >> Thanks, > >> Spiros > >> > >> ps. apologies if this is not the correct list to post this, just > >> seemed the most intuitive choice. > >> _______________________________________________ > >> Bioperl-guts-l mailing list > >> Bioperl-guts-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l > >> > > > > Christopher Fields > > Postdoctoral Researcher > > Lab of Dr. 
Robert Switzer > > Dept of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > From Kevin.M.Brown at asu.edu Wed Apr 11 11:14:07 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 11 Apr 2007 08:14:07 -0700 Subject: [Bioperl-l] Fwd: SimpleAlign bug? In-Reply-To: <200704111114.27839.heikki@sanbi.ac.za> References: <200704111114.27839.heikki@sanbi.ac.za> Message-ID: <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu> > What is going on here? Can anyone remember doing this? > > -Heikki > > Please can I ask what is the purpose of the line @pos = sort > @pos; in the select_noncont subroutine of SimpleAlign.pm. > > > > In previous versions this line was not present and I could > use the function to reorder the alignment e.g in an alignment > with 5 sequences I could reorder it to put the second > sequence last using $aln->select_noncont(1,3,4,5,2). The sort > function stops this, but even if the idea is to sort > numerically this dos not work since the sort function as is > will put 10 before 2, so that > ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in > the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 . Not sure why 10 would come before 2 since perl would interpret that list as a series of integers even if they were entered as strings and do the sort. From spiros at lokku.com Wed Apr 11 11:51:27 2007 From: spiros at lokku.com (Spiros Denaxas) Date: Wed, 11 Apr 2007 16:51:27 +0100 Subject: [Bioperl-l] Fwd: SimpleAlign bug? In-Reply-To: <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu> References: <200704111114.27839.heikki@sanbi.ac.za> <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu> Message-ID: This looks like the case of cmp vs <=> I think ! 
my @array = (1,10,2,3,4,5,6,7,8,9) ; print join(",", @array), "\n"; my @sorted1 = sort(@array) ; print join(",", @sorted1), "\n"; my @sorted2 = (sort { $a <=> $b } @array); print join(",", @sorted2), "\n"; idaru:/tmp spiros$ perl koko.pl 1,10,2,3,4,5,6,7,8,9 # normal array 1,10,2,3,4,5,6,7,8,9 # sorted with sort 1,2,3,4,5,6,7,8,9,10 # sorted with <=> Spiros On 4/11/07, Kevin Brown wrote: > > What is going on here? Can anyone remember doing this? > > > > -Heikki > > > > Please can I ask what is the purpose of the line @pos = sort > > @pos; in the select_noncont subroutine of SimpleAlign.pm. > > > > > > > > In previous versions this line was not present and I could > > use the function to reorder the alignment e.g in an alignment > > with 5 sequences I could reorder it to put the second > > sequence last using $aln->select_noncont(1,3,4,5,2). The sort > > function stops this, but even if the idea is to sort > > numerically this dos not work since the sort function as is > > will put 10 before 2, so that > > ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in > > the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 . > > Not sure why 10 would come before 2 since perl would interpret that list > as a series of integers even if they were entered as strings and do the > sort. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From ak at ebi.ac.uk Wed Apr 11 11:58:52 2007 From: ak at ebi.ac.uk (Andreas Kahari) Date: Wed, 11 Apr 2007 16:58:52 +0100 Subject: [Bioperl-l] Fwd: SimpleAlign bug? In-Reply-To: <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu> References: <200704111114.27839.heikki@sanbi.ac.za> <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu> Message-ID: <20070411155852.GC24537@ebi.ac.uk> On Wed, Apr 11, 2007 at 08:14:07AM -0700, Kevin Brown wrote: > > What is going on here? 
Can anyone remember doing this? > > > > -Heikki > > > > Please can I ask what is the purpose of the line @pos = sort > > @pos; in the select_noncont subroutine of SimpleAlign.pm. > > > > > > > > In previous versions this line was not present and I could > > use the function to reorder the alignment e.g in an alignment > > with 5 sequences I could reorder it to put the second > > sequence last using $aln->select_noncont(1,3,4,5,2). The sort > > function stops this, but even if the idea is to sort > > numerically this dos not work since the sort function as is > > will put 10 before 2, so that > > ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in > > the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 . > > Not sure why 10 would come before 2 since perl would interpret that list > as a series of integers even if they were entered as strings and do the > sort. Really? $ perl -e 'print join(" ", sort(1..20)), "\n"'; 1 10 11 12 13 14 15 16 17 18 19 2 20 3 4 5 6 7 8 9 -- Andreas Kähäri :: Ensembl Software Developer European Bioinformatics Institute (EMBL-EBI) -------------------*=<>=*------------------- From mkiwala at watson.wustl.edu Wed Apr 11 11:51:35 2007 From: mkiwala at watson.wustl.edu (Michael Kiwala) Date: Wed, 11 Apr 2007 10:51:35 -0500 Subject: [Bioperl-l] Fwd: SimpleAlign bug? In-Reply-To: <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu> References: <200704111114.27839.heikki@sanbi.ac.za> <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu> Message-ID: <461D0407.8050105@watson.wustl.edu> Kevin Brown wrote: >> What is going on here? Can anyone remember doing this? >> >> -Heikki >> >> Please can I ask what is the purpose of the line @pos = sort >> @pos; in the select_noncont subroutine of SimpleAlign.pm.
>> >> >> >> In previous versions this line was not present and I could >> use the function to reorder the alignment e.g in an alignment >> with 5 sequences I could reorder it to put the second >> sequence last using $aln->select_noncont(1,3,4,5,2). The sort >> function stops this, but even if the idea is to sort >> numerically this dos not work since the sort function as is >> will put 10 before 2, so that >> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in >> the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 . >> > > Not sure why 10 would come before 2 since perl would interpret that list > as a series of integers even if they were entered as strings and do the > sort. > > Because, according to the documentation for Perl's sort function, sorting occurs "in standard string comparison order" unless the user specifies another comparison function to use. From cjfields at uiuc.edu Wed Apr 11 12:45:11 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 11 Apr 2007 11:45:11 -0500 Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More In-Reply-To: References: <5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu> <461CF0F3.1010708@sheffield.ac.uk> Message-ID: We should probably place something on the wiki to prevent overlaps (i.e. make sure no two devs are working on the same tests). I planned on working on the G's last night but got bogged down. Spiros, if you haven't already go ahead and create a list on a wiki page for tracking. We can lay claim to them by tagging with our sigs and cross them off once complete. chris On Apr 11, 2007, at 9:56 AM, Spiros Denaxas wrote: > Yep! I have some rough stats I have at home, I will post them later on > tonight. Roughly, if i remember correctly, 50% of the tests were still > using Test, all the others were using Test::More. 
> > More to follow later on, > Spiros > > On 4/11/07, Nathan Haigh wrote: >> It should be easy enough to find those t/*.t files that have "use >> Test;" >> or "require Test;" This should provide a list of files still >> needing to >> be converted over to Test::More. As discussed previously, it may >> also be >> useful to use Test::Exception to test for situations where >> exceptions/warnings are thrown. If you add additional tests using >> this >> module, you should add the Test::Exception module to t/lib/ >> >> Good luck, and feel free to mail the list with questions/comments >> etc. >> >> Nath >> >> >> Chris Fields wrote: >> > At the moment we do not have a comprehensive list up on the >> wiki. I >> > have been slowly working (alphabetically!) to switch them over, so >> > any help would be appreciated. >> > >> > I have CC'd this to the main mail list for anyone else interested. >> > >> > chris >> > >> > On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote: >> > >> > >> >> Hey guys, >> >> >> >> I noticed there's an open task regarding moving testing code to >> use >> >> Test::More etc and that Chris and Nathan are already on to it. Is >> >> there any kind of wiki page that you keep track of which >> modules you >> >> are already working on? I am new to this and want to contribute, >> >> having a fair amount of unit testing from work, but don't want >> to step >> >> over other people's work and avoid duplication as well. >> >> Any pointers where i could get started would be much >> appreciated :-) >> >> >> >> Thanks, >> >> Spiros >> >> >> >> ps. apologies if this is not the correct list to post this, just >> >> seemed the most intuitive choice. >> >> _______________________________________________ >> >> Bioperl-guts-l mailing list >> >> Bioperl-guts-l at lists.open-bio.org >> >> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l >> >> >> > >> > Christopher Fields >> > Postdoctoral Researcher >> > Lab of Dr. 
Robert Switzer >> > Dept of Biochemistry >> > University of Illinois Urbana-Champaign >> > >> > >> > >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > >> >> Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Wed Apr 11 12:09:54 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 11 Apr 2007 17:09:54 +0100 Subject: [Bioperl-l] Fwd: SimpleAlign bug? In-Reply-To: <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu> References: <200704111114.27839.heikki@sanbi.ac.za> <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu> Message-ID: <461D0852.9070802@sendu.me.uk> Kevin Brown wrote: >> but even if the idea is to sort >> numerically this dos not work since the sort function as is >> will put 10 before 2, so that >> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in >> the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 . > > Not sure why 10 would come before 2 since perl would interpret that list > as a series of integers even if they were entered as strings and do the > sort. The default sort for sort() is { $a cmp $b } (standard string comparison order): 10 comes before 2. The fix was to explicitly say sort { $a <=> $b } for a numeric sort. From cjfields at uiuc.edu Wed Apr 11 12:46:46 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 11 Apr 2007 11:46:46 -0500 Subject: [Bioperl-l] Fwd: SimpleAlign bug? In-Reply-To: <461D0407.8050105@watson.wustl.edu> References: <200704111114.27839.heikki@sanbi.ac.za> <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu> <461D0407.8050105@watson.wustl.edu> Message-ID: <7001A1A4-5CF4-4C70-8EFA-94AF0D16864C@uiuc.edu> I have confirmed the bug and fixed this in CVS. 
Kevin's right; sort defaults to string comparison if no subroutine or sort block is specified. perldoc -f sort: sort SUBNAME LIST sort BLOCK LIST sort LIST ... If SUBNAME or BLOCK is omitted, "sort"s in standard string com- parison order. ... chris On Apr 11, 2007, at 10:51 AM, Michael Kiwala wrote: > Kevin Brown wrote: >>> What is going on here? Can anyone remember doing this? >>> >>> -Heikki >>> >>> Please can I ask what is the purpose of the line @pos = sort >>> @pos; in the select_noncont subroutine of SimpleAlign.pm. >>> >>> >>> >>> In previous versions this line was not present and I could >>> use the function to reorder the alignment e.g in an alignment >>> with 5 sequences I could reorder it to put the second >>> sequence last using $aln->select_noncont(1,3,4,5,2). The sort >>> function stops this, but even if the idea is to sort >>> numerically this dos not work since the sort function as is >>> will put 10 before 2, so that >>> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the >>> sequences in >>> the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 . >>> >> >> Not sure why 10 would come before 2 since perl would interpret >> that list >> as a series of integers even if they were entered as strings and >> do the >> sort. >> >> > Because, according to the documentation for Perl's sort function, > sorting occurs "in standard string comparison order" unless the user > specifies another comparison function to use. 
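
The difference discussed in this thread can be reproduced with a few lines of Perl. This is only a minimal sketch: the @pos values are made up for illustration and this is not the code from SimpleAlign.pm. It only shows how a bare sort differs from the explicit numeric comparison mentioned above.

```perl
use strict;
use warnings;

# Illustrative positions only (hypothetical, not taken from SimpleAlign.pm).
my @pos = (1, 3, 4, 5, 2, 10);

# A bare sort compares elements as strings, so "10" lands before "2".
my @by_string = sort @pos;

# An explicit numeric comparison block gives the numeric order.
my @by_number = sort { $a <=> $b } @pos;

print "string:  @by_string\n";
print "numeric: @by_number\n";
```

Note that either form still discards the caller's original ordering, so a call such as select_noncont(1,3,4,5,2) would not keep the second sequence last after sorting.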
From heikki at sanbi.ac.za Wed Apr 11 12:39:57 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Wed, 11 Apr 2007 18:39:57 +0200 Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More In-Reply-To: References: <461CF0F3.1010708@sheffield.ac.uk> Message-ID: <200704111839.58940.heikki@sanbi.ac.za> A bit more than half is still using Test: ~/src/bioperl/core/t> perl -lne 'print $1 if /use +(Test[^\sO;]*);/' *t | sort | uniq -c | sort -nr 147 Test 97 Test::More Feel free to add scripts and functionality into core/maintenance directory of bioperl-live if you want to keep track of things in modules and tests. -Heikki On Wednesday 11 April 2007 16:56:22 Spiros Denaxas wrote: > Yep! I have some rough stats I have at home, I will post them later on > tonight. Roughly, if i remember correctly, 50% of the tests were still > using Test, all the others were using Test::More. > > More to follow later on, > Spiros > > On 4/11/07, Nathan Haigh wrote: > > It should be easy enough to find those t/*.t files that have "use Test;" > > or "require Test;" This should provide a list of files still needing to > > be converted over to Test::More. As discussed previously, it may also be > > useful to use Test::Exception to test for situations where > > exceptions/warnings are thrown. If you add additional tests using this > > module, you should add the Test::Exception module to t/lib/ > > > > Good luck, and feel free to mail the list with questions/comments etc. > > > > Nath > > > > Chris Fields wrote: > > > At the moment we do not have a comprehensive list up on the wiki. I > > > have been slowly working (alphabetically!) to switch them over, so > > > any help would be appreciated. > > > > > > I have CC'd this to the main mail list for anyone else interested. 
> > > > > > chris > > > > > > On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote: > > >> Hey guys, > > >> > > >> I noticed there's an open task regarding moving testing code to use > > >> Test::More etc and that Chris and Nathan are already on to it. Is > > >> there any kind of wiki page that you keep track of which modules you > > >> are already working on? I am new to this and want to contribute, > > >> having a fair amount of unit testing from work, but don't want to step > > >> over other people's work and avoid duplication as well. > > >> Any pointers where i could get started would be much appreciated :-) > > >> > > >> Thanks, > > >> Spiros > > >> > > >> ps. apologies if this is not the correct list to post this, just > > >> seemed the most intuitive choice. > > >> _______________________________________________ > > >> Bioperl-guts-l mailing list > > >> Bioperl-guts-l at lists.open-bio.org > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l > > > > > > Christopher Fields > > > Postdoctoral Researcher > > > Lab of Dr. 
Robert Switzer > > > Dept of Biochemistry > > > University of Illinois Urbana-Champaign > > > > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From stewarta at nmrc.navy.mil Wed Apr 11 14:40:18 2007 From: stewarta at nmrc.navy.mil (Andrew Stewart) Date: Wed, 11 Apr 2007 14:40:18 -0400 Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer Message-ID: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil> First of all, mucho kudos to those who revamped this module. It works really nice. I have a couple thoughts.. * The .predict file from Glimmer provides frame and score information which could be parsed and included in the generated feature prediction * It'd be nice to include the orfID somewhere on the feature prediction.. maybe the seqID ? (these could be post-processed into locus_tags for those using Glimmer as a preliminary annotation tool) * Options to set the source and primary tags to something other than the default (ie) Glimmer3.X and 'transcript'. This could always be done post-Bio::Tools::Glimmer, though, of course. * This section.. 
    elsif (
        # Glimmer 2.X prediction
        (/^\s+(\d+)\s+          # gene num
         (\d+)\s+(\d+)\s+       # start, end
         \[([\+\-])\d{1}\s+     # strand
         /ox ) ||
        # Glimmer 3.X prediction
        (/\w+(\d+)\s+           # orf (numeric portion)
         (\d+)\s+(\d+)\s+       # start, end
         ([\+\-])\d{1}\s+       # strand
         /ox)) {
        my ($genenum,$start,$end,$strand) = ( $1,$2,$3,$4 );

...isn't picking up more than the last digit in the orf-number. Not sure if that's intentional. A sample of the feature output using ->gff_string shows up as ...

test-pseudocontig Glimmer_3.X transcript 1018 8 . - . Group GenePrediction_1
test-pseudocontig Glimmer_3.X transcript 1134 1736 . + . Group GenePrediction_2
test-pseudocontig Glimmer_3.X transcript 1832 2596 . + . Group GenePrediction_4
test-pseudocontig Glimmer_3.X transcript 2710 3225 . + . Group GenePrediction_5
test-pseudocontig Glimmer_3.X transcript 3246 4016 . + . Group GenePrediction_6
test-pseudocontig Glimmer_3.X transcript 4177 5064 . + . Group GenePrediction_7
test-pseudocontig Glimmer_3.X transcript 5083 5673 . + . Group GenePrediction_8
test-pseudocontig Glimmer_3.X transcript 6001 7275 . + . Group GenePrediction_9
test-pseudocontig Glimmer_3.X transcript 7530 8081 . + . Group GenePrediction_0
test-pseudocontig Glimmer_3.X transcript 8785 8117 . - . Group GenePrediction_1
test-pseudocontig Glimmer_3.X transcript 9423 8788 . - . Group GenePrediction_2
test-pseudocontig Glimmer_3.X transcript 10088 9549 . - . Group GenePrediction_3

...which was parsed originally from...

orf00001 1018 8 -2 2.95
orf00002 1134 1736 +3 2.91
orf00004 1832 2596 +2 2.93
orf00005 2710 3225 +1 2.90
orf00006 3246 4016 +3 2.93
orf00007 4177 5064 +1 2.94
orf00008 5083 5673 +1 2.91
orf00009 6001 7275 +1 2.96
orf00010 7530 8081 +3 2.58
orf00011 8785 8117 -2 2.92
orf00012 9423 8788 -1 2.81
orf00013 10088 9549 -3 2.90

* It'd also be nice if you could somehow set the string that is placed in front of the orf-number in the line...
'-tag' => { 'Group' => "GenePrediction_ $genenum"}, ...seeing as how these tag/values can't seem to be changed manually anymore without getting into AnnotationCollection stuff, which is no longer a simple matter of changing a tag/value string. (By the way, where can I find a list of AnnotationCollectionI compliant objects?) Any thoughts on the suggestions? (I don't mind taking a stab at incorporating them into the code.. I've never submitted anything to BioPerl before) -Andrew -- Andrew Stewart Research Assistant, Genomics Team Navy Medical Research Center (NMRC) Biological Defense Research Directorate (BDRD) BDRD Annex 12300 Washington Avenue, 2nd Floor Rockville, MD 20852 email: stewarta at nmrc.navy.mil phone: 301-231-6700 Ext 270 From cjfields at uiuc.edu Wed Apr 11 15:53:54 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 11 Apr 2007 14:53:54 -0500 Subject: [Bioperl-l] Odd spamming on bioperl wiki Message-ID: I'm posting this to the mail list in case anyone has any ideas on what is going on... I have noticed an odd (read: annoying) rash of spam on the wiki. Jason also ran some spam reversions, so maybe he can chime in. Essentially it looks like the responsible spambots 'correct' the wiki text and links, so that '+' is being removed and URI-encoded symbols in links are reverted to symbols. Unfortunately the removal occurs in all text, so places where '+' is intended (for instance, raw text for showing example record formats) are also changed. My guess is we'll need to block the IP address or add to the blacklist if possible. Between Jason and I we have blocked ~9 spambots and counting. Couldn't find anything via Google yet... 
chris From torsten.seemann at infotech.monash.edu.au Wed Apr 11 20:33:02 2007 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Thu, 12 Apr 2007 10:33:02 +1000 Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer In-Reply-To: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil> References: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil> Message-ID: Andrew, > # Glimmer 3.X prediction > (/\w+(\d+)\s+ # orf (numeric portion) > ...isn't picking up more than the last digit in the orf-number. Not > sure if that's intentional. A sample of the feature output using - > >gff_string shows up as ... I think that regexp should be \w+?(\d+) ie. the \w+ should be non-greedy, otherwise it will swallow up all but one of the following \d+ (as \d is a subset of \w) I've CC:ed this to Mark Johnson who made the recent changes to this module. Thanks for your feedback, --Torsten Seemann From spiros at lokku.com Wed Apr 11 21:08:47 2007 From: spiros at lokku.com (Spiros Denaxas) Date: Thu, 12 Apr 2007 02:08:47 +0100 Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More In-Reply-To: References: <5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu> <461CF0F3.1010708@sheffield.ac.uk> Message-ID: Good idea Chris. Just got back home so will probably do it tomorrow morning or so. Spiros On 4/11/07, Chris Fields wrote: > We should probably place something on the wiki to prevent overlaps > (i.e. make sure no two devs are working on the same tests). I > planned on working on the G's last night but got bogged down. > > Spiros, if you haven't already go ahead and create a list on a wiki > page for tracking. We can lay claim to them by tagging with our sigs > and cross them off once complete. > > chris > > On Apr 11, 2007, at 9:56 AM, Spiros Denaxas wrote: > > > Yep! I have some rough stats I have at home, I will post them later on > > tonight. Roughly, if i remember correctly, 50% of the tests were still > > using Test, all the others were using Test::More. 
> > > > More to follow later on, > > Spiros > > > > On 4/11/07, Nathan Haigh wrote: > >> It should be easy enough to find those t/*.t files that have "use > >> Test;" > >> or "require Test;" This should provide a list of files still > >> needing to > >> be converted over to Test::More. As discussed previously, it may > >> also be > >> useful to use Test::Exception to test for situations where > >> exceptions/warnings are thrown. If you add additional tests using > >> this > >> module, you should add the Test::Exception module to t/lib/ > >> > >> Good luck, and feel free to mail the list with questions/comments > >> etc. > >> > >> Nath > >> > >> > >> Chris Fields wrote: > >> > At the moment we do not have a comprehensive list up on the > >> wiki. I > >> > have been slowly working (alphabetically!) to switch them over, so > >> > any help would be appreciated. > >> > > >> > I have CC'd this to the main mail list for anyone else interested. > >> > > >> > chris > >> > > >> > On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote: > >> > > >> > > >> >> Hey guys, > >> >> > >> >> I noticed there's an open task regarding moving testing code to > >> use > >> >> Test::More etc and that Chris and Nathan are already on to it. Is > >> >> there any kind of wiki page that you keep track of which > >> modules you > >> >> are already working on? I am new to this and want to contribute, > >> >> having a fair amount of unit testing from work, but don't want > >> to step > >> >> over other people's work and avoid duplication as well. > >> >> Any pointers where i could get started would be much > >> appreciated :-) > >> >> > >> >> Thanks, > >> >> Spiros > >> >> > >> >> ps. apologies if this is not the correct list to post this, just > >> >> seemed the most intuitive choice. 
> >> >> _______________________________________________ > >> >> Bioperl-guts-l mailing list > >> >> Bioperl-guts-l at lists.open-bio.org > >> >> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l > >> >> > >> > > >> > Christopher Fields > >> > Postdoctoral Researcher > >> > Lab of Dr. Robert Switzer > >> > Dept of Biochemistry > >> > University of Illinois Urbana-Champaign > >> > > >> > > >> > > >> > _______________________________________________ > >> > Bioperl-l mailing list > >> > Bioperl-l at lists.open-bio.org > >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > >> > >> > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > From Kevin.M.Brown at asu.edu Thu Apr 12 11:24:15 2007 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Thu, 12 Apr 2007 08:24:15 -0700 Subject: [Bioperl-l] Fwd: SimpleAlign bug? In-Reply-To: <461D0407.8050105@watson.wustl.edu> References: <200704111114.27839.heikki@sanbi.ac.za> <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu> <461D0407.8050105@watson.wustl.edu> Message-ID: <1A4207F8295607498283FE9E93B775B402FCB4AE@EX02.asurite.ad.asu.edu> > >> What is going on here? Can anyone remember doing this? > >> Please can I ask what is the purpose of the line @pos = > sort @pos; in > >> the select_noncont subroutine of SimpleAlign.pm. > >> > >> > >> > >> In previous versions this line was not present and I could use the > >> function to reorder the alignment e.g in an alignment with 5 > >> sequences I could reorder it to put the second sequence last using > >> $aln->select_noncont(1,3,4,5,2). The sort function stops this, but > >> even if the idea is to sort numerically this dos not work > since the > >> sort function as is will put 10 before 2, so that > >> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the > sequences in > >> the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 . 
> >> > > > > Not sure why 10 would come before 2 since perl would interpret that > > list as a series of integers even if they were entered as > strings and > > do the sort. > > > > > Because, according to the documentation for Perl's sort > function, sorting occurs "in standard string comparison > order" unless the user specifies another comparison function to use. OK, guess I never realized that since I've used just "sort @array" and gotten things back how I expected them to be. From bix at sendu.me.uk Thu Apr 12 11:58:53 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 12 Apr 2007 16:58:53 +0100 Subject: [Bioperl-l] Fwd: SimpleAlign bug? In-Reply-To: <1A4207F8295607498283FE9E93B775B402FCB4AE@EX02.asurite.ad.asu.edu> References: <200704111114.27839.heikki@sanbi.ac.za> <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu> <461D0407.8050105@watson.wustl.edu> <1A4207F8295607498283FE9E93B775B402FCB4AE@EX02.asurite.ad.asu.edu> Message-ID: <461E573D.1060906@sendu.me.uk> Kevin Brown wrote: >> Because, according to the documentation for Perl's sort >> function, sorting occurs "in standard string comparison >> order" unless the user specifies another comparison function to use. > > OK, guess I never realized that since I've used just "sort @array" and > gotten things back how I expected them to be. If you were sorting numbers, getting the order wrong either didn't matter or you didn't notice the problem. Not realizing sort won't do what you expect in this case is a common source of bugs. It might be worth it for you (and anyone else) to go through your old code to make sure you haven't been bitten. From johnsonm at gmail.com Thu Apr 12 13:26:33 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Thu, 12 Apr 2007 12:26:33 -0500 Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer In-Reply-To: References: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil> Message-ID: I'd call that a buggy regexp. Sounds like a good (but minimal) fix. 
Torsten, I don't have cvs write access; I think you do, so can you fix that up? Andrew, can you file that as a bug:

http://bugzilla.bioperl.org/

Everything else sounds like enhancements. I'm not necessarily opposed, but a little discussion is probably in order before putting any tickets in for any of that. Also, I'm not sure when I'll be able to spare some time to work on the module. It was easy to justify spending time from my day job getting the module up to where it is now, as I needed a BioPerl-ish glimmer2/glimmer3 parser. It's working quite well for my purposes. Again, I'm not opposed to further enhancements, but if I'm going to work on any of them, they'll have to fit into everything else I'm doing and it could be a while. However, there's no reason somebody else can't do what I did. Discuss the changes here, work out a plan, implement it, send along the diff(s) attached to a bug in bugzilla. Next thing you know, your changes are in cvs. 8)

On 4/11/07, Torsten Seemann wrote: > Andrew, > > > # Glimmer 3.X prediction > > (/\w+(\d+)\s+ # orf (numeric portion) > > ...isn't picking up more than the last digit in the orf-number. Not > > sure if that's intentional. A sample of the feature output using - > > >gff_string shows up as ... > > I think that regexp should be \w+?(\d+) > > ie. the \w+ should be non-greedy, otherwise it will swallow up all but > one of the following \d+ (as \d is a subset of \w) > > I've CC:ed this to Mark Johnson who made the recent changes to this module.
> > Thanks for your feedback, > > --Torsten Seemann From cjfields at uiuc.edu Thu Apr 12 14:11:33 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 12 Apr 2007 13:11:33 -0500 Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer In-Reply-To: References: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil> Message-ID: <7314C1CD-8AD5-4400-A495-6C8124833D0D@uiuc.edu> Agreed; anyone can suggest code enhancements and bug fixes and submit patches for these: http://www.bioperl.org/wiki/HOWTO:SubmitPatch You'll see a long list of unimplemented enhancement requests in Bugzilla. These are the ones where no patch is given; you'll find that very few are willing to go through the effort to work on them unless there is something in it for them! Enhancement requests that come with patches and tests tend to get committed fairly rapidly (sometimes within hours). chris On Apr 12, 2007, at 12:26 PM, Mark Johnson wrote: > I'd call that a buggy regexp. Sounds like a good (but minimal) > fix. Torsten, I don't have cvs write access, I think you do, can you > fix that up? Andrew, can you file that as a bug: > > http://bugzilla.bioperl.org/ > > Everything else sounds like enhancements. I'm not necessarily > opposed, but a little discussion is probably in order before putting > any tickets in for any of that. Also, I'm not sure when I'll be able > to spare some time to work on the module. It was easy to justify > spending time from my day job getting the module up to where is now, > as I needed a BioPerl-ish glimmer2/glimmer3 parser. It's working > quite well for my purposes. Again, I'm not opposed to further > enhancements, but If I'm going to work on any of them, they'll have to > fit into everything else I'm doing and it could be a while. However, > there's no reason somebody else can't do what I did. Discuss the > changes here, work out a plan, implement it, send along the diff(s) > attached to a bug in bugzilla. Next thing you know, your changes are > in cvs. 
8) > > On 4/11/07, Torsten Seemann > wrote: >> Andrew, >> >>> # Glimmer 3.X prediction >>> (/\w+(\d+)\s+ # orf (numeric portion) >>> ...isn't picking up more than the last digit in the orf-number. Not >>> sure if that's intentional. A sample of the feature output using - >>>> gff_string shows up as ... >> >> I think that regexp should be \w+?(\d+) >> >> ie. the \w+ should be non-greedy, otherwise it will swallow up all >> but >> one of the following \d+ (as \d is a subset of \w) >> >> I've CC:ed this to Mark Johnson who made the recent changes to >> this module. >> >> Thanks for your feedback, >> >> --Torsten Seemann From stewarta at nmrc.navy.mil Thu Apr 12 14:35:00 2007 From: stewarta at nmrc.navy.mil (Andrew Stewart) Date: Thu, 12 Apr 2007 14:35:00 -0400 Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer In-Reply-To: References: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil> Message-ID: I'm willing to do the coding and testing, I'm just not familiar with the submission process yet (I see there's a HOWTO now, nice). Let's discuss first. So to reiterate, I'm suggesting that the module also parse out the frame and score information from Glimmer output. I take back my suggestion of overriding the source / primary tags through the module as this can easily be done post-parser. Other annotations can also be edited post-parser easily enough. Reasons for: Parsing everything out of the output and letting the user determine what's useful or not. Reasons against: Extra information may not be relevant to the format of the generated feature type? -Andrew On Apr 12, 2007, at 1:26 PM, Mark Johnson wrote: > I'd call that a buggy regexp. Sounds like a good (but minimal) > fix. Torsten, I don't have cvs write access, I think you do, can you > fix that up? Andrew, can you file that as a bug: > > http://bugzilla.bioperl.org/ > > Everything else sounds like enhancements. 
I'm not necessarily > opposed, but a little discussion is probably in order before putting > any tickets in for any of that. Also, I'm not sure when I'll be able > to spare some time to work on the module. It was easy to justify > spending time from my day job getting the module up to where is now, > as I needed a BioPerl-ish glimmer2/glimmer3 parser. It's working > quite well for my purposes. Again, I'm not opposed to further > enhancements, but If I'm going to work on any of them, they'll have to > fit into everything else I'm doing and it could be a while. However, > there's no reason somebody else can't do what I did. Discuss the > changes here, work out a plan, implement it, send along the diff(s) > attached to a bug in bugzilla. Next thing you know, your changes are > in cvs. 8) > > On 4/11/07, Torsten Seemann > wrote: >> Andrew, >> >> > # Glimmer 3.X prediction >> > (/\w+(\d+)\s+ # orf (numeric portion) >> > ...isn't picking up more than the last digit in the orf-number. >> Not >> > sure if that's intentional. A sample of the feature output using - >> > >gff_string shows up as ... >> >> I think that regexp should be \w+?(\d+) >> >> ie. the \w+ should be non-greedy, otherwise it will swallow up all >> but >> one of the following \d+ (as \d is a subset of \w) >> >> I've CC:ed this to Mark Johnson who made the recent changes to >> this module. 
>> >> Thanks for your feedback, >> >> --Torsten Seemann -- Andrew Stewart Research Assistant, Genomics Team Navy Medical Research Center (NMRC) Biological Defense Research Directorate (BDRD) BDRD Annex 12300 Washington Avenue, 2nd Floor Rockville, MD 20852 email: stewarta at nmrc.navy.mil phone: 301-231-6700 Ext 270 From johnsonm at gmail.com Thu Apr 12 15:11:18 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Thu, 12 Apr 2007 14:11:18 -0500 Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer In-Reply-To: References: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil> Message-ID: > So to reiterate, I'm suggesting that the module also parse out the frame and > score information from Glimmer output. I take back my suggestion of > overriding the source / primary tags through the module as this can easily > be done post-parser. Other annotations can also be edited post-parser > easily enough. The reason the source tags are what they are for my addition(s) is that the output from glimmer2/glimmer3 does not include a version string. You can figure out the major version from the output formatting, but that's about it. Also, being my first significant contribution, I wasn't out to break new ground. I did what some of the other gene predictors seem to do, and what the existing code already did. Maybe there should be a method to pass in the exact version if you know it. Further than that, I think the Glimmer module should stay consistent with what the other gene predictors do. No reason, though, that they couldn't *all* be enhanced similarly, if you want to be able to further control the source tag. 8) Part of the reason I didn't parse out the frame / score info for either glimmer2 or glimmer3 was that I didn't need it. The other part being that my regexp kung-fu is nothing special. This sounds like a no-brainer to me. Extend the regexps to capture it and tag it (and the tests). 
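
As a rough illustration of what extending the regexp might look like, here is a sketch that captures the full orf number, the frame and the score from one Glimmer 3 prediction line. The sub name parse_glimmer3_line and the hash keys are hypothetical and are not part of Bio::Tools::Glimmer; the column layout is assumed from the sample lines quoted earlier in this thread, and the non-greedy \w+? follows Torsten's suggestion.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Bio::Tools::Glimmer.
# Assumed column layout (from the sample in this thread):
#   orfNNNNN  start  end  [+-]frame  score
sub parse_glimmer3_line {
    my ($line) = @_;
    return unless $line =~ /^\s*
        (\w+?(\d+))\s+      # orf id; \w+? is non-greedy so \d+ keeps all digits
        (\d+)\s+(\d+)\s+    # start, end
        ([+-])(\d)\s+       # strand, frame
        (\S+)               # raw score
        /x;
    return {
        orf_id  => $1,
        genenum => $2,
        start   => $3,
        end     => $4,
        strand  => $5,
        frame   => $6,
        score   => $7,
    };
}

my $pred = parse_glimmer3_line('orf00011 8785 8117 -2 2.92');
print join(' ', map { "$_=$pred->{$_}" } sort keys %$pred), "\n";
```

Whether the frame and score end up as tags on the feature or somewhere else is a design question for the module maintainers; the sketch only shows that the values are straightforward to capture.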
As far as the ORFs go, I guess you could just use Bio::SeqFeature::Generic to represent them. I haven't been keeping track of the relevant feature/annotation interfaces, but maybe there should be some kind of relation between the ORFs and predictions? The glimmer3 detail file is a little trickier. The least disruptive thing to do, interface-wise, might be to specify that as a separate input via an argument to the constructor. Then you've got *two* input files, and are going to have to override the automagic stuff that expects one input file and takes care of it all. As far as process, I just got on the list and started pestering people, and they haven't thrown me off yet. 8) I'm afraid that you're going to find that while people are happy to discuss implementation details, when it comes time to fire up the editor, you're usually on your own, if it's an enhancement. I'd love to work on Bioperl more, but so far, it's only been for what I need for my job. From spiros at lokku.com Thu Apr 12 15:16:39 2007 From: spiros at lokku.com (Spiros Denaxas) Date: Thu, 12 Apr 2007 20:16:39 +0100 Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More In-Reply-To: References: <5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu> <461CF0F3.1010708@sheffield.ac.uk> Message-ID: Hey guys, I have added a link as per Chris's nice suggestion for keeping track of what's going on regarding the migration: http://www.bioperl.org/wiki/TestMoreProgress There's also a link to this page from the project priority list. However, adding our signature for each module etc., in my humble opinion, seems tedious. May I suggest we just split up the list into 'starting letter sections' and each one does his part. I volunteer to work on all tests starting with the letter R down to the bottom of the list. Let me know if this makes sense or not. I will work on removing/flagging all the files that have already been migrated on that list as well. -spiros On 4/12/07, Spiros Denaxas wrote: > Good idea Chris.
Just got back home so will probably do it tomorrow > morning or so. > > Spiros > > On 4/11/07, Chris Fields wrote: > > We should probably place something on the wiki to prevent overlaps > > (i.e. make sure no two devs are working on the same tests). I > > planned on working on the G's last night but got bogged down. > > > > Spiros, if you haven't already go ahead and create a list on a wiki > > page for tracking. We can lay claim to them by tagging with our sigs > > and cross them off once complete. > > > > chris > > > > On Apr 11, 2007, at 9:56 AM, Spiros Denaxas wrote: > > > > > Yep! I have some rough stats I have at home, I will post them later on > > > tonight. Roughly, if i remember correctly, 50% of the tests were still > > > using Test, all the others were using Test::More. > > > > > > More to follow later on, > > > Spiros > > > > > > On 4/11/07, Nathan Haigh wrote: > > >> It should be easy enough to find those t/*.t files that have "use > > >> Test;" > > >> or "require Test;" This should provide a list of files still > > >> needing to > > >> be converted over to Test::More. As discussed previously, it may > > >> also be > > >> useful to use Test::Exception to test for situations where > > >> exceptions/warnings are thrown. If you add additional tests using > > >> this > > >> module, you should add the Test::Exception module to t/lib/ > > >> > > >> Good luck, and feel free to mail the list with questions/comments > > >> etc. > > >> > > >> Nath > > >> > > >> > > >> Chris Fields wrote: > > >> > At the moment we do not have a comprehensive list up on the > > >> wiki. I > > >> > have been slowly working (alphabetically!) to switch them over, so > > >> > any help would be appreciated. > > >> > > > >> > I have CC'd this to the main mail list for anyone else interested. 
> > >> > > > >> > chris > > >> > > > >> > On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote: > > >> > > > >> > > > >> >> Hey guys, > > >> >> > > >> >> I noticed there's an open task regarding moving testing code to > > >> use > > >> >> Test::More etc and that Chris and Nathan are already on to it. Is > > >> >> there any kind of wiki page that you keep track of which > > >> modules you > > >> >> are already working on? I am new to this and want to contribute, > > >> >> having a fair amount of unit testing from work, but don't want > > >> to step > > >> >> over other people's work and avoid duplication as well. > > >> >> Any pointers where i could get started would be much > > >> appreciated :-) > > >> >> > > >> >> Thanks, > > >> >> Spiros > > >> >> > > >> >> ps. apologies if this is not the correct list to post this, just > > >> >> seemed the most intuitive choice. > > >> >> _______________________________________________ > > >> >> Bioperl-guts-l mailing list > > >> >> Bioperl-guts-l at lists.open-bio.org > > >> >> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l > > >> >> > > >> > > > >> > Christopher Fields > > >> > Postdoctoral Researcher > > >> > Lab of Dr. Robert Switzer > > >> > Dept of Biochemistry > > >> > University of Illinois Urbana-Champaign > > >> > > > >> > > > >> > > > >> > _______________________________________________ > > >> > Bioperl-l mailing list > > >> > Bioperl-l at lists.open-bio.org > > >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > >> > > > >> > > >> > > > > Christopher Fields > > Postdoctoral Researcher > > Lab of Dr. Robert Switzer > > Dept of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > > > > From marian.thieme at lycos.de Wed Apr 11 12:02:14 2007 From: marian.thieme at lycos.de (Marian Thieme) Date: Wed, 11 Apr 2007 16:02:14 +0000 Subject: [Bioperl-l] Affys ReseqChip Message-ID: <188661178017404@lycos-europe.com> An HTML attachment was scrubbed... 
URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070411/bc2eb3aa/attachment-0001.html From johnsonm at gmail.com Thu Apr 12 15:35:35 2007 From: johnsonm at gmail.com (Mark Johnson) Date: Thu, 12 Apr 2007 14:35:35 -0500 Subject: [Bioperl-l] Odd spamming on bioperl wiki In-Reply-To: References: Message-ID: Looks like MediaWiki has some built in functionality: http://meta.wikimedia.org/wiki/Anti-spam_Features http://www.mediawiki.org/wiki/Extension:ConfirmEdit I'm not sure I'd call what they're doing spam, more like vandalism, but either way, I don't see the point (though I only looked at a couple examples via Recent Changes). If they're indeed bots, maybe it's time to enable Captchas? Depending on who they are and what their goals are, that may get rid of them completely or just slow them down. On 4/11/07, Chris Fields wrote: > I'm posting this to the mail list in case anyone has any ideas on > what is going on... > > I have noticed an odd (read: annoying) rash of spam on the wiki. > Jason also ran some spam reversions, so maybe he can chime in. > Essentially it looks like the responsible spambots 'correct' the wiki > text and links, so that '+' is being removed and URI-encoded symbols > in links are reverted to symbols. Unfortunately the removal occurs > in all text, so places where '+' is intended (for instance, raw text > for showing example record formats) are also changed. My guess is > we'll need to block the IP address or add to the blacklist if possible. > > Between Jason and I we have blocked ~9 spambots and counting. > Couldn't find anything via Google yet... 
> > chris From cjfields at uiuc.edu Thu Apr 12 15:44:28 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 12 Apr 2007 14:44:28 -0500 Subject: [Bioperl-l] Odd spamming on bioperl wiki In-Reply-To: References: Message-ID: On Apr 12, 2007, at 2:35 PM, Mark Johnson wrote: > Looks like MediaWiki has some built in functionality: > > http://meta.wikimedia.org/wiki/Anti-spam_Features > http://www.mediawiki.org/wiki/Extension:ConfirmEdit > > I'm not sure I'd call what they're doing spam, more like vandalism, > but either way, I don't see the point (though I only looked at a > couple examples via Recent Changes). > > If they're indeed bots, maybe it's time to enable Captchas? Depending > on who they are and what their goals are, that may get rid of them > completely or just slow them down. Already done; Mauricio installed ConfirmEdit yesterday after a bit of off-list discussion (thanks again Mauricio!). If you create a new account you'll encounter a simple captcha (it isn't configured for each edit yet). We may implement confirmations per edit or install picture captchas at a later point, dep. on how well the current system works. We may start granting anyone interested in maintaining the wiki sysop privs which makes handling spam easier. If so we'll probably announce something along those lines here first. chris From cjfields at uiuc.edu Thu Apr 12 15:48:41 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 12 Apr 2007 14:48:41 -0500 Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More In-Reply-To: References: <5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu> <461CF0F3.1010708@sheffield.ac.uk> Message-ID: <3B4500DD-CAB6-4FD6-ABF9-A0160981F7E3@uiuc.edu> Sounds good! I'll finish up the P's (halfway through now...) and move on to other things; got plenty to do, believe me! Appreciate all the help, Spiros! 
chris On Apr 12, 2007, at 2:16 PM, Spiros Denaxas wrote: > Hey guys, > > I have added a link as per Chris's nice suggestion for keeping track > on whats going on regarding the migration: > http://www.bioperl.org/wiki/TestMoreProgress > There's also a link to this page from the project priority list. > However, adding our signature for each module etc , in my humble > opinion, seems tedious. May i suggest we just split up the list in > 'starting letter sections' and each one does his part. > I volunteer to work on all tests starting with the letter R down to > the bottom of the list. > > Let me know if this makes sense or not. I will work on > removing/flagging all the files that have already been migrated on > that list as well. > > -spiros Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From spiros at lokku.com Thu Apr 12 16:19:18 2007 From: spiros at lokku.com (Spiros Denaxas) Date: Thu, 12 Apr 2007 21:19:18 +0100 Subject: [Bioperl-l] Odd spamming on bioperl wiki In-Reply-To: References: Message-ID: Nice idea, i saw it a bit before. However, any chance of implementing white lists with regular and/or trusted users to skip it each time we add something to the wiki ? Spiros On 4/12/07, Chris Fields wrote: > > Already done; Mauricio installed ConfirmEdit yesterday after a bit of > off-list discussion (thanks again Mauricio!).
> > If you create a new account you'll encounter a simple captcha (it > isn't configured for each edit yet). We may implement confirmations > per edit or install picture captchas at a later point, dep. on how > well the current system works. > > We may start granting anyone interested in maintaining the wiki sysop > privs which makes handling spam easier. If so we'll probably > announce something along those lines here first. > > chris > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From Jonathan_Epstein at nih.gov Thu Apr 12 16:22:40 2007 From: Jonathan_Epstein at nih.gov (Jonathan Epstein) Date: Thu, 12 Apr 2007 16:22:40 -0400 Subject: [Bioperl-l] Affys ReseqChip In-Reply-To: <188661178017404@lycos-europe.com> References: <188661178017404@lycos-europe.com> Message-ID: <6.2.3.4.2.20070412161407.04a38b60@mail.nih.gov> This sounds great to me. Resequencing in general (whether by Affy or by other technology such as Celexa) is likely to become important in the coming few years, and I wonder whether it's worth thinking about a general paradigm for handing this. But I suggest that you proceed full-speed-ahead, and we can sort this out in the future. Perhaps one of the experts can advise you whether to use the Bio::UnivAln object, some of the Bio::Assembly objects, or some other approach. Jonathan At 12:02 PM 4/11/2007, Marian Thieme wrote: >Hi, > >I am working on a piece of software, which is aimed to analyse the outcome of Affymetrix DNA Resequencing Arrays. (In particular Mitochip V2). The main goal of the software is to take into account for the redundant fragments. The software is able to align the redundant fragments to the entire sequence and in particular to call bases which arent called by the entire sequence and to detect insertions/deletion, depending on the design of the redundant frags. 
> >I would be glad to distribute the software to the bioperl package or otherwise, if anybody is interested I can give the code and/or further develop some features. > >Marian Jonathan Epstein Jonathan_Epstein at nih.gov Head, Unit on Biologic Computation (301)402-4563 Office of the Scientific Director Bldg 31, Room 2A47 Nat. Inst. of Child Health & Human Development 31 Center Drive National Institutes of Health Bethesda, MD 20892 From spiros at lokku.com Thu Apr 12 17:35:43 2007 From: spiros at lokku.com (Spiros Denaxas) Date: Thu, 12 Apr 2007 22:35:43 +0100 Subject: [Bioperl-l] Odd spamming on bioperl wiki In-Reply-To: <461EA4FA.8010504@campus.iztacala.unam.mx> References: <461EA4FA.8010504@campus.iztacala.unam.mx> Message-ID: Mauricio, thanks for your response. I actually edited a page several times today and i got the captcha. More specifically, it was displayed because "the page i edited contained external links" which is true since i included a {{CPAN}} link. Spiros On 4/12/07, Mauricio Herrera Cuadra wrote: > The chance of having white lists exists but as far as I tested last > night, the captcha is working only at the Create Account pages, not at > the time of applying changes to wiki content (I tested as a regular user > and not as a wiki admin). > > The idea at this moment is only to block automated methods for account > creation (bots). Registered users who haven't been blocked and/or have > confirmed their email wouldn't be bothered while adding/changing wiki > content. > > Regards, > Mauricio. > > Spiros Denaxas wrote: > > Nice idea, i saw it a bit before. However, any chance of implementing > > white lists with regular and/or trusted users to skip it each time we > > add something to the wiki ? 
From arareko at campus.iztacala.unam.mx Thu Apr 12 17:30:34 2007 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Thu, 12 Apr 2007 16:30:34 -0500 Subject: [Bioperl-l] Odd spamming on bioperl wiki In-Reply-To: References: Message-ID: <461EA4FA.8010504@campus.iztacala.unam.mx> The chance of having white lists exists but as far as I tested last night, the captcha is working only at the Create Account pages, not at the time of applying changes to wiki content (I tested as a regular user and not as a wiki admin). The idea at this moment is only to block automated methods for account creation (bots). Registered users who haven't been blocked and/or have confirmed their email wouldn't be bothered while adding/changing wiki content. Regards, Mauricio. Spiros Denaxas wrote: > Nice idea, i saw it a bit before. However, any chance of implementing > white lists with regular and/or trusted users to skip it each time we > add something to the wiki ?
> > Spiros -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From arareko at campus.iztacala.unam.mx Thu Apr 12 17:53:51 2007 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Thu, 12 Apr 2007 16:53:51 -0500 Subject: [Bioperl-l] Odd spamming on bioperl wiki In-Reply-To: References: <461EA4FA.8010504@campus.iztacala.unam.mx> Message-ID: <461EAA6F.1090805@campus.iztacala.unam.mx> I've reconfigured the extension to display captchas exclusively for account creation and disabled it when adding URLs in pages. Don't know why this didn't happened to me while testing last night... Please try do it again to see if the change works. Thanks for pointing this out Spiros :) Mauricio. Spiros Denaxas wrote: > Mauricio, thanks for your response. I actually edited a page several > times today and i got the captcha. More specifically, it was displayed > because "the page i edited contained external links" which is true > since i included a {{CPAN}} link. > > Spiros > > On 4/12/07, Mauricio Herrera Cuadra > wrote: >> The chance of having white lists exists but as far as I tested last >> night, the captcha is working only at the Create Account pages, not at >> the time of applying changes to wiki content (I tested as a regular user >> and not as a wiki admin). >> >> The idea at this moment is only to block automated methods for account >> creation (bots). Registered users who haven't been blocked and/or have >> confirmed their email wouldn't be bothered while adding/changing wiki >> content.
-- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From spiros at lokku.com Thu Apr 12 18:11:46 2007 From: spiros at lokku.com (Spiros Denaxas) Date: Thu, 12 Apr 2007 23:11:46 +0100 Subject: [Bioperl-l] Odd spamming on bioperl wiki In-Reply-To: <461EAA6F.1090805@campus.iztacala.unam.mx> References: <461EA4FA.8010504@campus.iztacala.unam.mx> <461EAA6F.1090805@campus.iztacala.unam.mx> Message-ID: You're welcome Mauricio. Its all cool now, works without the captcha for internal edits. Thanks for changing it over :-) -spiros On 4/12/07, Mauricio Herrera Cuadra wrote: > I've reconfigured the extension to display captchas exclusively for > account creation and disabled it when adding URLs in pages. Don't know > why this didn't happened to me while testing last night... > > Please try do it again to see if the change works. Thanks for pointing > this out Spiros :) > > Mauricio. > > Spiros Denaxas wrote: > > Mauricio, thanks for your response. I actually edited a page several > > times today and i got the captcha. More specifically, it was displayed > > because "the page i edited contained external links" which is true > > since i included a {{CPAN}} link.
From cjfields at uiuc.edu Thu Apr 12 18:02:51 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 12 Apr 2007 17:02:51 -0500 Subject: [Bioperl-l] Odd spamming on bioperl wiki In-Reply-To: <461EAA6F.1090805@campus.iztacala.unam.mx> References: <461EA4FA.8010504@campus.iztacala.unam.mx> <461EAA6F.1090805@campus.iztacala.unam.mx> Message-ID: You disabled yourself as sysop last night, IIRC. Don't know; could be what Spiros suggested, eg. adding external links trips it. chris On Apr 12, 2007, at 4:53 PM, Mauricio Herrera Cuadra wrote: > I've reconfigured the extension to display captchas exclusively for > account creation and disabled it when adding URLs in pages. Don't > know why this didn't happened to me while testing last night... > > Please try do it again to see if the change works. Thanks for > pointing this out Spiros :) > > Mauricio. > > Spiros Denaxas wrote: >> Mauricio, thanks for your response.
Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Fri Apr 13 04:30:50 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 13 Apr 2007 09:30:50 +0100 Subject: [Bioperl-l] GenericHit->start/end needs tiled hsps? Message-ID: <461F3FBA.2010101@sendu.me.uk> Hi all, I want to double-check my thinking regarding Bio::Search::Hit::GenericHit->start() and end(). Right now the docs claim that hsps of the hit object must be tiled before the answer can be produced. The code is implemented in that way (Bio::Search::SearchUtils::tile_hsps($self)).
Yet as far as I can see, all you need to do is loop through all hsps and pick out the smallest start and largest end respectively in terms of subject and query.

This comes up because I have a blast report where a single hit contains over 80000 hsps and the tiling takes over an hour (I gave up on it, don't know how long it really takes). The simple loop through hsps takes seconds or less.

Now in this situation the answer isn't especially useful (to me). An alternative way of fixing the problem would be to re-write the tiling algorithm (again) to somehow make it hundreds of times faster, then provide some way in start() and end() for the user to request the start and end of the best contig, or other contig of choice. Easier said than done though!

What do people think?

From marian.thieme at lycos.de Fri Apr 13 06:12:51 2007
From: marian.thieme at lycos.de (Marian Thieme)
Date: Fri, 13 Apr 2007 10:12:51 +0000
Subject: [Bioperl-l] Affys ReseqChip
Message-ID: <18866117804894@lycos-europe.com>

Hi,

To provide a better understanding of the matter and to assess the approach, I will briefly present 1.) the problem and 2.) my approach.

1.) Given: fragments (strings of a certain length) with a description of their location within some reference sequence. For instance:
- redundant fragment: acgtnna--gcta (deletion: pos12, pos13)
- start position: 5
- end position: 17
- and some suitable reference sequence

Fragments are assumed to map 1:1 to the reference sequence and can contain gaps and n's; the latter indicate that the base wasn't determined, maybe because of failed hybridization or something similar. Thus, to cope with insertions/deletions, we only need to parse an array design file (a description of all insertions and deletions in each redundant fragment) and, according to that description, insert gaps in the reference sequence and in the fragments where required.
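A minimal sketch of the 1:1 mapping described in 1.), using the example fragment with start position 5 and end position 17. The helper name `classify_fragment` and its return structure are assumptions made for illustration; they are not part of BioPerl or of Marian's software. The sketch only reports which reference positions are deletions ('-') and which are uncalled bases ('n'); it does not read an array design file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): walk a fragment that maps 1:1
# onto the reference and report, in reference coordinates, which positions
# are deletions ('-') and which are uncalled bases ('n').
sub classify_fragment {
    my ($fragment, $start, $end) = @_;

    # A 1:1 mapping means the fragment is exactly as long as the interval.
    my $expected = $end - $start + 1;
    die "fragment length " . length($fragment) . " != $expected\n"
        unless length($fragment) == $expected;

    my (@deleted, @uncalled);
    for my $i (0 .. length($fragment) - 1) {
        my $base = lc substr($fragment, $i, 1);
        my $pos  = $start + $i;    # reference coordinate of this column
        push @deleted,  $pos if $base eq '-';
        push @uncalled, $pos if $base eq 'n';
    }
    return { deleted => \@deleted, uncalled => \@uncalled };
}

# The example fragment from the message: start 5, end 17.
my $result = classify_fragment('acgtnna--gcta', 5, 17);
print "deleted:  @{ $result->{deleted} }\n";
print "uncalled: @{ $result->{uncalled} }\n";
```

For the example fragment, the two '-' characters fall on reference positions 12 and 13, which agrees with the deletion annotation given in the message.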
So from my point of view, and in the case of the Affy MitoChip v2, we only need to process the description file rather than calculating an alignment via a dynamic programming matrix.
2.) My current approach consists of the following 5 steps:
1.) Input the reference sequence and the redundant fragments into a SeqIO object.
2.) Calculate a hash of all insertions, defined by length and position.
3.) Insert the longest insertion at each position into the appropriate fragments and into the reference sequence, and hence insert as many gaps as given by length(max_insertion(position_x)) - length(insertion(fragment_y, position_x)) into each fragment/reference sequence. (This is done by iterating over each sequence in the SeqIO and inserting gaps according to the insertion hash.)
4.) Create a SimpleAlign object from LocatableSeq objects.
5.) Afterwards we can do some statistical analysis and calculate a consensus base for each column in the SimpleAlign. (I use a Statistics module from CPAN.)
Unfortunately I didn't manage to find a method that gives me the set of bases (column) for a given position in the alignment (did I overlook something? is SimpleAlign not appropriate?), so I iterate over each position (base) of the reference sequence and over each fragment which covers that particular position. Marian Jonathan Epstein wrote: > This sounds great to me. > > Resequencing in general (whether by Affy or by other technology such as Solexa) is likely to become important in the coming few years, and I wonder whether it's worth thinking about a general paradigm for handling this. But I suggest that you proceed full-speed-ahead, and we can sort this out in the future. > > Perhaps one of the experts can advise you whether to use the Bio::UnivAln object, some of the Bio::Assembly objects, or some other approach.
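On the question above about getting the set of bases in one alignment column: the following is only a minimal sketch, assuming every aligned row is already available as a gapped string (with Bio::SimpleAlign these strings could presumably be collected with `map { $_->seq } $aln->each_seq`). The helper names `column_bases` and `consensus_base` are hypothetical, and the majority vote that ignores 'n' and '-' is just one possible consensus rule, not necessarily the statistic used in the approach described here.

```perl
use strict;
use warnings;

# Return the characters found at 1-based alignment column $col,
# one per row; rows shorter than $col are skipped.
sub column_bases {
    my ($col, @rows) = @_;
    return map { substr($_, $col - 1, 1) } grep { length($_) >= $col } @rows;
}

# Majority vote over one column, ignoring gaps ('-') and undetermined
# bases ('n'). Returns 'n' when no informative base is present; ties
# are broken alphabetically.
sub consensus_base {
    my (@bases) = @_;
    my %count;
    for my $char (@bases) {
        my $base = lc $char;
        next if $base eq '-' or $base eq 'n';
        $count{$base}++;
    }
    return 'n' unless %count;
    my ($best) = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;
    return $best;
}

# Example rows: the fragment from the message plus two made-up ones.
my @rows = ('acgtnna--gcta', 'acgtcca--gcta', 'acgtccatagcta');
my @col5 = column_bases(5, @rows);
print consensus_base(@col5), "\n";
```

Working on plain strings keeps the column lookup at one `substr` per row, so the loop over reference positions stays linear in alignment length times the number of fragments.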
> > Jonathan From thiago.venancio at gmail.com Fri Apr 13 15:05:12 2007 From: thiago.venancio at gmail.com (Thiago Venancio) Date: Fri, 13 Apr 2007 16:05:12 -0300 Subject: [Bioperl-l] extracting coding sequence from BLAST Message-ID: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com> Hi all. What is the best way to extract the coding region from a nucleotide sequence based on BLASTX or TBLASTX comparisons? Thanks in advance. Thiago -- "The way to get started is to quit talking and begin doing." Walt Disney ======================== Thiago Motta Venancio, MSc PhD student in Bioinformatics University of Sao Paulo ======================== From jason at bioperl.org Fri Apr 13 16:05:42 2007 From: jason at bioperl.org (Jason Stajich) Date: Fri, 13 Apr 2007 13:05:42 -0700 Subject: [Bioperl-l] extracting coding sequence from BLAST In-Reply-To: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com> References: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com> Message-ID: <8C7B42CC-A652-4172-A038-E9461231EE84@bioperl.org> It depends on how distant the query protein is, but I don't trust BLAST for the actual alignment. Find the boundaries, add a little slop, and refine the alignment of protein to genome with a good alignment program designed for this, like genewise or exonerate or even FASTX/Y. -jason On Apr 13, 2007, at 12:05 PM, Thiago Venancio wrote: > Hi all. > > What is the best way to extract coding region from a nucleotide > sequence > based on a BLASTX or TBLASTX comparisons ? > > Thanks in advance. > > Thiago > -- > "The way to get started is to quit talking and begin doing."
> Walt Disney > > ======================== > Thiago Motta Venancio, MSc > PhD student in Bioinformatics > University of Sao Paulo > ======================== > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From jason at bioperl.org Fri Apr 13 16:13:07 2007 From: jason at bioperl.org (Jason Stajich) Date: Fri, 13 Apr 2007 13:13:07 -0700 Subject: [Bioperl-l] rpsblast results unsupported by Bio::SearchIO::Writer In-Reply-To: <60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr> References: <46028EA0.7070901@crs4.it> <8015924160e6b1f3af747fe2a906503a@univ-rennes1.fr> <60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr> Message-ID: <7F2B71E5-6473-402C-B0AA-56AE619293E1@bioperl.org> I think it just needs an edit to the code in to_string that checks the type of algorithm. You'd need to add a branch to the if/elsif cascade for the RPSBLAST type that sets the query and target dbs and the query and target sequence types properly. This would be trivial to code in; have you tried adding this to see if it works? If you submit a bug with an example report, we'd be able to make appropriate changes faster. -jason On Apr 11, 2007, at 6:32 AM, Emeric Sevin wrote: > Hi everybody, > > I'm sorry to bug, but either I missed something so obvious nobody > bothered to answer, either I'm being a little boycotted here... > A little help would be very much appreciated > > On 22 March 07, at 16:07, Emeric Sevin wrote: > >> Hello, >> >> I am new to this community, and apologize if this subject has been >> posted before. >> >> I want to print out only selected results from a multiple blast- >> alignments results file. Problem is, the algorithm used is >> rpsblast. The parsing (with Bio::SearchIO) goes fine, but the >> actual writing task yields "unclean" warnings.
Although an output >> is actually written, the writer >> (Bio::SearchIO::Writer::TextResultWriter) seems to be disturbed by >> the fact rpsblast DBs are not labeled with >> "protein"/"nucleic"/"translated". >> Does anybody know of an easy fix to that bug, or of another way to >> come around it? >> >> Thank you very much >> >> Emeric SEVIN >> Université de Rennes 1 >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From thiago.venancio at gmail.com Fri Apr 13 16:20:32 2007 From: thiago.venancio at gmail.com (Thiago Venancio) Date: Fri, 13 Apr 2007 17:20:32 -0300 Subject: [Bioperl-l] extracting coding sequence from BLAST In-Reply-To: <8C7B42CC-A652-4172-A038-E9461231EE84@bioperl.org> References: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com> <8C7B42CC-A652-4172-A038-E9461231EE84@bioperl.org> Message-ID: <44255ea80704131320t79bc5c64kc519c5c90ebe4ed@mail.gmail.com> Thanks Jason. I have a large dataset (assembled ESTs) and several BLASTX or TBLASTX comparisons and want to extract some translated coding regions for further multiple alignment and phylogenetic analysis. Best. Thiago On 4/13/07, Jason Stajich wrote: > > Depends on how far away the query protein is, but I don't trust BLAST for > the actual alignment. Find the boundaries, add a little slop, and refine > the alignment of protein to genome with a good alignment program designed to > like genewise or exonerate or even FASTX/Y. > -jason > On Apr 13, 2007, at 12:05 PM, Thiago Venancio wrote: > > Hi all. > > What is the best way to extract coding region from a nucleotide sequence > based on a BLASTX or TBLASTX comparisons ? > > Thanks in advance.
> > Thiago > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- > Jason Stajich > jason at bioperl.org > http://jason.open-bio.org/ > > > From jason at bioperl.org Fri Apr 13 16:47:50 2007 From: jason at bioperl.org (Jason Stajich) Date: Fri, 13 Apr 2007 13:47:50 -0700 Subject: [Bioperl-l] extracting coding sequence from BLAST In-Reply-To: <44255ea80704131320t79bc5c64kc519c5c90ebe4ed@mail.gmail.com> References: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com> <8C7B42CC-A652-4172-A038-E9461231EE84@bioperl.org> <44255ea80704131320t79bc5c64kc519c5c90ebe4ed@mail.gmail.com> Message-ID: <54F53FA0-4ED6-4DE8-A853-750AE5930FC2@bioperl.org> Hi - There are some tools that do this for you; I've listed a few from a Google search or from what I remember reading. It would be great if you (and others!) were willing to contribute a little of what you find works for you to the wiki. A little HOWTO would be cool, here or on openwetware.org. Prot4EST: http://zeldia.cap.ed.ac.uk/bioinformatics/prot4EST/index.shtml EST-PAC: doi: http://dx.doi.org/10.1186/1751-0473-1-2 Ewan Birney's estwise, part of the wise package, can also help if you have a likely protein from BLAST that you want to align to the EST. estwise can handle frameshifts, but can be too slow for some people. Exonerate's protein2dna model may also work here, but I haven't tried it. -jason On Apr 13, 2007, at 1:20 PM, Thiago Venancio wrote: > Thanks Jason. > > I have a large dataset (assembled ESTs) and several BLASTX or TBLASTX > comparisons and want to extract some translated coding regions for > further > multiple aligmnent and phylogenetic analysis. > > Best. > > Thiago > > On 4/13/07, Jason Stajich wrote: >> >> Depends on how far away the query protein is, but I don't trust >> BLAST for >> the actual alignment.
Find the boundaries, add a little slop, and >> refine >> the alignment of protein to genome with a good alignment program >> designed to >> like genewise or exonerate or even FASTX/Y. >> -jason >> On Apr 13, 2007, at 12:05 PM, Thiago Venancio wrote: >> >> Hi all. >> >> What is the best way to extract coding region from a nucleotide >> sequence >> based on a BLASTX or TBLASTX comparisons ? >> >> Thanks in advance. >> >> Thiago >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> -- >> Jason Stajich >> jason at bioperl.org >> http://jason.open-bio.org/ >> >> >> -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From gopu_36 at yahoo.com Fri Apr 13 12:48:48 2007 From: gopu_36 at yahoo.com (gopu_36) Date: Fri, 13 Apr 2007 09:48:48 -0700 (PDT) Subject: [Bioperl-l] How to parse blast result to get 2nd best hit score Message-ID: <9982570.post@talk.nabble.com> Can anyone help me to collect the value of the second best hit score (ie)raw_score from the blast results which contains multiple queries? I have used searchIO object to parse my blast report. I am only interested in the second best hit/raw_score and not the first hit! Thanks in advance! -- View this message in context: http://www.nabble.com/How-to-parse-blast-result-to-get-2nd-best-hit-score-tf3572717.html#a9982570 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From jason at bioperl.org Sat Apr 14 13:53:42 2007 From: jason at bioperl.org (Jason Stajich) Date: Sat, 14 Apr 2007 10:53:42 -0700 Subject: [Bioperl-l] How to parse blast result to get 2nd best hit score In-Reply-To: <9982570.post@talk.nabble.com> References: <9982570.post@talk.nabble.com> Message-ID: <67974DCD-B1F9-4286-86A4-5E4C4DBA3914@bioperl.org> Try reading the HOWTO. 
http://bioperl.org/wiki/HOWTO:SearchIO On Apr 13, 2007, at 9:48 AM, gopu_36 wrote: > > Can anyone help me to collect the value of the second best hit score > (ie)raw_score from the blast results which contains multiple > queries? I have > used searchIO object to parse my blast report. I am only interested > in the > second best hit/raw_score and not the first hit! > > Thanks in advance! > > > -- > View this message in context: http://www.nabble.com/How-to-parse- > blast-result-to-get-2nd-best-hit-score-tf3572717.html#a9982570 > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From gdorjee at hotmail.com Sat Apr 14 17:39:50 2007 From: gdorjee at hotmail.com (DeeGee) Date: Sat, 14 Apr 2007 14:39:50 -0700 (PDT) Subject: [Bioperl-l] error while remote blast against swissprot db Message-ID: <9997343.post@talk.nabble.com> hi all, can anyone please tell me why the following script gives me error like: waiting... 5 units of time Can't call method "database_name" on an undefined value at test1_remote_swissblast.pl line 41, line 31. cheers!
use Bio::SeqIO;
use Bio::Tools::Run::RemoteBlast;

my $Seq_in = Bio::SeqIO->new (-file => $ARGV[0], -format => 'fasta');
my $query = $Seq_in->next_seq();

my $factory = Bio::Tools::Run::RemoteBlast->new(
    '-prog' => 'blastp',
    '-data' => 'swissprot',
    _READMETHOD => "Blast"
);
my $blast_report = $factory->submit_blast($query);

my $max_number = 100;
my $trial = 0;

while ( my @rids = $factory->each_rid ) {
    print STDERR "\nSorry, maximum number of retries $max_number exceeded\n" if $trial >= $max_number;
    last if $trial >= $max_number;
    $trial++;
    print STDERR "waiting... ".(5*$trial)." units of time\n" ;
    # RID = Remote Blast ID (e.g: 1017772174-16400-6638)
    foreach my $rid ( @rids ) {
        my $rc = $factory->retrieve_blast($rid);
        if( !ref($rc) ) {
            if( $rc < 0 ) {
                # retrieve_blast returns -1 on error
                $factory->remove_rid($rid);
            }
            # retrieve_blast returns 0 on 'job not finished'
            sleep 5*$trial;
        } else {
            #---- Blast done ----
            $factory->remove_rid($rid);
            my $result = $rc->next_result;
            print "database: ", $result->database_name(), "\n";
            while( my $hit = $result->next_hit ) {
                print "hit name is: ", $hit->name, "\n";
                while( my $hsp = $hit->next_hsp ) {
                    print "score is: ", $hsp->score, "\n";
                }
            }
        }
    }
}

-- View this message in context: http://www.nabble.com/error-while-remote-blast-against-swissprot-db-tf3577674.html#a9997343 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From dmessina at wustl.edu Sun Apr 15 12:02:51 2007 From: dmessina at wustl.edu (David Messina) Date: Sun, 15 Apr 2007 11:02:51 -0500 Subject: [Bioperl-l] error while remote blast against swissprot db In-Reply-To: <9997343.post@talk.nabble.com> References: <9997343.post@talk.nabble.com> Message-ID: Hi DeeGee, Your script worked fine for me. Perhaps the problem is in your input fasta file? Dave % perl test.pl AAC12660.fa waiting... 5 units of time waiting... 10 units of time waiting...
15 units of time database: Non-redundant SwissProt sequences hit name is: sp|Q15750|TAB1_HUMAN score is: 2413 hit name is: sp|Q8CF89|TAB1_MOUSE score is: 2352 hit name is: sp|P49444|PP2C_PARTE score is: 159 hit name is: sp|Q6ING9|PP2CK_XENLA [...etc...] From spiros at lokku.com Sun Apr 15 12:12:05 2007 From: spiros at lokku.com (Spiros Denaxas) Date: Sun, 15 Apr 2007 17:12:05 +0100 Subject: [Bioperl-l] error while remote blast against swissprot db In-Reply-To: References: <9997343.post@talk.nabble.com> Message-ID: Yep, it must be in the input file. The $result->database_name() function gets called on $result the result object. The error you get, Can't call method "database_name" on an undefined value at test1_remote_swissblast.pl line 41, line 31. means the result object is not defined thus the function fails since there are no data to operate on. Spiros On 4/15/07, David Messina wrote: > Hi DeeGee, > > Your script worked fine for me. Perhaps the problem is in your input > fasta file? > > Dave > > % perl test.pl AAC12660.fa > waiting... 5 units of time > waiting... 10 units of time > waiting... 15 units of time > database: Non-redundant SwissProt sequences > hit name is: sp|Q15750|TAB1_HUMAN > score is: 2413 > hit name is: sp|Q8CF89|TAB1_MOUSE > score is: 2352 > hit name is: sp|P49444|PP2C_PARTE > score is: 159 > hit name is: sp|Q6ING9|PP2CK_XENLA > [...etc...] 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From dr.hogart at gmail.com Sun Apr 15 12:13:29 2007 From: dr.hogart at gmail.com (sergei ryazansky) Date: Sun, 15 Apr 2007 20:13:29 +0400 Subject: [Bioperl-l] error with blast parsing by searchIO Message-ID: Hello all, script (parsing blastn report) that previously had worked today "tell" me that: ------------- EXCEPTION ------------- MSG: Could not open BLASTN 2.2.13 [Nov-27-2005] : No such file or directory STACK Bio::Root::IO::_initialize_io c:/Perl/site/lib/Bio/Root/IO.pm:273 STACK Bio::Root::IO::new c:/Perl/site/lib/Bio/Root/IO.pm:213 STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:135 STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:167 STACK toplevel parse-te-lib2.pl:3 -------------------------------------- What does it mean?? ps. bioperl-1.4 with ActivePerl 5.8.7&5.8.8 From cjfields at uiuc.edu Sun Apr 15 13:40:24 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 15 Apr 2007 12:40:24 -0500 Subject: [Bioperl-l] error with blast parsing by searchIO In-Reply-To: References: Message-ID: <460926E6-0EEA-45D9-838E-70706062857C@uiuc.edu> You have to update to bioperl 1.5.2 or CVS. BLAST parsing is broken for recent BLAST versions (> v.2.2, I believe). 
chris On Apr 15, 2007, at 11:13 AM, sergei ryazansky wrote: > Hello all, > > script (parsing blastn report) that previously had worked today > "tell" me > that: > > ------------- EXCEPTION ------------- > MSG: Could not open BLASTN 2.2.13 [Nov-27-2005] > : No such file or directory > STACK Bio::Root::IO::_initialize_io c:/Perl/site/lib/Bio/Root/IO.pm: > 273 > STACK Bio::Root::IO::new c:/Perl/site/lib/Bio/Root/IO.pm:213 > STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:135 > STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:167 > STACK toplevel parse-te-lib2.pl:3 > > -------------------------------------- > > What does it mean?? > > ps. bioperl-1.4 with ActivePerl 5.8.7&5.8.8 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From jason at bioperl.org Sun Apr 15 14:24:56 2007 From: jason at bioperl.org (Jason Stajich) Date: Sun, 15 Apr 2007 11:24:56 -0700 Subject: [Bioperl-l] error with blast parsing by searchIO In-Reply-To: References: Message-ID: It looks like something is broken in your script as to how you are passing it a filename - it is trying to open a file called "BLASTN 2.2.13 [Nov-27-2005]". did you already open the file and are you passing data from the first line of the file to SearchIO perhaps? Sending the relevant part of your script to the list will help us diagnose the problem better. 
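To illustrate the diagnosis above: the exception is what one would expect if the first line of the report, rather than the path to the report, reached the -file argument. The following is a minimal sketch of a pre-flight check; the helper name `check_report_path` is hypothetical, and the regular expression is only a heuristic for spotting a BLAST banner line.

```perl
use strict;
use warnings;

# Return a description of the problem, or undef if $path looks usable.
sub check_report_path {
    my ($path) = @_;
    return 'no path given' unless defined $path && length $path;
    # A banner such as "BLASTN 2.2.13 [Nov-27-2005]" is report content,
    # not a file name.
    return "'$path' looks like report text, not a file name"
        if $path =~ /^\s*T?BLAST[NPX]\s+\d/i;
    return "no such file: $path" unless -e $path;
    return;
}

# The string from the exception message in this thread.
my $problem = check_report_path('BLASTN 2.2.13 [Nov-27-2005]');
print "$problem\n" if defined $problem;
```

If the check passes, the path can be handed on unchanged to the parser, for example as `Bio::SearchIO->new(-file => $path, -format => 'blast')`.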
-jason On Apr 15, 2007, at 9:13 AM, sergei ryazansky wrote: > Hello all, > > script (parsing blastn report) that previously had worked today > "tell" me > that: > > ------------- EXCEPTION ------------- > MSG: Could not open BLASTN 2.2.13 [Nov-27-2005] > : No such file or directory > STACK Bio::Root::IO::_initialize_io c:/Perl/site/lib/Bio/Root/IO.pm: > 273 > STACK Bio::Root::IO::new c:/Perl/site/lib/Bio/Root/IO.pm:213 > STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:135 > STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:167 > STACK toplevel parse-te-lib2.pl:3 > > -------------------------------------- > > What does it mean?? > > ps. bioperl-1.4 with ActivePerl 5.8.7&5.8.8 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From gdorjee at hotmail.com Sun Apr 15 20:40:22 2007 From: gdorjee at hotmail.com (DeeGee) Date: Sun, 15 Apr 2007 17:40:22 -0700 (PDT) Subject: [Bioperl-l] error while remote blast against swissprot db In-Reply-To: References: <9997343.post@talk.nabble.com> Message-ID: <10008507.post@talk.nabble.com> hi guys, thanks for your replies, but i still don't understand why it doesn't work. my input fasta sequence looks fine.
here, take a look, >gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens] HLSAQKASVGPESVSGLGTRTWPRVSCEVTVQCWPGCHLKVGGFKMAPWQGVGRRPWFLTWGPLCGAASVSPSMTVASSQ QGWDCTAGRRWLGEGEIEALAQVSEFKTVLSFQGPAASPDGSSATRVPQDVTQGPGATGGKEDSGMIPLAGTAPGAEGPA PGDSQAVRPYKQEPSSPPLAPGLPAFLAAPGTTSCPECGKTSLKPAHLLRHRQSHSGEKPHACPECGKAFRRKEHLRRHR DTHPGSPGSPGPALRPLPAREKPHACCECGKTFYWREHLVRHRKTHSGARPFACWECGKGFGRREHVLRHQRIHGRAAAS AQGAVAPGPDGGGPFPPWPLG is it possible that the script is not being about to read the RemoteBlast.pm? but the thing is, i can run the standalone blast on the command line, although i've never been able the run the same with cgi module (by gettting the input from an html textarea). i don't understand. i've been trying to get the standalone running for a while now, and i also mentioned it in my previous postings....but all in vain. i haven't got over it yet. any help or an example would be much appreciated. Spiros Denaxas wrote: > > Yep, it must be in the input file. The > > $result->database_name() > > function gets called on $result the result object. > > The error you get, > > Can't call method "database_name" on an undefined value at > test1_remote_swissblast.pl line 41, line 31. > > means the result object is not defined thus the function fails since > there are no data to operate on. > > Spiros > > On 4/15/07, David Messina wrote: >> Hi DeeGee, >> >> Your script worked fine for me. Perhaps the problem is in your input >> fasta file? >> >> Dave >> >> % perl test.pl AAC12660.fa >> waiting... 5 units of time >> waiting... 10 units of time >> waiting... 15 units of time >> database: Non-redundant SwissProt sequences >> hit name is: sp|Q15750|TAB1_HUMAN >> score is: 2413 >> hit name is: sp|Q8CF89|TAB1_MOUSE >> score is: 2352 >> hit name is: sp|P49444|PP2C_PARTE >> score is: 159 >> hit name is: sp|Q6ING9|PP2CK_XENLA >> [...etc...] 
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/error-while-remote-blast-against-swissprot-db-tf3577674.html#a10008507 Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From dmessina at wustl.edu Sun Apr 15 22:43:06 2007 From: dmessina at wustl.edu (David Messina) Date: Sun, 15 Apr 2007 21:43:06 -0500 Subject: [Bioperl-l] error while remote blast against swissprot db In-Reply-To: <10008507.post@talk.nabble.com> References: <9997343.post@talk.nabble.com> <10008507.post@talk.nabble.com> Message-ID: <2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu> You're right, it's not the input sequence. I just tried it with your script and it worked. > is it possible that the script is not being about to read the > RemoteBlast.pm?
I think the program wouldn't compile if that were the case, and your error message would be about not finding RemoteBlast.pm rather than the one you got. > but the thing is, i can run the standalone blast on the > command line, although i've never been able the run the same with > cgi module > (by gettting the input from an html textarea). i don't understand. This result really suggests that perl and Bioperl are not the issue. I'm not saying the following to give you the brushoff, but given the numerous ways in which web-based apps can fail and in which webservers can be installed, it might be best for you to find someone at your institution who can sit down with you and work through it. Dave From cjfields at uiuc.edu Sun Apr 15 23:51:05 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 15 Apr 2007 22:51:05 -0500 Subject: [Bioperl-l] error while remote blast against swissprot db In-Reply-To: <2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu> References: <9997343.post@talk.nabble.com> <10008507.post@talk.nabble.com> <2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu> Message-ID: <5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu> This sounds like a similar issue that popped up a few weeks ago related to URLAPI changes for remote BLAST access. That was fixed on NCBI's end but I also added a fix to RemoteBlast in CVS that works as well. Saying that, my guess is the same as Dave's, that there are connectivity issues. What happens when you set the RemoteBlast factory to a verbosity of 1? This will spill out debugging output from the repeated queries to the NCBI server (so if there are problems they'll show up there).

...
my $factory = Bio::Tools::Run::RemoteBlast->new(
    '-prog' => 'blastp',
    '-data' => 'swissprot',
    _READMETHOD => "Blast",
    -verbose => 1 # debugging output
);
...

If you see the BLAST report but get the same error try using the RemoteBlast in CVS to see if it fixes the problem.
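Whatever the underlying cause turns out to be, the failure is easier to interpret if the value returned by next_result is tested before any method is called on it. The following is a minimal sketch under stated assumptions: `describe_result` is a hypothetical helper, and `MockResult` merely stands in for the parsed result object so that the snippet runs without a network connection. Only `database_name` is taken from the script posted in this thread.

```perl
use strict;
use warnings;

# Stand-in for a parsed result object; it offers only the one method
# that the helper below calls.
package MockResult;
sub new           { my ($class, %args) = @_; return bless {%args}, $class }
sub database_name { return $_[0]{db} }

package main;

# Return a printable line instead of dying when no result was parsed.
sub describe_result {
    my ($result) = @_;
    return "no result parsed (empty or unexpected report?)"
        unless defined $result;
    return "database: " . $result->database_name;
}

print describe_result(undef), "\n";
print describe_result(MockResult->new(db => 'swissprot')), "\n";
```

In the posted loop the same idea would amount to checking `defined $result` right after `my $result = $rc->next_result;` and skipping the hit loop, with a warning, when the check fails.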
chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From dr.hogart at gmail.com Mon Apr 16 03:03:46 2007
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Mon, 16 Apr 2007 11:03:46 +0400
Subject: [Bioperl-l] error with blast parsing by searchIO
References:
Message-ID:

The problem was resolved by giving the direct path
(-file=>'d\...\input.txt') to the input file in my script.
I think that Chris is right and I should update my bioperl to version 1.5.
By the way, bioperl-1.5 is not accessible via ppm. Where can I download it
for WinXP?

On Sun, 15 Apr 2007 22:24:56 +0400, Jason Stajich wrote:

> It looks like something is broken in your script as to how you are
> passing it a filename - it is trying to open a file called "BLASTN
> 2.2.13 [Nov-27-2005]".
> did you already open the file and are you passing data from the first
> line of the file to SearchIO perhaps?
> Sending the relevant part of your script to the list will help us
> diagnose the problem better.
>
> -jason
> On Apr 15, 2007, at 9:13 AM, sergei ryazansky wrote:
>
>> Hello all,
>>
>> script (parsing blastn report) that previously had worked today
>> "tell" me
>> that:
>>
>> ------------- EXCEPTION -------------
>> MSG: Could not open BLASTN 2.2.13 [Nov-27-2005]
>> : No such file or directory
>> STACK Bio::Root::IO::_initialize_io c:/Perl/site/lib/Bio/Root/IO.pm:273
>> STACK Bio::Root::IO::new c:/Perl/site/lib/Bio/Root/IO.pm:213
>> STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:135
>> STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:167
>> STACK toplevel parse-te-lib2.pl:3
>>
>> --------------------------------------
>>
>> What does it mean??
>>
>> ps. bioperl-1.4 with ActivePerl 5.8.7&5.8.8
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/

From bix at sendu.me.uk Mon Apr 16 04:34:56 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 16 Apr 2007 09:34:56 +0100
Subject: [Bioperl-l] error with blast parsing by searchIO
In-Reply-To:
References:
Message-ID: <46233530.1010109@sendu.me.uk>

sergei ryazansky wrote:
> The problem was resolved by the direct path (-file=>'d\...\input.txt') to
> input file in the my script.
> I think that Chris right and i should update my bioperl to 1.5 version.
> By the way, bioperl-1.5 is not accessible via ppm. Where I can download it
> for winXP?

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

From ewijaya at i2r.a-star.edu.sg Mon Apr 16 10:36:33 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Mon, 16 Apr 2007 22:36:33 +0800
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>

Dear all,

Given a GO ID, is there a way to extract all
the related gene names for that ID with Perl?

Does anybody have experience with that?
I've looked through the GO modules on CPAN, but can't seem
to find any tool that facilitates that search.

I look forward very much to your advice.

--
Edward WIJAYA
SINGAPORE

------------ Institute For Infocomm Research - Disclaimer -------------
This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person. Thank you.
--------------------------------------------------------

From spiros at lokku.com Mon Apr 16 11:10:49 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Mon, 16 Apr 2007 16:10:49 +0100
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
Message-ID:

Hi Edward,

What organism are you interested in? I have some code from my PhD
based on the Saccharomyces cerevisiae genome. It basically uses the SGD
flat files and a local MySQL instance of GO. It might be worth turning
into modules if people are interested in it, although it is pretty
organism-oriented and the lack of abstraction might introduce a number
of problems.

Spiros

From ewijaya at i2r.a-star.edu.sg Mon Apr 16 11:14:09 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Mon, 16 Apr 2007 23:14:09 +0800
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>

Hi Spiros,

Thanks for your reply. I am interested in applying it to
all organisms related to that particular GO ID.

Do you have a CPAN module for that?

--
Edward WIJAYA
SINGAPORE

From dmessina at wustl.edu Mon Apr 16 11:21:01 2007
From: dmessina at wustl.edu (David Messina)
Date: Mon, 16 Apr 2007 10:21:01 -0500
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
Message-ID:

I use BioMART for this kind of thing. If you need to do this for more
than a couple of GO terms, BioMART has a Perl API you can use to
connect to their data.

http://www.biomart.org/
http://www.biomart.org/install-overview.html

Dave

From spiros at lokku.com Mon Apr 16 11:21:40 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Mon, 16 Apr 2007 16:21:40 +0100
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net> <3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>
Message-ID:

Nope, I don't have a CPAN module for it, and to be honest, I don't
think I will release one until I actually finish my PhD. The
code is really scruffy in places, lacks documentation and might not
work under all setups. My plan is to take some time afterwards to clean
it up and release a proper version of it to the public.

What you are talking about however, if I understand correctly, is a
much, much bigger project. Different genome databases have different
formats and a potential module must take them all into consideration.
Then there is the issue of the different evidence codes GO annotators
use across different genomes, and which of them you consider to be of
higher or lower quality.

If you happen to stumble upon such a module, please share it; it would
be very interesting!

spiros

From ewijaya at i2r.a-star.edu.sg Mon Apr 16 11:33:27 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Mon, 16 Apr 2007 23:33:27 +0800
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061685@mailbe01.teak.local.net>

Hi David,

There seems to be no biomart-perl module in CPAN.

I tried their cvs:
cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/biomart login

But it requires a password. Can you suggest another way to
get this module?

--
Edward WIJAYA

From Kevin.M.Brown at asu.edu Mon Apr 16 11:44:28 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 16 Apr 2007 08:44:28 -0700
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061685@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net> <3ACF03E372996C4EACD542EA8A05E66A061685@mailbe01.teak.local.net>
Message-ID: <1A4207F8295607498283FE9E93B775B4030A4914@EX02.asurite.ad.asu.edu>

Did you follow the directions listed at
http://www.biomart.org/install-overview.html ?

> There seems to be no biomart-perl module in CPAN.
>
> I tried their cvs:
> cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/biomart login
>
> But require password. Can suggest if there is another way to
> get this module?

From dmessina at wustl.edu Mon Apr 16 11:44:26 2007
From: dmessina at wustl.edu (David Messina)
Date: Mon, 16 Apr 2007 10:44:26 -0500
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061685@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net> <3ACF03E372996C4EACD542EA8A05E66A061685@mailbe01.teak.local.net>
Message-ID: <2D698B2E-49B9-411E-B1FA-C12F4A235EB2@wustl.edu>

The password you need to enter when asked is CVSUSER.

Dave

From sdavis2 at mail.nih.gov Mon Apr 16 11:55:14 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 16 Apr 2007 11:55:14 -0400
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To:
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net> <3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>
Message-ID: <200704161155.14567.sdavis2@mail.nih.gov>

> > On 4/16/07, Wijaya Edward wrote:
> > > Dear all,
> > >
> > > Given a GO id, is there a way to extract all
> > > the related gene names from that id with Perl?

This is a pretty simple problem if you have the data in a usable format.
The data that you need are available here:

ftp://ftp.ncbi.nih.gov/gene/DATA

The README file gives details, but the files in this directory are all
tab-delimited text. Download the gene2go.gz file, which contains a mapping
from Entrez Gene ID to GO accession. Then, download the gene_info.gz file,
which contains the information about the Entrez Gene ID, including
description, gene symbol, etc. If you need to link to other data, you can of
course download the respective files from NCBI.

You can either load the data into a SQL database of some type for general
queries, or you can simply read them into perl directly (with appropriate
data structures) to do your mapping. Since they are tab-delimited text, I
would choose the database route and then use SQL and DBI to do the queries
you like.

Sean

From cjfields at uiuc.edu Mon Apr 16 12:25:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 11:25:42 -0500
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
Message-ID:

You can limit EntrezGene searches by Gene Ontology ID using the [Gene
Ontology] field in queries. The following query:

'9220[Gene Ontology]'

will give 120 gene IDs.
You can get the same list using the
still-under-development Bio::DB::EUtilities (usual EUtilities caveat:
I'm still working on this):

my $esearch = Bio::DB::EUtilities->new(-eutil  => 'esearch',
                                       -db     => 'gene',
                                       -term   => '9220[Gene Ontology]',
                                       -retmax => 300);
$esearch->get_response;
my @ids = $esearch->get_ids;
print join "\n", @ids;

In my opinion, Sean's idea of using SQL is probably better if you
have tons of searches to do.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Mon Apr 16 14:34:25 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 13:34:25 -0500
Subject: [Bioperl-l] Bio::Matrix::PSM::ProtPsm
Message-ID:

I was going through tests converting to Test::More and found this
module is largely unimplemented (relevant tests are in t/ProtPsm.t in
CVS). It was written by James Thompson a few years ago and the module
docs seem to indicate some uncertainty on what this class is meant to
accomplish.
Does anyone know the status of this code?

chris

From cjm at fruitfly.org Mon Apr 16 14:49:23 2007
From: cjm at fruitfly.org (Chris Mungall)
Date: Mon, 16 Apr 2007 11:49:23 -0700
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net> <3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>
Message-ID:

Download:
http://search.cpan.org/~cmungall/go-db-perl

or do:
cpan GO::AppHandle

The API call you want is here:
http://search.cpan.org/~cmungall/go-db-perl/GO/AppHandle.pm#get_deep_products

Here is an example snippet:

use GO::AppHandle;
my $apph=GO::AppHandle->connect(@ARGV);
my $go_acc = shift @ARGV;
my $gps = $apph->get_deep_products({term=>{acc=>$go_acc}});
foreach my $gp (@$gps) {
    printf "%s %s\n", $gp->xref->acc, $gp->symbol;
}

You will need to download the GO Database.

Cheers
Chris

From gdorjee at hotmail.com Mon Apr 16 15:10:01 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Mon, 16 Apr 2007 12:10:01 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
References: <9997343.post@talk.nabble.com> <10008507.post@talk.nabble.com> <2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu> <5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
Message-ID: <10022463.post@talk.nabble.com>

Hi Chris,

Thanks for your reply. I set the RemoteBlast factory to a verbosity of 1,
and I get the same error message. I'm new to all this, so could you
please tell me how I can use the RemoteBlast in CVS that you suggested?

Cheers!

--
View this message in context: http://www.nabble.com/error-while-remote-blast-against-swissprot-db-tf3577674.html#a10022463
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From gdorjee at hotmail.com Mon Apr 16 15:11:18 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Mon, 16 Apr 2007 12:11:18 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
References: <9997343.post@talk.nabble.com> <10008507.post@talk.nabble.com> <2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
Message-ID: <10022464.post@talk.nabble.com>

Thank you, David.
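
For the GO thread: a minimal sketch of the flat-file route Sean Davis describes (gene2go joined to gene_info on the Entrez Gene ID), using only core Perl. The column positions (tax_id, GeneID, GO_ID in gene2go; tax_id, GeneID, Symbol in gene_info) are assumed from the NCBI README and should be checked against the current files. The sample rows and gene symbols are invented for illustration, and the helper names gene_ids_for_go and symbols_for_ids are hypothetical. It matches direct annotations only, so it does not perform the transitive closure over the GO graph that is discussed in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a hash ref of Entrez Gene IDs annotated directly to $go_id.
# Assumed gene2go layout (tab-separated): tax_id, GeneID, GO_ID, ...
sub gene_ids_for_go {
    my ($go_id, $fh) = @_;
    my %ids;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;    # skip the header line
        chomp $line;
        my (undef, $gene_id, $acc) = split /\t/, $line;
        next unless defined $acc && $acc eq $go_id;
        $ids{$gene_id} = 1;
    }
    return \%ids;
}

# Return a hash ref mapping each wanted Gene ID to its symbol.
# Assumed gene_info layout (tab-separated): tax_id, GeneID, Symbol, ...
sub symbols_for_ids {
    my ($ids, $fh) = @_;
    my %symbol;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;
        chomp $line;
        my (undef, $gene_id, $sym) = split /\t/, $line;
        next unless defined $sym && $ids->{$gene_id};
        $symbol{$gene_id} = $sym;
    }
    return \%symbol;
}

# Invented sample rows standing in for the decompressed NCBI files.
my $gene2go = join '', map { join("\t", @$_) . "\n" } (
    [ '#tax_id', 'GeneID', 'GO_ID',      'Evidence' ],
    [ '9606',    '1001',   'GO:0009220', 'IEA' ],
    [ '9606',    '1002',   'GO:0006412', 'IDA' ],
    [ '10090',   '1003',   'GO:0009220', 'ISS' ],
);
my $gene_info = join '', map { join("\t", @$_) . "\n" } (
    [ '#tax_id', 'GeneID', 'Symbol' ],
    [ '9606',    '1001',   'GENE_A' ],
    [ '9606',    '1002',   'GENE_B' ],
    [ '10090',   '1003',   'Gene_c' ],
);

# In-memory filehandles keep the sketch self-contained.
open my $g2g_fh, '<', \$gene2go   or die "cannot read gene2go sample: $!";
open my $gi_fh,  '<', \$gene_info or die "cannot read gene_info sample: $!";

my $ids     = gene_ids_for_go('GO:0009220', $g2g_fh);
my $symbols = symbols_for_ids($ids, $gi_fh);

for my $gene_id (sort { $a <=> $b } keys %$symbols) {
    print "$gene_id\t$symbols->{$gene_id}\n";
}
```

With the real data, the two in-memory handles would be replaced by handles opened on the decompressed gene2go and gene_info files. For many queries, loading both files into a database and querying through DBI, as Sean suggests, is likely the better fit.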
From cjm at fruitfly.org Mon Apr 16 14:41:59 2007
From: cjm at fruitfly.org (Chris Mungall)
Date: Mon, 16 Apr 2007 11:41:59 -0700
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To:
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
Message-ID: <50A1CCF2-4650-4F87-8386-DB0E87292023@fruitfly.org>

Unless the Entrez interface has changed since I last looked, the
query below for "pyrimidine ribonucleotide biosynthetic process"
will NOT perform the transitive closure over the graph; this means
genes and gene products annotated to GO:0009174 "pyrimidine
ribonucleoside monophosphate biosynthetic process", for example,
will not be returned.

On Apr 16, 2007, at 9:25 AM, Chris Fields wrote:

> You can limit EntrezGene searches by Gene Ontology ID using the [Gene
> Ontology] field in queries. The following query:
>
> '9220[Gene Ontology]'
>
> will give 120 gene IDs. You can get the same list using the still-
> under-development Bio::DB::EUtilities (usual EUtilities caveat: I'm
> still working on this):
>
> my $esearch = Bio::DB::EUtilities->new(-eutil => 'esearch',
>                                        -db => 'gene',
>                                        -term => '9220[Gene Ontology]',
>                                        -retmax => 300);
> $esearch->get_response;
> my @ids = $esearch->get_ids;
> print join "\n", @ids;
>
> In my opinion, Sean's idea of using SQL is probably better if you
> have tons of searches to do.
>
> chris

From cjfields at uiuc.edu Mon Apr 16 15:25:14 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 14:25:14 -0500
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <50A1CCF2-4650-4F87-8386-DB0E87292023@fruitfly.org>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net> <50A1CCF2-4650-4F87-8386-DB0E87292023@fruitfly.org>
Message-ID: <3D7F9BDC-EB03-471B-BDC8-7B649664D320@uiuc.edu>

You are correct; it explains why the list is only 120 genes. The
only way (currently) to get the full set would be to perform the
closure locally somehow (maybe via go-perl or similar).

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From gdorjee at hotmail.com  Mon Apr 16 15:27:32 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Mon, 16 Apr 2007 12:27:32 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
References: <9997343.post@talk.nabble.com> <10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
	<5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
Message-ID: <10022661.post@talk.nabble.com>


Hi Chris,

Sorry to bother you again, but could you please check the following
script to see what's wrong? I've been getting errors like:

Content-type: text/html

Software error:

------------- EXCEPTION -------------
MSG: (0) not Bio::Seq object or array of Bio::Seq objects or file name!
STACK Bio::Tools::Run::StandAloneBlast::blastall
/usr/perl5/5.6.1/lib/Bio/Tools/Run/StandAloneBlast.pm:532
STACK toplevel /usr/local/apache2/htdocs/rmtest.pl:46
--------------------------------------

#### the script ######

#!/usr/bin/perl -w

use strict;
use warnings;
use Bio::SeqIO;
use Bio::SearchIO;
use Bio::DB::GenPept;
use Bio::Tools::Run::StandAloneBlast;
use CGI;
use CGI::Carp qw(fatalsToBrowser);

my $cgi = new CGI;

print $cgi->header,
      $cgi->start_html(-title=>'A StandAloneBlast Test'),
      $cgi->h1('Blast Result'),
      $cgi->start_form,
      "Enter or paste an amino-acid sequence? ",
      $cgi->p,
      $cgi->textarea(-name=>'name', rows=>10, -columns=>60),
      $cgi->p,
      $cgi->submit,
      $cgi->end_form,
      $cgi->hr;

open(OUTPUT,">result/query.faa");

if ($cgi->param()) {
    my $seq = $cgi->param('name');
    print OUTPUT $seq;

    my @params = ('program'   => 'blastp',
                  'database'  => '/export/home/dorjee/database/nrpart',
                  'outfile'   => 'result/blast.out',
                  _READMETHOD => 'Blast');

    my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);

    # Blast a sequence against a database:
    my $str = Bio::SeqIO->new(-file => "result/query.faa",
                              '-format' => 'Fasta' );
    my $input = $str->next_seq();
    my $blast_report = $factory->blastall($input);
}


Chris Fields wrote:
>
> This sounds like a similar issue that popped up a few weeks ago
> related to URLAPI changes for remote BLAST access. That was fixed on
> NCBI's end but I also added a fix to RemoteBlast in CVS that works as
> well.
>
> Saying that, my guess is the same as Dave's, that there are
> connectivity issues. What happens when you set the RemoteBlast
> factory to a verbosity of 1? This will spill out debugging output
> from the repeated queries to the NCBI server (so if there are
> problems they'll show up there).
>
> ...
> my $factory = Bio::Tools::Run::RemoteBlast->new(
>         '-prog' => 'blastp',
>         '-data' => 'swissprot',
>         _READMETHOD => "Blast",
>         -verbose => 1 # debugging output
>         );
> ...
>
> If you see the BLAST report but get the same error, try using the
> RemoteBlast in CVS to see if it fixes the problem.
>
> chris
>
>
> On Apr 15, 2007, at 9:43 PM, David Messina wrote:
>
>> You're right, it's not the input sequence. I just tried it with your
>> script and it worked.
>>
>>
>>> is it possible that the script is not being able to read the
>>> RemoteBlast.pm?
>>
>> I think the program wouldn't compile if that were the case, and your
>> error message would be about not finding RemoteBlast.pm rather than
>> the one you got.
>>
>>
>>> but the thing is, i can run the standalone blast on the
>>> command line, although i've never been able to run the same with
>>> cgi module
>>> (by getting the input from an html textarea). i don't understand.
>>
>> This result really suggests that perl and Bioperl are not the issue.
>> I'm not saying the following to give you the brushoff, but given the
>> numerous ways in which web-based apps can fail and in which
>> webservers can be installed, it might be best for you to find someone
>> at your institution who can sit down with you and work through it.
>>
>> Dave
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>

--
View this message in context: http://www.nabble.com/error-while-remote-blast-against-swissprot-db-tf3577674.html#a10022661
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Mon Apr 16 15:37:58 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 14:37:58 -0500
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <10022463.post@talk.nabble.com>
References: <9997343.post@talk.nabble.com> <10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
	<5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
	<10022463.post@talk.nabble.com>
Message-ID: <5E36D7FB-5BA1-4D7E-88E3-D64A7EB9A6B1@uiuc.edu>

The 'verbose' setting doesn't change the way the BLAST query is sent;
it just sends the raw output from the repeated attempts to retrieve
the report (using the RID) to STDERR. The error you saw won't be fixed
by doing so.
What I was interested in was the raw HTML output dumped to the screen.
If it is querying the NCBI server, it should dump output that includes
something like this:

...