From Laurence.Amilhat at toulouse.inra.fr Thu Jan 3 09:29:09 2008
From: Laurence.Amilhat at toulouse.inra.fr (Laurence Amilhat)
Date: Thu, 03 Jan 2008 15:29:09 +0100
Subject: [Bioperl-l] BioPerl and NHX tree
Message-ID: <477CF135.9060104@toulouse.inra.fr>

Dear all,

I am trying to convert a newick tree into an NHX tree, so I can add the
taxid tag for each leaf.

I am using the modules: Bio::TreeIO & Bio::Tree::NodeNHX
The idea is
1) to read the newick tree
2) get the leaf, and get the corresponding taxid for it
3) add the nhx species tag
4) write the nhx tree

I was able to do the first 2 steps, and I could create an object
node_nhx and add the tag T, but I don't know how to write an nhx tree
with the node_nhx previously created...

Does anyone have an idea? Any help is welcome.

Thanks,

laurence.


Here are my code and the sample files for better understanding:
newick2nhx.pl -f test_tree.nwk -o tata -c seq_taxid.txt

_newick2nhx.pl:_

use strict;
use Bio::TreeIO;
use Bio::Tree::NodeNHX;
use Getopt::Long;

my $tree_file;
my $outfile;
my $codefile;
my %corresp;

GetOptions('f|file:s' =>\$tree_file, 'o|out:s' =>\$outfile, 'c|code:s' =>\$codefile);

open (CODE, "< $codefile");
while (<CODE>)
{
    chomp;
    my($a, $b)=split (/\t/);
    $corresp{$a}=$b;
}

my $treeio = new Bio::TreeIO (-format => 'newick', -file => "$tree_file");
my $treeout= new Bio::TreeIO (-format => 'nhx', -file =>">$outfile");

while (my $tree= $treeio->next_tree)
{
    my @nodes=$tree->get_nodes();
    foreach my $nd(@nodes)
    {
        if ($nd->is_Leaf())
        {
            my $id=$nd->id();
            print "$id TAXID ",$corresp{$id},"\n";

            my $nodenhx=new Bio::Tree::NodeNHX();
            $nodenhx->nhx_tag({T=>$corresp{$id}});
        }
    }
    $treeout->write_tree($tree);
}

_test_tree.nwk_:
(((((42558930:100.0,42558943:100.0):100.0,(42558969:100.0,(42558981:100.0,
42558942:100.0):100.0):72.0):81.0,(((((90185247:100.0,56405380:100.0):100.0,
(42558987:100.0,148887393:100.0):100.0):90.0,66774197:100.0):100.0,AAEL015662:100.0):100.0,
42558970:100.0):82.0):100.0,(42558929:100.0,42558958:100.0):79.0):100.0,
42558941:100.0);

_seq_taxid.txt:_
AAEL015662	7159
42558969	9606
42558981	10090
42558942	9606
42558970	6239
42558929	10116
42558987	9606
42558930	10116
42558943	9606
148887393	10090
42558958	10090
42558941	9606
56405380	10090
90185247	9606
66774197	6239

_And the tata resulting file:_
(((((42558930:100.0,42558943:100.0):100.0[&&NHX],(42558969:100.0,
(42558981:100.0,42558942:100.0):100.0[&&NHX]):72.0[&&NHX]):81.0[&&NHX],(((((
90185247:100.0,56405380:100.0):100.0[&&NHX],(42558987:100.0,
148887393:100.0):100.0[&&NHX]):90.0[&&NHX],66774197:100.0):100.0[&&NHX],
AAEL015662:100.0):100.0[&&NHX],42558970:100.0):82.0[&&NHX]):100.0[&&NHX],
(42558929:100.0,42558958:100.0):79.0[&&NHX]):100.0[&&NHX],42558941:100.0);

-- 
====================================================================
= Laurence Amilhat INRA Toulouse 31326 Castanet-Tolosan =
= Tel: 33 5 61 28 53 34 Email: laurence.amilhat at toulouse.inra.fr =
====================================================================

From aaron.j.mackey at gsk.com Thu Jan 3 10:12:22 2008
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Thu, 3 Jan 2008 10:12:22 -0500
Subject: [Bioperl-l] BioPerl and NHX tree
In-Reply-To: <477CF135.9060104@toulouse.inra.fr>

Instead of using TreeIO::newick to read the tree, use TreeIO::nhx -- that
way, your tree's nodes are already NodeNHX's. Instead of creating a new
$nodenhx, you can use the $node variable directly from the tree ...

-Aaron
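Applied to Laurence's script, Aaron's suggestion might look roughly like the sketch below. It assumes BioPerl is installed, that the nhx reader accepts her plain Newick input (Aaron's point, which her follow-up confirms), and that `get_leaf_nodes` and NodeNHX's `nhx_tag` behave as in the BioPerl release of the time. The sub names `read_taxid_map`, `tag_leaves` and `run` are illustrative only, not BioPerl API. The sketch also reads the map file through a checked, lexical filehandle.

```perl
#!/usr/bin/perl
# Sketch: tag each leaf of a Newick tree with an NHX T=<taxid> tag.
# Assumes BioPerl is installed and the map file holds "leaf_id<TAB>taxid" lines.
use strict;
use warnings;

# Read "id<TAB>taxid" lines into a hash ref (the %corresp of the original script).
sub read_taxid_map {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    my %taxid_of;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;
        my ($id, $taxid) = split /\t/, $line;
        $taxid_of{$id} = $taxid if defined $taxid;
    }
    close $fh;
    return \%taxid_of;
}

# Set T=<taxid> directly on every leaf that has a mapping; returns the count.
# The leaves must already be Bio::Tree::NodeNHX objects, which is what
# reading with -format => 'nhx' gives you (no separate NodeNHX is created).
sub tag_leaves {
    my ($tree, $taxid_of) = @_;
    my $tagged = 0;
    for my $leaf ($tree->get_leaf_nodes) {
        my $id = $leaf->id;
        next unless defined $id && defined $taxid_of->{$id};
        $leaf->nhx_tag({ T => $taxid_of->{$id} });
        $tagged++;
    }
    return $tagged;
}

sub run {
    my ($tree_file, $map_file, $out_file) = @_;
    require Bio::TreeIO;    # loaded only when actually converting a file
    my $taxid_of = read_taxid_map($map_file);
    # Read with the nhx parser so the nodes are NodeNHX objects.
    my $in  = Bio::TreeIO->new(-format => 'nhx', -file => $tree_file);
    my $out = Bio::TreeIO->new(-format => 'nhx', -file => ">$out_file");
    while (my $tree = $in->next_tree) {
        tag_leaves($tree, $taxid_of);
        $out->write_tree($tree);
    }
}

# Only run as a script when given the three file names.
run(@ARGV) if !caller && @ARGV == 3;
```

Usage would be `perl newick2nhx.pl test_tree.nwk seq_taxid.txt tata`. Laurence later tagged species rather than taxids for the ATV viewer; using NHX's `S` (species) key in place of `T` in `tag_leaves`, with species names in the map file, would be the corresponding change.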

From Laurence.Amilhat at toulouse.inra.fr Fri Jan 4 03:33:22 2008
From: Laurence.Amilhat at toulouse.inra.fr (Laurence Amilhat)
Date: Fri, 04 Jan 2008 09:33:22 +0100
Subject: [Bioperl-l] BioPerl and NHX tree
Message-ID: <477DEF52.20802@toulouse.inra.fr>

Thank you Aaron, it's working now.
I've changed to species instead of taxid, so I can color the species on
my tree using the ATV viewer.

thanks again,

Regards,
Laurence.

From hlapp at gmx.net Sun Jan 6 22:02:32 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 6 Jan 2008 22:02:32 -0500
Subject: [Bioperl-l] How to retrieve a persistent object by bioperl-db?
Message-ID: <640890C9-2D34-4C70-9179-26A9EAB397D2@gmx.net>

Hi Zhihua,

you didn't ever respond to Marc's link to the Persistent Bioperl
slides - did that help?

-hilmar

On Dec 6, 2007, at 11:25 PM, zhihuali wrote:

>
> Hi netters,
>
> I've installed BioSQL and bioperl-db, and successfully created and
> stored a persistent object:
>
> use strict;
> use warnings;
> use Bio::Seq;
> use Bio::DB::BioDB;
>
> my $dbadp=Bio::DB::BioDB->new(-database=>'biosql',
>                               -user=>'annoymous',
>                               -dbname=>'bioseqdb');
>
> my $seqobj=Bio::Seq->new(-accession_number=>"test",
>                          -id=>"test1",
>                          -seq=>"AGCTAGCT",
>                          -version=>1);
> my $dbobj=$dbadp->create_persistent($seqobj);
> $dbobj->create;
> $dbobj->commit;
>
> It's successful because I found corresponding rows in the bioseqdb
> tables.
>
> Now I want to retrieve the object back from the database. There's
> not much documentation available and I've tried find_by_unique_key/
> find_by_primary_key but all failed. Maybe I didn't use them correctly.
> Could anyone give me an example of how to retrieve the stored
> Bio::Seq object?
>
> Thanks a lot!
>
> Zhihua Li
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cain.cshl at gmail.com Mon Jan 7 12:24:02 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 07 Jan 2008 12:24:02 -0500
Subject: [Bioperl-l] Anything up with cvs/svn?
Message-ID: <1199726642.6374.10.camel@frissell>

Hello,

I was trying to get bioperl-live this morning from either cvs or svn and
failed. I was wondering if something was going on with the server.

Here are the things I tried:

cvs -d:ext:scain at dev.open-bio.org:/home/repository/bioperl co bioperl-live

which resulted in this:

cvs checkout: warning: cannot write to history file /home/repository/bioperl/CVSROOT/history: Permission denied
cvs checkout: Updating bioperl-live
cvs checkout: failed to create lock directory for `/home/repository/bioperl/bioperl-live' (/home/repository/bioperl/bioperl-live/#cvs.lock): Permission denied
cvs checkout: failed to obtain dir lock in repository `/home/repository/bioperl/bioperl-live'
cvs [checkout aborted]: read lock failed - giving up

Then I thought I'd try the suggested svn checkout method from the
bioperl wiki:

svn co svn+ssh://scain at dev.open-bio.org/home/hartzell/bioperl/bioperl-live

which resulted in

svn: No repository found in 'svn+ssh://scain at dev.open-bio.org/home/hartzell/bioperl/bioperl-live'

Finally, after looking at the open-bio server, I thought I'd try this:

svn co svn+ssh://scain at dev.open-bio.org/home/svn-repositories/bioperl/bioperl-live

which resulted in repeated requests for my password (which I supplied
correctly at least once out of the several requests).

So, what's up?

Thanks much,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From hlapp at gmx.net Mon Jan 7 12:36:02 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 7 Jan 2008 12:36:02 -0500
Subject: [Bioperl-l] Anything up with cvs/svn?
In-Reply-To: <1199726642.6374.10.camel@frissell>
References: <1199726642.6374.10.camel@frissell>

I think we are still migrating to svn. It's probably better to wait for
the announcement that everything is ready to go. (And then cvs won't
work anymore except for anonymous checkout - which should actually
continue to work while this is in progress. Have you tried that?)

-hilmar

From jason at bioperl.org Mon Jan 7 12:43:18 2008
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 7 Jan 2008 09:43:18 -0800
Subject: [Bioperl-l] Anything up with cvs/svn?
In-Reply-To: <1199726642.6374.10.camel@frissell>
References: <1199726642.6374.10.camel@frissell>
Message-ID: <5B73F368-7368-4DA5-9867-EF15FCEE8B53@bioperl.org>

CVS r/w is locked because we are transitioning to SVN - you can still
check out via anonymous CVS on code.open-bio.org.

The SVN is going to be in /home/svn-repositories/bioperl, not George's
directory, but we are still monkeying around with the directory
structure. You can try a checkout, but be warned it may change a few
more times if we add another directory layer in there.

You will get requests for your password at least three times - I
strongly suggest you use SSH keys to avoid getting prompted each time.
I don't know why you get asked 3 times; it is an SVN thing, and I
assume it has to make 3 separate requests to do a checkout.

That's what is up for now. We'll report when the final SVN migration
is done.

-jason

From cain.cshl at gmail.com Mon Jan 7 12:57:38 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 07 Jan 2008 12:57:38 -0500
Subject: [Bioperl-l] Anything up with cvs/svn?
In-Reply-To: <5B73F368-7368-4DA5-9867-EF15FCEE8B53@bioperl.org>
References: <1199726642.6374.10.camel@frissell> <5B73F368-7368-4DA5-9867-EF15FCEE8B53@bioperl.org>
Message-ID: <1199728658.6374.12.camel@frissell>

Hi Hilmar and Jason,

Thanks--for some reason, I thought svn was done. I'll remain anonymous
for right now (kind of difficult to do when you announce it publicly :-)

Thanks,
Scott

From cain.cshl at gmail.com Mon Jan 7 13:34:25 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 07 Jan 2008 13:34:25 -0500
Subject: [Bioperl-l] Automatically accepting defaults for `perl Build.PL`
Message-ID: <1199730865.6374.18.camel@frissell>

Hello,

I was wanting to implement this myself (and probably still will,
assuming it's not already there...) but I am not a Module::Build guru.
Here's what I'd like to do: add a parameter that I can pass when invoking
perl Build.PL so that the default answers will be used when it would
normally ask me a question while running perl Build.PL, something like
this:

perl Build.PL --yes

Is this sort of thing already built into Module::Build and I can't see
it? Or can somebody suggest the best way of going about this?

Thanks much,
Scott

From cjfields at uiuc.edu Mon Jan 7 17:22:35 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 7 Jan 2008 16:22:35 -0600
Subject: [Bioperl-l] Automatically accepting defaults for `perl Build.PL`
In-Reply-To: <1199730865.6374.18.camel@frissell>
References: <1199730865.6374.18.camel@frissell>
Message-ID: <31AD254B-DABA-488D-BDA8-D690F949CC39@uiuc.edu>

I agree it would be nice. Not sure how hard it would be to implement;
maybe it would be best to have a mode of installation, say if one wanted
'minimal' (no optional module installation, no scripts), 'full', 'dev'
(assume minimal install but don't test), and so on, falling back to the
query-based approach if nothing is indicated.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From bix at sendu.me.uk Mon Jan 7 17:37:36 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 07 Jan 2008 22:37:36 +0000
Subject: [Bioperl-l] Automatically accepting defaults for `perl Build.PL`
In-Reply-To: <1199730865.6374.18.camel@frissell>
References: <1199730865.6374.18.camel@frissell>
Message-ID: <4782A9B0.60203@sendu.me.uk>

Scott Cain wrote:
> perl Build.PL --yes
>
> Is this sort of thing already built into Module::Build and I can't see
> it? Or can somebody suggest the best way of going about this?

You should ask on the Module::Build mailing list. If it already exists I
don't think it is obvious, however.
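One way to get Scott's `perl Build.PL --yes` behaviour is to send every install question through a small wrapper that returns the default when the build is unattended. The `unattended` and `ask` helpers and the `--yes` switch below are hypothetical, not part of Module::Build. The sketch also honours the `PERL_MM_USE_DEFAULT` environment variable, which ExtUtils::MakeMaker's `prompt()` documents for exactly this purpose; whether a given Module::Build release checks it in its own `prompt()` is worth verifying against the installed version.

```perl
use strict;
use warnings;

# Hypothetical helper: true when prompts should silently take their defaults,
# i.e. when a --yes switch was given or PERL_MM_USE_DEFAULT is set.
sub unattended {
    my (@args) = @_;
    return 1 if $ENV{PERL_MM_USE_DEFAULT};
    return (grep { $_ eq '--yes' } @args) ? 1 : 0;
}

# Hypothetical prompt wrapper: returns $default without reading STDIN when
# $auto is true; otherwise reads one line and falls back to $default on an
# empty answer or end of input.
sub ask {
    my ($question, $default, $auto) = @_;
    print "$question [$default] ";
    if ($auto) {
        print "$default\n";    # echo the choice so build logs show what was picked
        return $default;
    }
    my $answer = <STDIN>;
    return $default unless defined $answer;
    chomp $answer;
    return length $answer ? $answer : $default;
}
```

In a Build.PL this could be wired up as `my $auto = unattended(@ARGV); @ARGV = grep { $_ ne '--yes' } @ARGV;` before calling `Module::Build->new(...)`, so the extra switch is not passed on, and then `ask('Install scripts?', 'y', $auto)` wherever a question is asked. Chris's install modes could be layered on top by choosing a different set of defaults per mode.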

If your question is BioPerl related, and you're looking for a fast way
of installing BioPerl without the annoying questions, I'm sure I could
hack something into ModuleBuildBioperl.pm

From cain.cshl at gmail.com Mon Jan 7 22:04:19 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Mon, 07 Jan 2008 22:04:19 -0500
Subject: [Bioperl-l] Automatically accepting defaults for `perl Build.PL`
In-Reply-To: <4782A9B0.60203@sendu.me.uk>
References: <1199730865.6374.18.camel@frissell> <4782A9B0.60203@sendu.me.uk>
Message-ID: <1199761459.6017.1.camel@frissell>

Hi Sendu,

I just hacked something up (I only needed to change a few lines--once I
figured out where everything was). I like Chris' idea though; before I
commit it back (Ha, no rush there), I'll flesh it out a little more to
give more options.

Scott

From granjeau at tagc.univ-mrs.fr Wed Jan 9 03:30:17 2008
From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137)
Date: Wed, 09 Jan 2008 09:30:17 +0100
Subject: [Bioperl-l] Parsing SwissProt annotation in comment
Message-ID: <47848619.40109@tagc.univ-mrs.fr>

Hello,

I would like to retrieve the human-reviewed annotation of SwissProt
entries; this information is in the comment section of the sequence
file. Here is an example:

CC   -!- FUNCTION: Actins are highly conserved proteins that are involved
CC       in various types of cell motility and are ubiquitously expressed
CC       in all eukaryotic cells.
CC   -!- SUBUNIT: Polymerization of globular actin (G-actin) leads to a
CC       structural filament (F-actin) in the form of a two-stranded helix.
CC       Each actin can bind to 4 others. Found in a complex with XPO6,
CC       Ran, ACTB and PFN1. Component of a complex composed at least of
CC       ACTB, AP2M1, AP2A1, AP2A2, MEGF10 and VIM. Interacts with XPO6.
CC   -!- INTERACTION:
CC       Q00987:MDM2; NbExp=1; IntAct=EBI-353944, EBI-389668;
CC       P84022:SMAD3; NbExp=1; IntAct=EBI-353944, EBI-347161;
CC   -!- SUBCELLULAR LOCATION: Cytoplasm, cytoskeleton.

Is there a specific method to do such a job?

Thanks much,
Samuel
-- 
Samuel GRANJEAUD                     granjeau at tagc.univ-mrs.fr
INSERM - ICIM - TAGC                 Tel: +33 (0)491 82 87 24
http://tagc.univ-mrs.fr              Fax: +33 (0)491 82 87 01
http://icim.marseille.inserm.fr/proteomique

From robfsouza at gmail.com Wed Jan 9 08:20:08 2008
From: robfsouza at gmail.com (Robson Francisco de Souza)
Date: Wed, 9 Jan 2008 11:20:08 -0200
Subject: [Bioperl-l] bioperl based database infrastucture for directed graphs

Hello All!

Greetings to everybody, and happy new year to those following a Western
calendar!

I'm starting a new project to store and analyze distinct sets of sequence
annotation data which are related in a way suitable for representation in
a directed (e.g.
transcript splicing) or undirected (e.g. gene product interaction) graph.
Analysis will require frequent queries based on interval overlaps, feature
neighbourhood, annotation and, most importantly, feature relationships and
stored paths.

At first, I thought of building an entire new database structure to store
project-specific data (e.g. alternative splicing or protein interaction),
but as I have some experience with Lincoln's Bio::DB::SeqFeature::Store,
I'm now considering extending it for the purpose of storing graphs
describing relationships among features.

I'm aware that some other bioperl-related databases, specifically BioSQL
and Chado, do have components which might be suitable for storing all or
some of these data but, since Lincoln's feature storage and interval
binning implementations in Bio::DB::SeqFeature::Store::mysql are both
clean, simple and very fast, extending it in a modular way may be
preferable.

A good extension to Lincoln's database could include tables like
feature_relationship and feature_path, for edges and transitive closures
(just like in BioSQL), and feature_stored_path, for excluding biologically
irrelevant paths in DAGs, like certain splicing isoforms. These tables
could also be used to store sequence assemblies or EST alignments
efficiently, including scaffolds inferred by connecting contigs.

Before starting, I would like to know if the BioSQL and Chado schemata
do have accelerators for querying intervals among billions of features
and feature relationships (some examples using these databases would
also help, if they show that these databases are efficient for such
tasks). If these or other databases are not as suitable as
Bio::DB::SeqFeature for feature retrieval based on interval overlap and
attributes, then I might consider extending Bio::DB::SeqFeature and
contributing such extensions back to bioperl...

Any thoughts?
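The feature_path idea above is a transitive closure over feature_relationship edges, the same trick BioSQL uses for its path tables. A minimal in-memory sketch of filling such a table, assuming edges arrive as parent/child pairs and the graph is a DAG; the table names are Robson's proposal, and `transitive_closure` is a hypothetical helper, not part of any BioPerl schema module:

```perl
use strict;
use warnings;

# Sketch: expand direct parent->child edges (rows of a proposed
# feature_relationship table) into closure rows [ancestor, descendant, distance]
# (rows of a proposed feature_path table). Assumes a DAG; for each ancestor it
# keeps the shortest distance to every descendant.
sub transitive_closure {
    my (@edges) = @_;                       # each edge: [parent, child]
    my %children;
    push @{ $children{ $_->[0] } }, $_->[1] for @edges;

    my @rows;
    for my $start (sort keys %children) {
        my %dist;
        my @queue = map { [ $_, 1 ] } @{ $children{$start} };
        while (my $item = shift @queue) {
            my ($node, $d) = @$item;
            next if exists $dist{$node};    # breadth-first: first visit is shortest
            $dist{$node} = $d;
            push @queue, map { [ $_, $d + 1 ] } @{ $children{$node} || [] };
        }
        push @rows, [ $start, $_, $dist{$_} ] for sort keys %dist;
    }
    return @rows;
}
```

For a gene with two transcripts sharing an exon, the shared exon gets a single gene-to-exon row at distance 2 plus one row from each transcript. Storing every ancestor/descendant pair up front is what lets "all exons of this gene" become a single indexed lookup; the price is extra rows, which matters at the billions-of-features scale Robson mentions.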
Best regards, Robson PS: sorry if anyone gets two copies of this post, but it took me some time to realize my new e-mail wasn't subscribed to bioperl-l... From bix at sendu.me.uk Wed Jan 9 08:59:08 2008 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 09 Jan 2008 13:59:08 +0000 Subject: [Bioperl-l] bioperl based database infrastucture for directed graphs In-Reply-To: References: Message-ID: <4784D32C.9070807@sendu.me.uk> Robson Francisco de Souza wrote: > Before starting, I would like to know if the BioSQL and Chado schemata > do have accelerators for quering intervals among billions of features > and feature relatioships (some examples using these databases would > also help, if they that these databases are efficient for such tasks). > If these or other databases are not as suitable as Bio::DB::SeqFeature > for feature retrieval based on interval overlap and attributes, I'm using Bio::DB::SeqFeature for that purpose, but just a warning: I found that with millions of features it made a db that was too large in terms of disc space and too slow in terms of query time. I had to hack out its storage of feature objects in the db, instead generating feature objects on request from the stored attributes. Doing this turned out to be faster than simply unfreezing certain kinds of feature objects! (I also had to hack in support for retrieval by source, a patch that Lincoln hasn't gotten back to me about yet.) While I can't answer your main questions, I wish you good luck with your project and request that you keep us posted with what you achieve.
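The edge/closure tables Robson proposes (feature_relationship for edges, feature_path for their transitive closure) can be illustrated with a short core-Perl sketch. This is not code from Bio::DB::SeqFeature::Store, BioSQL or Chado. It assumes that edges arrive as [parent, child] pairs of feature IDs and that a closure row is (subject, object, distance), keeping the shortest distance found by breadth-first search; the feature IDs and the row layout are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: derive rows for a hypothetical feature_path table (transitive
# closure) from feature_relationship-style edges given as [parent, child].
# Each row is [subject, object, distance], using the shortest path length.
sub transitive_closure {
    my @edges = @_;

    # adjacency list: parent ID => list of child IDs
    my %children;
    push @{ $children{ $_->[0] } }, $_->[1] for @edges;

    my @rows;
    for my $subject (sort keys %children) {
        # breadth-first search from $subject; the first visit to a node
        # is along a shortest path, so its distance is final
        my %dist  = ($subject => 0);
        my @queue = ($subject);
        while (@queue) {
            my $node = shift @queue;
            for my $child (@{ $children{$node} || [] }) {
                next if exists $dist{$child};
                $dist{$child} = $dist{$node} + 1;
                push @queue, $child;
            }
        }
        for my $object (sort { $a cmp $b } grep { $_ ne $subject } keys %dist) {
            push @rows, [ $subject, $object, $dist{$object} ];
        }
    }
    return @rows;
}

# Example: a gene with two transcripts that share an exon
my @rows = transitive_closure(
    [qw(gene1 mRNA1)], [qw(gene1 mRNA2)],
    [qw(mRNA1 exon1)], [qw(mRNA2 exon1)],
);
print join("\t", @$_), "\n" for @rows;
```

For deep or densely connected graphs the number of closure rows can grow up to quadratically with the number of features, which is the size concern Chris Mungall raises later in the thread about closures of feature-neighbourhood graphs.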
From bosborne11 at verizon.net Wed Jan 9 09:46:42 2008 From: bosborne11 at verizon.net (Brian Osborne) Date: Wed, 09 Jan 2008 09:46:42 -0500 Subject: [Bioperl-l] Parsing SwissProt annotation in comment In-Reply-To: <47848619.40109@tagc.univ-mrs.fr> References: <47848619.40109@tagc.univ-mrs.fr> Message-ID: <3DAEDA67-B9A5-47A4-8108-0915659F1052@verizon.net> Samuel, The Feature-Annotation HOWTO addresses this specifically: http://www.bioperl.org/wiki/HOWTO:Feature-Annotation Brian O. On Jan 9, 2008, at 3:30 AM, Samuel GRANJEAUD - IR/IFR137 wrote: > Hello, > > I would like to retrieve the human reviewed annotation of SwissProt > entries; these information are in the comment section of the > sequence file. Here is an example: > > CC -!- FUNCTION: Actins are highly conserved proteins that are > involved > CC in various types of cell motility and are ubiquitously > expressed > CC in all eukaryotic cells. > CC -!- SUBUNIT: Polymerization of globular actin (G-actin) leads > to a > CC structural filament (F-actin) in the form of a two-stranded > helix. > CC Each actin can bind to 4 others. Found in a complex with > XPO6, > CC Ran, ACTB and PFN1. Component of a complex composed at > least of > CC ACTB, AP2M1, AP2A1, AP2A2, MEGF10 and VIM. Interacts with > XPO6. > CC -!- INTERACTION: > CC Q00987:MDM2; NbExp=1; IntAct=EBI-353944, EBI-389668; > CC P84022:SMAD3; NbExp=1; IntAct=EBI-353944, EBI-347161; > CC -!- SUBCELLULAR LOCATION: Cytoplasm, cytoskeleton. > > Is there a specific method to do such a job? 
> > Thanks much, > Samuel > > -- > > Samuel GRANJEAUD granjeau at tagc.univ-mrs.fr > INSERM - ICIM - TAGC Tel: +33 (0)491 82 87 24 > http://tagc.univ-mrs.fr Fax: +33 (0)491 82 87 01 > http://icim.marseille.inserm.fr/proteomique > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From alexanderptok at web.de Wed Jan 9 10:34:56 2008 From: alexanderptok at web.de (Alexander Ptok) Date: Wed, 09 Jan 2008 16:34:56 +0100 Subject: [Bioperl-l] Beginners HOWTO query a range of lengths 0:3000[SLEN] Message-ID: <2011210591@web.de> Hi, I am a beginner to BioPerl and am working through the Beginners HOWTO. My version of BioPerl is 1.4-1, running on Debian etch. In the HOWTO everything worked fine until the section "Retrieving multiple sequences from a database", from which I copied the following script:
use strict;
use warnings;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

my $query = "Arabidopsis[ORGN] AND topoisomerase[TITL] and 0:3000[SLEN]";
my $query_obj = Bio::DB::Query::GenBank->new(-db    => 'nucleotide',
                                             -query => $query );
my $gb_obj = Bio::DB::GenBank->new;
my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);

while (my $seq_obj = $stream_obj->next_seq) {
    # do something with the sequence object
    print $seq_obj->display_id, "\t", $seq_obj->length, "\n";
}
If I remove the 0:3000[SLEN] clause, the query works and returns a lot of sequences; when I change it to e.g. 1830[SLEN], it finds the one sequence that has length 1830, but I was not able to query a range of lengths. Does anyone know what I am doing wrong? Greetings, A. Ptok From cjm at fruitfly.org Wed Jan 9 11:52:21 2008 From: cjm at fruitfly.org (Chris Mungall) Date: Wed, 9 Jan 2008 08:52:21 -0800 Subject: [Bioperl-l] bioperl based database infrastucture for directed graphs In-Reply-To: References: Message-ID: <199B572E-6EEE-4F6D-9FE9-952F766E911E@fruitfly.org> [cc-d to gmod-schema] Chado does have some views and pg functions for interval-based retrieval. AFAIK there are no accelerators for deep feature graphs, as most Chado users have relatively shallow gene-model/SO feature graphs. It may not be so hard to extend the cvterm code for doing this, depending on the characteristics of your graphs (the closure of feature neighbourhood graphs may be particularly large). On Jan 9, 2008, at 5:20 AM, Robson Francisco de Souza wrote: > Hello All! > > Greetings for everybody and happy new year for those following an > western calendary! > > I'm starting a new project to store and analyze distinct sets of > sequence annotation data which are related in a way suitable for > representation in a directed (e.g. transcript splicing) or undirected > (e.g. gene product interaction) graph. Analysis will require frequent > queries based on interval overlaps, feature neighbourhood, annotation > and, most importantly, feature relationships and stored paths. > > At first, I thought of build an entire new database structure to store > project specific data (e.g. alternative splicing or protein > interaction), > but as I have some experience with Lincon's > Bio::DB::SeqFeature::Store, I'm now considering extending it for the > purpose of storing graphs describing relationships among features.
> > I'm aware that some other bioperl related databases, specifically > BioSQL and Chado, do have components which might be suitable for > storing all or some of these data but, since Lincon's feature storage > and interval binning implementations in > Bio::DB::SeqFeature::Store::mysql are both clean, simple and very > fast, > perhaps extending it in a seemingly modular way is desirable. A good > extension to Lincon's database could include tables like > feature_relationship and feature_path, for edges and transitive > closures (just like in BioSQL) and feature_stored_path, for exclusion > of biologically irrelevant paths in DAGs, like certain splicing > isoforms. These tables could be used to store sequence assemblies or > EST alignments efficiently, including scaffolds inferred by connecting > contigs. > > Before starting, I would like to know if the BioSQL and Chado schemata > do have accelerators for quering intervals among billions of features > and feature relatioships (some examples using these databases would > also help, if they that these databases are efficient for such tasks). > If these or other databases are not as suitable as Bio::DB::SeqFeature > for feature retrieval based on interval overlap and attributes, then > again I might consider extending Bio::DB::seqFeature > and contributing such extensions back to bioperl... > > Any thoughts? > > Best regards, > Robson > > PS: sorry if anyone gets two copies of this post, but took me some > time to realize my new e-mail wasn't subscribed to bioperl-l... 
From cjfields at uiuc.edu Wed Jan 9 10:00:38 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 9 Jan 2008 09:00:38 -0600 Subject: [Bioperl-l] bioperl based database infrastucture for directed graphs In-Reply-To: <4784D32C.9070807@sendu.me.uk> References: <4784D32C.9070807@sendu.me.uk> Message-ID: On Jan 9, 2008, at 7:59 AM, Sendu Bala wrote: > Robson Francisco de Souza wrote: >> Before starting, I would like to know if the BioSQL and Chado >> schemata >> do have accelerators for quering intervals among billions of features >> and feature relatioships (some examples using these databases would >> also help, if they that these databases are efficient for such >> tasks). >> If these or other databases are not as suitable as >> Bio::DB::SeqFeature >> for feature retrieval based on interval overlap and attributes, > > I'm using Bio::DB::SeqFeature for that purpose, but just a warning: > I found that with millions of features it made a db that was too > large in terms of disc space and too slow in terms of query time. I > had to hack out its storage of feature objects in the db, instead > generating feature objects on request from the stored attributes. > Doing this turned out to be faster than simply unfreezing certain > kinds of feature objects! Would this be Bio::SF::Annotated objects? If so I bet Storable is storing the OntologyStore object information along with the SF (which argues for refactoring the FeatureIO/Bio::SF::Annotated stuff in 1.7). Not sure what can be done about that beyond your hack, though it might be worth exploring whether one can optionally set the DB::Store to store the object instance.
> > While I can't answer your main questions, I wish you good luck with > your project and request that you keep us posted with what you > achieve. You can always try Lincoln on the GBrowse list as well. I would say go ahead and commit the patch if it isn't a big deal. chris From cjfields at uiuc.edu Wed Jan 9 13:12:55 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 9 Jan 2008 12:12:55 -0600 Subject: [Bioperl-l] bioperl based database infrastucture for directed graphs In-Reply-To: References: Message-ID: <128517E8-3A2A-45DD-83A0-0014863A25BC@uiuc.edu> cc'ing the gbrowse list in case Lincoln hasn't seen this. I believe the primary intent for Bio::DB::SeqFeature::Store was as a more GFF3-compatible replacement for Bio::DB::GFF (unlimited feature nesting, uses any SeqFeatureI, etc) and was streamlined for faster lookups by GBrowse. I don't think adding tables would affect performance dramatically, though maybe Lincoln would have a better idea. chris On Jan 9, 2008, at 7:20 AM, Robson Francisco de Souza wrote: > Hello All! > > Greetings for everybody and happy new year for those following an > western calendary! > > I'm starting a new project to store and analyze distinct sets of > sequence annotation data which are related in a way suitable for > representation in a directed (e.g. transcript splicing) or undirected > (e.g. gene product interaction) graph. Analysis will require frequent > queries based on interval overlaps, feature neighbourhood, annotation > and, most importantly, feature relationships and stored paths. > > At first, I thought of build an entire new database structure to store > project specific data (e.g. alternative splicing or protein > interaction), > but as I have some experience with Lincon's > Bio::DB::SeqFeature::Store, I'm now considering extending it for the > purpose of storing graphs describing relationships among features. 
> > I'm aware that some other bioperl related databases, specifically > BioSQL and Chado, do have components which might be suitable for > storing all or some of these data but, since Lincon's feature storage > and interval binning implementations in > Bio::DB::SeqFeature::Store::mysql are both clean, simple and very > fast, > perhaps extending it in a seemingly modular way is desirable. A good > extension to Lincon's database could include tables like > feature_relationship and feature_path, for edges and transitive > closures (just like in BioSQL) and feature_stored_path, for exclusion > of biologically irrelevant paths in DAGs, like certain splicing > isoforms. These tables could be used to store sequence assemblies or > EST alignments efficiently, including scaffolds inferred by connecting > contigs. > > Before starting, I would like to know if the BioSQL and Chado schemata > do have accelerators for quering intervals among billions of features > and feature relatioships (some examples using these databases would > also help, if they that these databases are efficient for such tasks). > If these or other databases are not as suitable as Bio::DB::SeqFeature > for feature retrieval based on interval overlap and attributes, then > again I might consider extending Bio::DB::seqFeature > and contributing such extensions back to bioperl... > > Any thoughts? > > Best regards, > Robson > > PS: sorry if anyone gets two copies of this post, but took me some > time to realize my new e-mail wasn't subscribed to bioperl-l... Christopher Fields Postdoctoral Researcher Lab of Dr.
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bosborne11 at verizon.net Wed Jan 9 13:29:15 2008 From: bosborne11 at verizon.net (Brian Osborne) Date: Wed, 09 Jan 2008 13:29:15 -0500 Subject: [Bioperl-l] Beginners HOWTO query a range of lengths 0:3000[SLEN] In-Reply-To: <2011210591@web.de> References: <2011210591@web.de> Message-ID: <0EB96131-7931-4FC3-802F-A8152B474A99@verizon.net> Alexander, I don't understand. By using the clause "0:3000[SLEN] " you are querying for sequences in the length range of 0 to 3000. Brian O. On Jan 9, 2008, at 10:34 AM, Alexander Ptok wrote: > If i cut the 0:3000[SLEN] query it works and returns a lot of > sequences, when i alter the query to e.g. 1830[SLEN] it > finds the one sequence that has the length 1830, but i was not able > to query a range of lengths. From stefan.kirov at bms.com Wed Jan 9 14:54:07 2008 From: stefan.kirov at bms.com (Stefan Kirov) Date: Wed, 09 Jan 2008 14:54:07 -0500 Subject: [Bioperl-l] pairwise_kaks.PLS: verbose rquired by PAML Message-ID: <4785265F.6020500@bms.com> Jason, Even with this last fix I still had problems with bp_pairwise_kaks.pl. It turns out that verbose needs to be turned on by default for codeml in order for the sequences to appear in the mlc file. That being said, instead of:
$kaks_factory = Bio::Tools::Run::Phylo::PAML::Codeml->new
    (-verbose => $verbose,
     -params  => { 'runmode' => -2,
                   'seqtype' => 1,
                 } );
we need this:
$kaks_factory = Bio::Tools::Run::Phylo::PAML::Codeml->new
    (-verbose => $verbose,
     -params  => { 'runmode' => -2,
                   'seqtype' => 1,
                   'verbose' => 1,
                 } );
verbose can be 2 as well. I just got this clarification from Ziheng. He also offers to change the output so it becomes easier for us. I plan to ask him to put the sequence in the mlc header by default.
Stefan From robfsouza at gmail.com Wed Jan 9 19:28:25 2008 From: robfsouza at gmail.com (Robson Francisco de Souza) Date: Wed, 9 Jan 2008 22:28:25 -0200 Subject: [Bioperl-l] bioperl based database infrastucture for directed graphs In-Reply-To: <199B572E-6EEE-4F6D-9FE9-952F766E911E@fruitfly.org> References: <199B572E-6EEE-4F6D-9FE9-952F766E911E@fruitfly.org> Message-ID: Hi, 2008/1/9, Chris Mungall : > [cc-d to gmod-schema] > > Chado does have some views and pg functions for interval-based > retrieval. AFAIK there are no accelerators for deep feature graphs, > as most chado users have relatively shallow gene-model/SO feature > graphs. It may not be so hard to extend cvterm code for doing this, > depending on the characteristics of your graphs (the closure of > feature neighbourhood graphs may be particularly large) Great! I'm studying Chado and will have a look at the interval optimizations. Have any of you compared BioSQL and Chado for storage/retrieval efficiency with huge numbers of features and feature graphs? As Sendu pointed out limitations in Bio::DB::SeqFeature's schema, I'm wondering which of these platforms (or maybe another one?) would be best suited for these tasks... For the moment, I will either extend Sendu's hack of Lincoln's modules or adapt the binning algorithm of Bio::DB::SeqFeature::DBI::mysql to Chado, if it turns out to be more efficient than the pg functions. Best, Robson PS: I could not find the most recent version of gmod by following the Download link to gmod (Chado) from GMOD's site to the SourceForge download page. Did I miss the right link on the download site, or is this unexpected? Is the version available at IUBio's mirror (0.003-10) the most recent one?
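Robson mentions adapting the interval-binning approach of Bio::DB::SeqFeature::Store's MySQL back end to Chado. As a rough illustration of how binning narrows overlap queries in general, here is a core-Perl sketch of the widely used UCSC-style hierarchical scheme (finest bins of 128 kb, each coarser level 8 times wider, up to a single 512 Mb bin). This is an assumption for illustration, not necessarily the layout that module implements; the function names are hypothetical, and coordinates are taken as 0-based and half-open.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# UCSC-style hierarchical binning (assumed scheme, for illustration only).
# Level 0 has 4096 bins of 2**17 bp (128 kb); each coarser level has bins
# 8 times wider, up to a single 512 Mb bin. Offsets are the first bin
# number at each level, finest level first.
my @BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0);
my $FIRST_SHIFT = 17;    # log2 of the finest bin size
my $NEXT_SHIFT  = 3;     # log2 of the size ratio between levels

# Smallest bin that wholly contains the 0-based half-open interval [$start, $end)
sub bin_from_range {
    my ($start, $end) = @_;
    my $start_bin = $start >> $FIRST_SHIFT;
    my $end_bin   = ($end - 1) >> $FIRST_SHIFT;
    for my $offset (@BIN_OFFSETS) {
        return $offset + $start_bin if $start_bin == $end_bin;
        $start_bin >>= $NEXT_SHIFT;
        $end_bin   >>= $NEXT_SHIFT;
    }
    die "interval [$start, $end) is larger than the 512 Mb binning range\n";
}

# All bins that may hold features overlapping [$start, $end): at each level,
# the run of bins covering the query interval
sub overlapping_bins {
    my ($start, $end) = @_;
    my $start_bin = $start >> $FIRST_SHIFT;
    my $end_bin   = ($end - 1) >> $FIRST_SHIFT;
    my @bins;
    for my $offset (@BIN_OFFSETS) {
        push @bins, $offset + $_ for $start_bin .. $end_bin;
        $start_bin >>= $NEXT_SHIFT;
        $end_bin   >>= $NEXT_SHIFT;
    }
    return @bins;
}

my ($qstart, $qend) = (1_000_000, 1_000_500);
print "storage bin: ", bin_from_range($qstart, $qend), "\n";
print "bins to search: ", join(',', overlapping_bins($qstart, $qend)), "\n";
```

Each feature would be stored with the value of bin_from_range for its coordinates. An overlap query could then restrict candidates with something like WHERE bin IN (...) AND start < :qend AND end > :qstart, using the list from overlapping_bins; that SQL is only illustrative and not taken from either schema.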
From cain.cshl at gmail.com Wed Jan 9 22:15:29 2008 From: cain.cshl at gmail.com (Scott Cain) Date: Wed, 09 Jan 2008 22:15:29 -0500 Subject: [Bioperl-l] bioperl based database infrastucture for directed graphs In-Reply-To: References: <199B572E-6EEE-4F6D-9FE9-952F766E911E@fruitfly.org> Message-ID: <1199934929.6229.44.camel@frissell> Hi Robson, I seem to be perennially working on the 1.0 release of Chado. The schema itself is quite stable but I'm always working on the tools to make them handle more cases and be as stable as possible. For the time being, you need to get Chado from cvs; see http://www.gmod.org/wiki/index.php/Chado_-_Getting_Started#Chado_From_CVS I removed the 0.003 release from the SourceForge site because the schema in it is out of date relative to what we've been working on for the last year. Scott On Wed, 2008-01-09 at 22:28 -0200, Robson Francisco de Souza wrote: > Hi, > > 2008/1/9, Chris Mungall : > > [cc-d to gmod-schema] > > > > Chado does have some views and pg functions for interval-based > > retrieval. AFAIK there are no accelerators for deep feature graphs, > > as most chado users have relatively shallow gene-model/SO feature > > graphs. It may not be so hard to extend cvterm code for doing this, > > depending on the characteristics of your graphs (the closure of > > feature neighbourhood graphs may be particularly large) > > Great! I'm studing Chado and I will have a look at the interval optimizations. > Did any of you compared BioSQL and Chado for huge feature and feature > graph storage/retrieval efficiency? As Sendu pointed to limitations in > Bio::DB::SeqFeature's schema, I'm thinking which of these plataforms > (or maybe another one?) would be best suited for these tasks... for > the moment, I will either extend Sendu's hack of Lincon's modules or > adapt the binning algorithm of Bio::DB::SeqFeature::DBI::mysql to > Chado, if it turns out to be more efficient than the pg functions. 
> > Best, > Robson > > PS: I could not find the most recent version of gmod by following the > Download link to gmod(Chado) from GMOD's site to the Sourceforge > download page. Did I miss the right link on the download site or is > this unexpected? Is the version available at IUBio's mirror (0.003-10) > the most recent one? -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From bosborne11 at verizon.net Thu Jan 10 09:16:16 2008 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 10 Jan 2008 09:16:16 -0500 Subject: [Bioperl-l] Beginners HOWTO query a range of lengths 0:3000[SLEN] In-Reply-To: <2013325230@web.de> References: <2013325230@web.de> Message-ID: <932550FF-8414-4B3E-92BB-1895FD9658AE@verizon.net> Alexander, OK, that is odd (meaning, this did work a while back but it's not clear to me what could have changed). First thing to do: upgrade to BioPerl version 1.5.2. Can you do this? Version 1.4 is very old and you could run into other problems using it. Brian O. On Jan 10, 2008, at 8:54 AM, Alexander Ptok wrote: > Hallo Brian, > > thanks for your answer. The principle is clear, but it doesn't work > like it should, on my computer. So maybe i should repeat what i did > step by step. > > 1.
i took the following script: > > use Bio::DB::GenBank; > use Bio::DB::Query::GenBank; > > $query = "Arabidopsis[ORGN] AND topoisomerase[TITL] and 0:3000[SLEN]"; > $query_obj = Bio::DB::Query::GenBank->new(-db => 'nucleotide', - > query => $query ); > > $gb_obj = Bio::DB::GenBank->new; > > $stream_obj = $gb_obj->get_Stream_by_query($query_obj); > > while ($seq_obj = $stream_obj->next_seq) { > # do something with the sequence object > print $seq_obj->display_id, "\t", $seq_obj->length, "\n"; > } > > and then on the terminal > > sv1494 at r04102:~/Desktop/bioperl$ perl script1.pl > sv1494 at r04102:~/Desktop/bioperl$ > > 2. i took out the 0:3000[SLEN]: > > $query = "Arabidopsis[ORGN] AND topoisomerase[TITL]"; > > and then on the terminal > > sv1494 at r04102:~/Desktop/bioperl$ perl script2.pl > NM_128760 2775 > NM_125788 2874 > NM_124913 3068 > NM_124912 3117 > NM_124775 871 > NM_120360 1655 > NM_111862 2199 > NM_001036386 2734 > NM_119270 3996 > NM_105072 1656 > NM_113294 4824 > NM_180431 1673 > NM_120495 2515 > NM_120493 2050 > NM_112156 1089 > . > . > and a lot more of hits, and one can clearly see, there are some with > a lenght between 0 and 3000 > > 3. to have a look at the [SLEN] i tried another script with e.g. > 2199[SLEN] > > $query = "Arabidopsis[ORGN] AND topoisomerase[TITL] and 2199[SLEN]"; > > on the terminal: > > sv1494 at r04102:~/Desktop/bioperl$ perl script3.pl > NM_111862 2199 > sv1494 at r04102:~/Desktop/bioperl$ > > > > It think everthing works fine, except that bioperl or maybe the > genbank doesn't understand > the range clause 0:3000, but in every documentation says i have to > do it that way. Did > i misunterstand something or is it just a problem of my computer/ > bioperl installation? > Maybe you can tell me if the script does what it is suppose to do on > your computer? > > Thanks and greetings > > Alexander Ptok >> >> Alexander, >> >> I don't understand. 
By using the clause "0:3000[SLEN] " you are >> querying for sequences in the length range of 0 to 3000. >> From pmiguel at purdue.edu Fri Jan 11 11:22:38 2008 From: pmiguel at purdue.edu (Phillip San Miguel) Date: Fri, 11 Jan 2008 11:22:38 -0500 Subject: [Bioperl-l] Recommended way to download qual files from Genbank? Message-ID: <478797CE.9050202@purdue.edu> No problem getting sequence from GenBank via a myriad of methods. But as the volume of non-finished sequence in GenBank increases, so does the importance of also obtaining quality values for a given sequence. Some records include quality values. I typically use bp_fetch.pl to grab a sequence from GenBank:
bp_fetch.pl -fmt fasta net::genbank:AC207960
sends the FASTA sequence to STDOUT. But bp_fetch.pl evidently wasn't designed to pull down quals:
bp_fetch.pl -fmt qual net::genbank:AC207960
gives:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual object to write_seq() as a parameter named "source"
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::SeqIO::qual::write_seq /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/SeqIO/qual.pm:205
STACK: /usr/local/perl/bin/bp_fetch.pl:313
-----------------------------------------------------------
(running under bioperl 1.5.2) The quality values for this accession are in GenBank, as these URLs demonstrate:
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=154937460
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=fasta
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=qual
What is the best way to pull down these qual values?
They aren't present in "GenBank(Full)" format. They are present in an ASN.1 format. Advice would be appreciated. -- Phillip Purdue Genomics Core Facility From cjfields at uiuc.edu Fri Jan 11 12:09:40 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 11 Jan 2008 11:09:40 -0600 Subject: [Bioperl-l] Recommended way to download qual files from Genbank? In-Reply-To: <478797CE.9050202@purdue.edu> References: <478797CE.9050202@purdue.edu> Message-ID: <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu> I don't think this is possible with the current setup for Bio::DB::GenBank (which the script uses). We'll have to investigate whether it is possible to retrieve this data via NCBI's eutils; if so we can try adding it in. If you want you can submit this as an enhancement request via bugzilla for tracking: http://bugzilla.open-bio.org/ chris On Jan 11, 2008, at 10:22 AM, Phillip San Miguel wrote: > No problem getting sequence from genbank via a myriad of methods. > But as the volume of non-finished sequence in genbank increases the > importance of also obtaining quality values for a given sequence > increases. Some records include quality values. > > I typically use bp_fetch.pl to grab a sequence from genbank: > > bp_fetch.pl -fmt fasta net::genbank:AC207960 > > sends the fasta sequence to STDOUT. 
But that bp_fetch.pl wasn't > designed to pull down quals evidently: > > bp_fetch.pl -fmt qual net::genbank:AC207960 > > gives: > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual > object to write_seq() as a parameter named "source" > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/ > 5.8.8/Bio/Root/Root.pm:359 > STACK: Bio::SeqIO::qual::write_seq /usr/local/perl_5.8/lib/site_perl/ > 5.8.8/Bio/SeqIO/qual.pm:205 > STACK: /usr/local/perl/bin/bp_fetch.pl:313 > ----------------------------------------------------------- > > (running under bioperl 1.5.2) > > The quality values for this accession are in genbank as these URLs > demonstrate: > > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=154937460 > > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=fasta > > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=qual > > What is the best way to pull down these qual values? They aren't > present in "GenBank(Full)" format. They are present in an ASN.1 > format. > > Advice would be appreciated. > > -- > Phillip > Purdue Genomics Core Facility > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From MEC at stowers-institute.org Fri Jan 11 14:14:10 2008 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Fri, 11 Jan 2008 13:14:10 -0600 Subject: [Bioperl-l] Recommended way to download qual files from Genbank? 
In-Reply-To: <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu> References: <478797CE.9050202@purdue.edu> <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu> Message-ID: Indeed eutil is capable of this The following use of my ncbi_eutil (attached) script yeilds what you want: ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual It depends on the version of NCBI_PowerScripting.pm , such as is included in Malcolm Cook Database Applications Manager - Bioinformatics Stowers Institute for Medical Research - Kansas City, Missouri > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Chris Fields > Sent: Friday, January 11, 2008 11:10 AM > To: Phillip San Miguel > Cc: bioperl-l > Subject: Re: [Bioperl-l] Recommended way to download qual > files from Genbank? > > I don't think this is possible with the current setup for > Bio::DB::GenBank (which the script uses). We'll have to > investigate whether it is possible to retrieve this data via > NCBI's eutils; if so we can try adding it in. If you want > you can submit this as an enhancement request via bugzilla > for tracking: > > http://bugzilla.open-bio.org/ > > chris > > On Jan 11, 2008, at 10:22 AM, Phillip San Miguel wrote: > > > No problem getting sequence from genbank via a myriad of methods. > > But as the volume of non-finished sequence in genbank increases the > > importance of also obtaining quality values for a given sequence > > increases. Some records include quality values. > > > > I typically use bp_fetch.pl to grab a sequence from genbank: > > > > bp_fetch.pl -fmt fasta net::genbank:AC207960 > > > > sends the fasta sequence to STDOUT. 
But that bp_fetch.pl wasn't > > designed to pull down quals evidently: > > > > bp_fetch.pl -fmt qual net::genbank:AC207960 > > > > gives: > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual > > object to write_seq() as a parameter named "source" > > STACK: Error::throw > > STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/ > > 5.8.8/Bio/Root/Root.pm:359 > > STACK: Bio::SeqIO::qual::write_seq > /usr/local/perl_5.8/lib/site_perl/ > > 5.8.8/Bio/SeqIO/qual.pm:205 > > STACK: /usr/local/perl/bin/bp_fetch.pl:313 > > ----------------------------------------------------------- > > > > (running under bioperl 1.5.2) > > > > The quality values for this accession are in genbank as these URLs > > demonstrate: > > > > > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=154937460 > > > > > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=15 > > 4937460&dopt=fasta > > > > > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=15 > > 4937460&dopt=qual > > > > What is the best way to pull down these qual values? They aren't > > present in "GenBank(Full)" format. They are present in an ASN.1 > > format. > > > > Advice would be appreciated. > > > > -- > > Phillip > > Purdue Genomics Core Facility > > Christopher Fields > Postdoctoral Researcher > Lab of Dr.
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign From pmiguel at purdue.edu Fri Jan 11 14:33:13 2008 From: pmiguel at purdue.edu (Phillip San Miguel) Date: Fri, 11 Jan 2008 14:33:13 -0500 Subject: [Bioperl-l] Recommended way to download qual files from Genbank? In-Reply-To: References: <478797CE.9050202@purdue.edu> <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu> Message-ID: <4787C479.8070600@purdue.edu> Hi Malcolm, Looks like your email was (inadvertently?) redacted in some way. (No attachment and last sentence truncated.) Would it be possible to get a complete version so I can be sure I'm following you? Thanks, Phillip Cook, Malcolm wrote: > Indeed eutil is capable of this > > The following use of my ncbi_eutil (attached) script yeilds what you > want: > > ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > > AC207960.qual > > It depends on the version of NCBI_PowerScripting.pm , such as is > included in > > Malcolm Cook > Database Applications Manager - Bioinformatics > Stowers Institute for Medical Research - Kansas City, Missouri > > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >> Chris Fields >> Sent: Friday, January 11, 2008 11:10 AM >> To: Phillip San Miguel >> Cc: bioperl-l >> Subject: Re: [Bioperl-l] Recommended way to download qual >> files from Genbank? >> >> I don't think this is possible with the current setup for >> Bio::DB::GenBank (which the script uses). We'll have to >> investigate whether it is possible to retrieve this data via >> NCBI's eutils; if so we can try adding it in.
If you want >> you can submit this as an enhancement request via bugzilla >> for tracking: >> >> http://bugzilla.open-bio.org/ >> >> chris >> >> On Jan 11, 2008, at 10:22 AM, Phillip San Miguel wrote: >> >> >>> No problem getting sequence from genbank via a myriad of methods. >>> But as the volume of non-finished sequence in genbank increases the >>> importance of also obtaining quality values for a given sequence >>> increases. Some records include quality values. >>> >>> I typically use bp_fetch.pl to grab a sequence from genbank: >>> >>> bp_fetch.pl -fmt fasta net::genbank:AC207960 >>> >>> sends the fasta sequence to STDOUT. But that bp_fetch.pl wasn't >>> designed to pull down quals evidently: >>> >>> bp_fetch.pl -fmt qual net::genbank:AC207960 >>> >>> gives: >>> >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual >>> object to write_seq() as a parameter named "source" >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/ >>> 5.8.8/Bio/Root/Root.pm:359 >>> STACK: Bio::SeqIO::qual::write_seq >>> >> /usr/local/perl_5.8/lib/site_perl/ >> >>> 5.8.8/Bio/SeqIO/qual.pm:205 >>> STACK: /usr/local/perl/bin/bp_fetch.pl:313 >>> ----------------------------------------------------------- >>> >>> (running under bioperl 1.5.2) >>> >>> The quality values for this accession are in genbank as these URLs >>> demonstrate: >>> >>> >>> >> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=154937460 >> >>> >> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=15 >> >>> 4937460&dopt=fasta >>> >>> >>> >> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=15 >> >>> 4937460&dopt=qual >>> >>> What is the best way to pull down these qual values? They aren't >>> present in "GenBank(Full)" format. They are present in an ASN.1 >>> format. >>> >>> Advice would be appreciated. 
>>> >>> -- >>> Phillip >>> Purdue Genomics Core Facility >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > From pmiguel at purdue.edu Fri Jan 11 14:37:24 2008 From: pmiguel at purdue.edu (Phillip San Miguel) Date: Fri, 11 Jan 2008 14:37:24 -0500 Subject: [Bioperl-l] Recommended way to download qual files from Genbank? In-Reply-To: <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu> References: <478797CE.9050202@purdue.edu> <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu> Message-ID: <4787C574.8020003@purdue.edu> Hi Chris, Thanks. I have submitted this as an enhancement request to bugzilla. Phillip Chris Fields wrote: > I don't think this is possible with the current setup for > Bio::DB::GenBank (which the script uses). We'll have to investigate > whether it is possible to retrieve this data via NCBI's eutils; if so > we can try adding it in. If you want you can submit this as an > enhancement request via bugzilla for tracking: > > http://bugzilla.open-bio.org/ > > chris > > On Jan 11, 2008, at 10:22 AM, Phillip San Miguel wrote: > >> No problem getting sequence from genbank via a myriad of methods. But >> as the volume of non-finished sequence in genbank increases the >> importance of also obtaining quality values for a given sequence >> increases. Some records include quality values. >> >> I typically use bp_fetch.pl to grab a sequence from genbank: >> >> bp_fetch.pl -fmt fasta net::genbank:AC207960 >> >> sends the fasta sequence to STDOUT. 
But that bp_fetch.pl wasn't >> designed to pull down quals evidently: >> >> bp_fetch.pl -fmt qual net::genbank:AC207960 >> >> gives: >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual >> object to write_seq() as a parameter named "source" >> STACK: Error::throw >> STACK: Bio::Root::Root::throw >> /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/Root/Root.pm:359 >> STACK: Bio::SeqIO::qual::write_seq >> /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/SeqIO/qual.pm:205 >> STACK: /usr/local/perl/bin/bp_fetch.pl:313 >> ----------------------------------------------------------- >> >> (running under bioperl 1.5.2) >> >> The quality values for this accession are in genbank as these URLs >> demonstrate: >> >> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=154937460 >> >> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=fasta >> >> >> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=qual >> >> >> What is the best way to pull down these qual values? They aren't >> present in "GenBank(Full)" format. They are present in an ASN.1 format. >> >> Advice would be appreciated. >> >> -- >> Phillip >> Purdue Genomics Core Facility >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > From pmiguel at purdue.edu Fri Jan 11 15:46:59 2008 From: pmiguel at purdue.edu (Phillip San Miguel) Date: Fri, 11 Jan 2008 15:46:59 -0500 Subject: [Bioperl-l] Recommended way to download qual files from Genbank? 
In-Reply-To: References: <478797CE.9050202@purdue.edu> <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu> <4787C479.8070600@purdue.edu> Message-ID: <4787D5C3.1030308@purdue.edu> Hi Malcolm, Yes that works great! Well, one caveat: If you download both the fasta and the qual files: ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=fasta > AC207960.fasta ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.fasta.qual The "primary IDs" don't match. The fasta comes out: >gi|154937460|gb|AC207960.1| and the qual comes out: >AC207960.1 which seems to choke most programs that use seq and qual (eg cross_match) because they want the primary IDs of the seq and qual files to match. Otherwise fine, though. Thanks, Phillip Cook, Malcolm wrote: > Phillip: > > Of course - mea culpa - here's the full monty.... > > Indeed NCBI's eutils can do this: > > >> ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > >> > AC207960.qual > > which uses my script (attached) to wrap NCBI's eutils. > > It depends upon NCBI_PowerScripting.pm disributed in PowerFiles_0707.zip > by NCBI in their "Jul 24-27, 2007" course found at > http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/scripts.html > > I made a single edit to NCBI_PowerScripting.pm to 'select STDERR' at the > very beginning so that trace messages are not printed on STDOUT, such as > this echoed header: > Retrieving 1 records from nucleotide... > ... and footer: > Received records 1 - 1. > Wrote data to -. > > (otherwise they are interspersed with downloaded qual files) > > It also depends on recent version of GetOpt::Long. > > Hope it helps. 
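One possible workaround for Phillip's caveat is to rewrite the qual headers so they match the FASTA headers before running cross_match. The following is a minimal Perl sketch. It is not part of ncbi_eutil or BioPerl. It assumes the FASTA IDs have the `gi|...|gb|ACCESSION.VERSION|` shape shown above and that each qual ID is the bare `ACCESSION.VERSION`. The script name `fix_qual_ids.pl` and its argument order are hypothetical.

```perl
#!/usr/bin/perl
# Hypothetical helper (not part of ncbi_eutil or BioPerl): make the IDs in a
# .qual file match the IDs in the corresponding FASTA file.
# Assumes FASTA IDs look like gi|154937460|gb|AC207960.1| and qual IDs are
# the bare accession.version (AC207960.1), as in the eutils output above.
use strict;
use warnings;

# Extract accession.version from a gi-style FASTA ID; returns undef otherwise.
sub accession_from_fasta_id {
    my ($id) = @_;
    my @fields = split /\|/, $id;    # gi|<gi>|<db>|<accession.version>|...
    return (@fields >= 4 && $fields[0] eq 'gi') ? $fields[3] : undef;
}

# Build a map of accession.version => full FASTA ID from an open filehandle.
sub read_fasta_ids {
    my ($fh) = @_;
    my %id_for;
    while (my $line = <$fh>) {
        next unless $line =~ /^>(\S+)/;
        my $full = $1;
        my $acc  = accession_from_fasta_id($full);
        $id_for{$acc} = $full if defined $acc;
    }
    return \%id_for;
}

# Rewrite one qual line. A header whose ID is a known accession gets the full
# FASTA ID; every other line (including the quality values) passes through.
sub rewrite_qual_line {
    my ($line, $id_for) = @_;
    if ($line =~ /^>(\S+)(.*)$/s) {
        my ($id, $rest) = ($1, $2);
        return ">$id_for->{$id}$rest" if exists $id_for->{$id};
    }
    return $line;
}

# Usage: perl fix_qual_ids.pl AC207960.fasta AC207960.fasta.qual > fixed.qual
if (@ARGV == 2) {
    my ($fasta_file, $qual_file) = @ARGV;
    open my $fa, '<', $fasta_file or die "Cannot open $fasta_file: $!";
    my $id_for = read_fasta_ids($fa);
    close $fa;
    open my $qu, '<', $qual_file or die "Cannot open $qual_file: $!";
    print rewrite_qual_line($_, $id_for) while <$qu>;
    close $qu;
}
```

The script keys on accession.version rather than the gi number because that is the only part the two headers share. Qual records with no matching FASTA record are left unchanged.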
> > Malcolm Cook > Database Applications Manager - Bioinformatics > Stowers Institute for Medical Research - Kansas City, Missouri > > > >> -----Original Message----- >> From: Phillip San Miguel [mailto:pmiguel at purdue.edu] >> Sent: Friday, January 11, 2008 1:33 PM >> To: Cook, Malcolm >> Cc: Chris Fields; bioperl-l >> Subject: Re: [Bioperl-l] Recommended way to download qual >> files from Genbank? >> >> Hi Malcolm, >> Looks like your email was (inadvertantly?) redacted in >> some way. (No attachment and last sentence truncated.) Would >> it be possible to get a complete version so I can be sure I'm >> following you? >> Thanks, >> Phillip >> >> Cook, Malcolm wrote: >> >>> Indeed eutil is capable of this >>> >>> The following use of my ncbi_eutil (attached) script yeilds what you >>> want: >>> >>> ncbi_eutil -search db=nucleotide term=AC207960 -fetch >>> >> rettype=qual > >> >>> AC207960.qual >>> >>> It depends on the version of NCBI_PowerScripting.pm , such as is >>> included in >>> >>> Malcolm Cook >>> Database Applications Manager - Bioinformatics Stowers >>> >> Institute for >> >>> Medical Research - Kansas City, Missouri >>> >>> >>> >>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org >>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris >>>> Fields >>>> Sent: Friday, January 11, 2008 11:10 AM >>>> To: Phillip San Miguel >>>> Cc: bioperl-l >>>> Subject: Re: [Bioperl-l] Recommended way to download qual >>>> >> files from >> >>>> Genbank? >>>> >>>> I don't think this is possible with the current setup for >>>> Bio::DB::GenBank (which the script uses). We'll have to >>>> >> investigate >> >>>> whether it is possible to retrieve this data via NCBI's >>>> >> eutils; if so >> >>>> we can try adding it in. 
If you want you can submit this as an >>>> enhancement request via bugzilla for tracking: >>>> >>>> http://bugzilla.open-bio.org/ >>>> >>>> chris >>>> >>>> On Jan 11, 2008, at 10:22 AM, Phillip San Miguel wrote: >>>> >>>> >>>> >>>>> No problem getting sequence from genbank via a myriad of >>>>> >> methods. >> >>>>> But as the volume of non-finished sequence in genbank >>>>> >> increases the >> >>>>> importance of also obtaining quality values for a given sequence >>>>> increases. Some records include quality values. >>>>> >>>>> I typically use bp_fetch.pl to grab a sequence from genbank: >>>>> >>>>> bp_fetch.pl -fmt fasta net::genbank:AC207960 >>>>> >>>>> sends the fasta sequence to STDOUT. But that bp_fetch.pl wasn't >>>>> designed to pull down quals evidently: >>>>> >>>>> bp_fetch.pl -fmt qual net::genbank:AC207960 >>>>> >>>>> gives: >>>>> >>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>> MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual >>>>> object to write_seq() as a parameter named "source" >>>>> STACK: Error::throw >>>>> STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/ >>>>> 5.8.8/Bio/Root/Root.pm:359 >>>>> STACK: Bio::SeqIO::qual::write_seq >>>>> >>>>> >>>> /usr/local/perl_5.8/lib/site_perl/ >>>> >>>> >>>>> 5.8.8/Bio/SeqIO/qual.pm:205 >>>>> STACK: /usr/local/perl/bin/bp_fetch.pl:313 >>>>> ----------------------------------------------------------- >>>>> >>>>> (running under bioperl 1.5.2) >>>>> >>>>> The quality values for this accession are in genbank as these URLs >>>>> demonstrate: >>>>> >>>>> >>>>> >>>>> >> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=15493746 >> >>>> 0 >>>> >>>> >>>>> >>>>> >> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=1 >> >>>> 5 >>>> >>>> >>>>> 4937460&dopt=fasta >>>>> >>>>> >>>>> >>>>> >> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=1 >> >>>> 5 >>>> >>>> >>>>> 4937460&dopt=qual >>>>> >>>>> What is the best way to 
pull down these qual values? They aren't >>>>> present in "GenBank(Full)" format. They are present in an ASN.1 >>>>> format. >>>>> >>>>> Advice would be appreciated. >>>>> >>>>> -- >>>>> Phillip >>>>> Purdue Genomics Core Facility >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>> Christopher Fields >>>> Postdoctoral Researcher >>>> Lab of Dr. Robert Switzer >>>> Dept of Biochemistry >>>> University of Illinois Urbana-Champaign >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >>> >>> >> >> From MEC at stowers-institute.org Fri Jan 11 14:40:14 2008 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Fri, 11 Jan 2008 13:40:14 -0600 Subject: [Bioperl-l] Recommended way to download qual files from Genbank? In-Reply-To: <4787C479.8070600@purdue.edu> References: <478797CE.9050202@purdue.edu> <14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu> <4787C479.8070600@purdue.edu> Message-ID: Phillip: Of course - mea culpa - here's the full monty.... Indeed NCBI's eutils can do this: > ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual which uses my script (attached) to wrap NCBI's eutils. It depends upon NCBI_PowerScripting.pm distributed in PowerFiles_0707.zip by NCBI in their "Jul 24-27, 2007" course found at http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/scripts.html I made a single edit to NCBI_PowerScripting.pm to 'select STDERR' at the very beginning so that trace messages are not printed on STDOUT, such as this echoed header: Retrieving 1 records from nucleotide... ... and footer: Received records 1 - 1. Wrote data to -.
(otherwise they are interspersed with downloaded qual files) It also depends on a recent version of Getopt::Long. Hope it helps. Malcolm Cook Database Applications Manager - Bioinformatics Stowers Institute for Medical Research - Kansas City, Missouri > -----Original Message----- > From: Phillip San Miguel [mailto:pmiguel at purdue.edu] > Sent: Friday, January 11, 2008 1:33 PM > To: Cook, Malcolm > Cc: Chris Fields; bioperl-l > Subject: Re: [Bioperl-l] Recommended way to download qual > files from Genbank? > > Hi Malcolm, > Looks like your email was (inadvertantly?) redacted in > some way. (No attachment and last sentence truncated.) Would > it be possible to get a complete version so I can be sure I'm > following you? > Thanks, > Phillip > > Cook, Malcolm wrote: > > Indeed eutil is capable of this > > > > The following use of my ncbi_eutil (attached) script yeilds what you > > want: > > > > ncbi_eutil -search db=nucleotide term=AC207960 -fetch > rettype=qual > > > AC207960.qual > > > > It depends on the version of NCBI_PowerScripting.pm , such as is > > included in > > > > Malcolm Cook > > Database Applications Manager - Bioinformatics Stowers > Institute for > > Medical Research - Kansas City, Missouri > > > > > > > >> -----Original Message----- > >> From: bioperl-l-bounces at lists.open-bio.org > >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris > >> Fields > >> Sent: Friday, January 11, 2008 11:10 AM > >> To: Phillip San Miguel > >> Cc: bioperl-l > >> Subject: Re: [Bioperl-l] Recommended way to download qual > files from > >> Genbank? > >> > >> I don't think this is possible with the current setup for > >> Bio::DB::GenBank (which the script uses). We'll have to > investigate > >> whether it is possible to retrieve this data via NCBI's > eutils; if so > >> we can try adding it in.
If you want you can submit this as an > >> enhancement request via bugzilla for tracking: > >> > >> http://bugzilla.open-bio.org/ > >> > >> chris > >> > >> On Jan 11, 2008, at 10:22 AM, Phillip San Miguel wrote: > >> > >> > >>> No problem getting sequence from genbank via a myriad of > methods. > >>> But as the volume of non-finished sequence in genbank > increases the > >>> importance of also obtaining quality values for a given sequence > >>> increases. Some records include quality values. > >>> > >>> I typically use bp_fetch.pl to grab a sequence from genbank: > >>> > >>> bp_fetch.pl -fmt fasta net::genbank:AC207960 > >>> > >>> sends the fasta sequence to STDOUT. But that bp_fetch.pl wasn't > >>> designed to pull down quals evidently: > >>> > >>> bp_fetch.pl -fmt qual net::genbank:AC207960 > >>> > >>> gives: > >>> > >>> ------------- EXCEPTION: Bio::Root::Exception ------------- > >>> MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual > >>> object to write_seq() as a parameter named "source" > >>> STACK: Error::throw > >>> STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/ > >>> 5.8.8/Bio/Root/Root.pm:359 > >>> STACK: Bio::SeqIO::qual::write_seq > >>> > >> /usr/local/perl_5.8/lib/site_perl/ > >> > >>> 5.8.8/Bio/SeqIO/qual.pm:205 > >>> STACK: /usr/local/perl/bin/bp_fetch.pl:313 > >>> ----------------------------------------------------------- > >>> > >>> (running under bioperl 1.5.2) > >>> > >>> The quality values for this accession are in genbank as these URLs > >>> demonstrate: > >>> > >>> > >>> > >> > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=15493746 > >> 0 > >> > >>> > >> > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=1 > >> 5 > >> > >>> 4937460&dopt=fasta > >>> > >>> > >>> > >> > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=1 > >> 5 > >> > >>> 4937460&dopt=qual > >>> > >>> What is the best way to pull down these qual values? 
They aren't > >>> present in "GenBank(Full)" format. They are present in an ASN.1 > >>> format. > >>> > >>> Advice would be appreciated. > >>> > >>> -- > >>> Phillip > >>> Purdue Genomics Core Facility > >>> > >>> > >>> > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >> Christopher Fields > >> Postdoctoral Researcher > >> Lab of Dr. Robert Switzer > >> Dept of Biochemistry > >> University of Illinois Urbana-Champaign > >> > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > > > > > > -------------- next part -------------- A non-text attachment was scrubbed... Name: ncbi_eutil Type: application/octet-stream Size: 1854 bytes Desc: ncbi_eutil URL: From cain.cshl at gmail.com Mon Jan 14 13:46:39 2008 From: cain.cshl at gmail.com (Scott Cain) Date: Mon, 14 Jan 2008 13:46:39 -0500 Subject: [Bioperl-l] GenBank format and feature names > 15 char Message-ID: <1200336399.6056.12.camel@frissell> Hi all, Last month, I got a bug report on the GBrowse bug tracker: http://sourceforge.net/tracker/index.php?func=detail&aid=1845217&group_id=27707&atid=391291 about a problem with dumping invalid GenBank files. GBrowse uses Bio::SeqIO::genbank to create these dumps. In his bug report, he claims that feature names over 15 characters long are invalid, and provided an example GenBank file where a feature is named 'BAC_cloned_genomic_insert', which is over 15 characters. What I want to know is this: is this truly a restriction on the GenBank format, or is it a software problem with some other package? Do we need to fix genbank.pm? I'm perfectly willing to do it; I'm just hesitant to believe this is really a bug.
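For context, the INSDC feature-table layout puts the feature key in columns 6-20 and starts the location at column 22. That is the origin of the 15-character key limit. The following is a hypothetical sketch, not the code in Bio::SeqIO::genbank. It shows how a writer can pad short keys to that layout and still guarantee at least one space after an over-long key such as `BAC_cloned_genomic_insert`. With that space in place, a reader can split the line back into key and location on whitespace. By contrast, a format with no separator, such as `%-21s%s` applied to the indented key, would fuse long keys with their locations.

```perl
#!/usr/bin/perl
# Hypothetical sketch, not Bio::SeqIO::genbank's actual code.
# GenBank feature table layout: key in columns 6-20, location from column 22.
use strict;
use warnings;

# Format one feature line. "%-15s" pads short keys to 15 characters but does
# not truncate longer ones; the literal space after it guarantees that an
# over-long key is still separated from the location.
sub format_feature_line {
    my ($key, $location) = @_;
    return sprintf('     %-15s %s', $key, $location);
}

# Parse a feature line back into (key, location) by splitting on whitespace.
# This only works if the writer left at least one space between them.
sub parse_feature_line {
    my ($line) = @_;
    my ($key, $location) = $line =~ /^ {5}(\S+)\s+(\S.*)$/
        or return;
    return ($key, $location);
}

for my $pair (['CDS', 'join(<1..347,400..498,794..>1000)'],
              ['BAC_cloned_genomic_insert', '<1..>1000']) {
    my $line = format_feature_line(@$pair);
    my ($key, $loc) = parse_feature_line($line);
    print "$line\n";
    print "  round-trips as key='$key' location='$loc'\n";
}
```

Keys of 15 characters or fewer keep the standard column-22 alignment. Longer keys break strict column compliance but remain parseable, which matches the relaxed behaviour Lincoln describes.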
Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain.cshl at gmail.com GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From lstein at cshl.edu Mon Jan 14 13:53:15 2008 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 14 Jan 2008 13:53:15 -0500 Subject: [Bioperl-l] GenBank format and feature names > 15 char In-Reply-To: <1200336399.6056.12.camel@frissell> References: <1200336399.6056.12.camel@frissell> Message-ID: <6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com> Hi Scott, He is correct about the limitation, but we deliberately relaxed it because we were running into situations where we lost information during roundtripping from other formats into genbank. Lincoln On Jan 14, 2008 1:46 PM, Scott Cain wrote: > Hi all, > > Last month, I got a bug report on the GBrowse bug tracker: > > > http://sourceforge.net/tracker/index.php?func=detail&aid=1845217&group_id=27707&atid=391291 > > about a problem with dumping invalid GenBank files. GBrowse uses > Bio::SeqIO::genbank to create these dumps. > > In his bug report, he claims that feature names over 15 characters long > are invalid, and provided and example GenBank file where a feature is > named 'BAC_cloned_genomic_insert', which is over 15 characters. What I > want to know is this: is this truly a restriction on the GenBank format, > or is it a software problem with some other package? Do we need to fix > genbank.pm? I'm perfectly willing to do it; I'm just hesitant to > believe this is really a bug. > > Thanks, > Scott > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. cain.cshl at gmail.com > GMOD Coordinator (http://www.gmod.org/) 216-392-3087 > Cold Spring Harbor Laboratory > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Mon Jan 14 14:35:46 2008 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 14 Jan 2008 13:35:46 -0600 Subject: [Bioperl-l] GenBank format and feature names > 15 char In-Reply-To: <6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com> References: <1200336399.6056.12.camel@frissell> <6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com> Message-ID: It looks like the keys in the feature table run into the location string w/o intervening space, which would probably cause havoc with roundtripping from this output. A few examples: BAC_cloned_genomic_insert<1..>1000 combined_genscanjoin(<1..347,400..498,794..>1000) splign_na_dbEST_ncbi<1..>1000 I would think at least a space in between the location and the key would be required for round-tripping out of genbank format. chris On Jan 14, 2008, at 12:53 PM, Lincoln Stein wrote: > Hi Scott, > > He is correct about the limitation, but we deliberately relaxed it > because > we were running into situations where we lost information during > roundtripping from other formats into genbank. > > Lincoln > > On Jan 14, 2008 1:46 PM, Scott Cain wrote: > >> Hi all, >> >> Last month, I got a bug report on the GBrowse bug tracker: >> >> >> http://sourceforge.net/tracker/index.php?func=detail&aid=1845217&group_id=27707&atid=391291 >> >> about a problem with dumping invalid GenBank files. GBrowse uses >> Bio::SeqIO::genbank to create these dumps. >> >> In his bug report, he claims that feature names over 15 characters >> long >> are invalid, and provided and example GenBank file where a feature is >> named 'BAC_cloned_genomic_insert', which is over 15 characters. 
>> What I >> want to know is this: is this truly a restriction on the GenBank >> format, >> or is it a software problem with some other package? Do we need to >> fix >> genbank.pm? I'm perfectly willing to do it; I'm just hesitant to >> believe this is really a bug. >> >> Thanks, >> Scott >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. cain.cshl at gmail.com >> GMOD Coordinator (http://www.gmod.org/) >> 216-392-3087 >> Cold Spring Harbor Laboratory >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From lstein at cshl.edu Mon Jan 14 14:46:20 2008 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 14 Jan 2008 14:46:20 -0500 Subject: [Bioperl-l] GenBank format and feature names > 15 char In-Reply-To: References: <1200336399.6056.12.camel@frissell> <6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com> Message-ID: <6dce9a0b0801141146gfb4ee89o1523a360a280eeb1@mail.gmail.com> That's a new bug. The version I worked on inserted a space after the name. Lincoln On Jan 14, 2008 2:35 PM, Chris Fields wrote: > It looks like the keys in the feature table run into the location > string w/o intervening space, which would probably cause havoc with > roundtripping from this output. 
A few examples: > > BAC_cloned_genomic_insert<1..>1000 > combined_genscanjoin(<1..347,400..498,794..>1000) > splign_na_dbEST_ncbi<1..>1000 > > I would think at least a space in between the location and the key > would be required for round-tripping out of genbank format. > > chris > > On Jan 14, 2008, at 12:53 PM, Lincoln Stein wrote: > > > Hi Scott, > > > > He is correct about the limitation, but we deliberately relaxed it > > because > > we were running into situations where we lost information during > > roundtripping from other formats into genbank. > > > > Lincoln > > > > On Jan 14, 2008 1:46 PM, Scott Cain wrote: > > > >> Hi all, > >> > >> Last month, I got a bug report on the GBrowse bug tracker: > >> > >> > >> > http://sourceforge.net/tracker/index.php?func=detail&aid=1845217&group_id=27707&atid=391291 > >> > >> about a problem with dumping invalid GenBank files. GBrowse uses > >> Bio::SeqIO::genbank to create these dumps. > >> > >> In his bug report, he claims that feature names over 15 characters > >> long > >> are invalid, and provided and example GenBank file where a feature is > >> named 'BAC_cloned_genomic_insert', which is over 15 characters. > >> What I > >> want to know is this: is this truly a restriction on the GenBank > >> format, > >> or is it a software problem with some other package? Do we need to > >> fix > >> genbank.pm? I'm perfectly willing to do it; I'm just hesitant to > >> believe this is really a bug. > >> > >> Thanks, > >> Scott > >> > >> -- > >> > ------------------------------------------------------------------------ > >> Scott Cain, Ph. D. > cain.cshl at gmail.com > >> GMOD Coordinator (http://www.gmod.org/) > >> 216-392-3087 > >> Cold Spring Harbor Laboratory > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > > > > > -- > > Lincoln D. 
Stein > > Cold Spring Harbor Laboratory > > 1 Bungtown Road > > Cold Spring Harbor, NY 11724 > > (516) 367-8380 (voice) > > (516) 367-8389 (fax) > > FOR URGENT MESSAGES & SCHEDULING, > > PLEASE CONTACT MY ASSISTANT, > > SANDRA MICHELSEN, AT michelse at cshl.edu > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From diogoat at gmail.com Tue Jan 15 08:40:10 2008 From: diogoat at gmail.com (Diogo Tschoeke) Date: Tue, 15 Jan 2008 11:40:10 -0200 Subject: [Bioperl-l] Problem to extract protein_id and transcript from CDS Message-ID: <638512560801150540m108db442r227d82c709a954@mail.gmail.com> Hello, I want to extract protein_id and transcript from a CDS tag, from genome in genbak format but i have one problem, when the sequence in the file don't have the protein_id or the transcript the script gives me this error: ------------- EXCEPTION ------------- MSG: asking for tag value that does not exist protein_id STACK Bio::SeqFeature::Generic::get_tag_values /usr/share/perl5/Bio/SeqFeature/Generic.pm:504 STACK toplevel parser_cds.pl:25 -------------------------------------- Bellow I past the script ############################################## use Bio::SeqIO; use warnings; my $infile = $ARGV[0]; my $outfile = "$infile.out"; open (OUT, ">>$outfile"); my $seq_in = Bio::SeqIO->new('-file' => "<$infile", '-format' => 'Genbank'); while (my $inseq = $seq_in->next_seq) { for my $feat_object ($inseq->get_SeqFeatures){ if ($feat_object->primary_tag eq "CDS"){ print OUT 
$feat_object->get_tag_values('protein_id')," "; print OUT $feat_object->get_tag_values('translation'),"\n"; } } } ############################################### Can somebody help me? Thanks, Diogo Tschoeke From Marc.Logghe at ablynx.com Tue Jan 15 09:44:54 2008 From: Marc.Logghe at ablynx.com (Marc Logghe) Date: Tue, 15 Jan 2008 15:44:54 +0100 Subject: [Bioperl-l] Problem to extract protein_id and transcript from CDS In-Reply-To: <638512560801150540m108db442r227d82c709a954@mail.gmail.com> Message-ID: <03C512635899144083CADB0EE2220189013E2BEC@alpaca.lan.ablynx.com> Hi, Try testing for existence first using the has_tag() method. It is provided by Bio::AnnotatableI. print OUT $feat_object->get_tag_values('protein_id')," " if ($feat_object->has_tag('protein_id')); HTH, Marc > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Diogo Tschoeke > Sent: Tuesday 15 January 2008 14:40 > To: Bioperl-list > Subject: [Bioperl-l] Problem to extract protein_id and transcript from CDS > > Hello, > > I want to extract protein_id and transcript from a CDS tag, from genome in > genbak format but i have one problem, when the sequence in the file don't > have the protein_id or the transcript the script gives me this error: > > ------------- EXCEPTION ------------- > MSG: asking for tag value that does not exist protein_id > STACK Bio::SeqFeature::Generic::get_tag_values > /usr/share/perl5/Bio/SeqFeature/Generic.pm:504 > STACK toplevel parser_cds.pl:25 > -------------------------------------- > > Bellow I past the script > > ############################################## > use Bio::SeqIO; > use warnings; > > my $infile = $ARGV[0]; > my $outfile = "$infile.out"; > open (OUT, ">>$outfile"); > > my $seq_in = Bio::SeqIO->new('-file' => "<$infile", > '-format' => 'Genbank'); > > while (my $inseq = $seq_in->next_seq) { > > for my $feat_object ($inseq->get_SeqFeatures){ > if ($feat_object->primary_tag eq "CDS"){ > print OUT
print OUT $feat_object->get_tag_values('protein_id')," "; > print OUT $feat_object->get_tag_values('translation'),"\n"; > } > } > } > ############################################### > > Somebody can helps me? > > Thank > > Diogo Tschoeke > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cuiw at ncbi.nlm.nih.gov Tue Jan 15 11:50:53 2008 From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C]) Date: Tue, 15 Jan 2008 11:50:53 -0500 Subject: [Bioperl-l] Recommended way to download qual files from Genbank? References: <478797CE.9050202@purdue.edu><14CD2E51-81BF-44AB-A749-1379E6FA67E9@uiuc.edu><4787C479.8070600@purdue.edu> Message-ID: <18C407FD4FFB424292D769FBD68C1987048E95CC@NIHCESMLBX8.nih.gov> There is an alternative way if you can download and compile NCBI C++ Toolkit (ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/2007/Aug_27_2007/) . Simply call the binary like: id1_fetch -fmt quality -gi 13508865 Wenwu Cui ________________________________ From: Cook, Malcolm [mailto:MEC at stowers-institute.org] Sent: Fri 1/11/2008 2:40 PM To: Phillip San Miguel Cc: Chris Fields; bioperl-l Subject: Re: [Bioperl-l] Recommended way to download qual files from Genbank? Phillip: Of course - mea culpa - here's the full monty.... Indeed NCBI's eutils can do this: > ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual which uses my script (attached) to wrap NCBI's eutils. It depends upon NCBI_PowerScripting.pm disributed in PowerFiles_0707.zip by NCBI in their "Jul 24-27, 2007" course found at http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/scripts.html I made a single edit to NCBI_PowerScripting.pm to 'select STDERR' at the very beginning so that trace messages are not printed on STDOUT, such as this echoed header: Retrieving 1 records from nucleotide... ... and footer: Received records 1 - 1. Wrote data to -. 
(otherwise they are interspersed with downloaded qual files)

It also depends on a recent version of Getopt::Long.

Hope it helps.

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri

> -----Original Message-----
> From: Phillip San Miguel [mailto:pmiguel at purdue.edu]
> Sent: Friday, January 11, 2008 1:33 PM
> To: Cook, Malcolm
> Cc: Chris Fields; bioperl-l
> Subject: Re: [Bioperl-l] Recommended way to download qual files from Genbank?
>
> Hi Malcolm,
> Looks like your email was (inadvertently?) redacted in some way.
> (No attachment and last sentence truncated.) Would it be possible to
> get a complete version so I can be sure I'm following you?
> Thanks,
> Phillip
>
> Cook, Malcolm wrote:
> > Indeed eutil is capable of this
> >
> > The following use of my ncbi_eutil (attached) script yields what you
> > want:
> >
> >     ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual
> >
> > It depends on the version of NCBI_PowerScripting.pm, such as is
> > included in
> >
> > Malcolm Cook
> > Database Applications Manager - Bioinformatics
> > Stowers Institute for Medical Research - Kansas City, Missouri
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org
> >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> >> Sent: Friday, January 11, 2008 11:10 AM
> >> To: Phillip San Miguel
> >> Cc: bioperl-l
> >> Subject: Re: [Bioperl-l] Recommended way to download qual files from Genbank?
> >>
> >> I don't think this is possible with the current setup for
> >> Bio::DB::GenBank (which the script uses). We'll have to investigate
> >> whether it is possible to retrieve this data via NCBI's eutils; if so
> >> we can try adding it in.
> >> If you want you can submit this as an
> >> enhancement request via bugzilla for tracking:
> >>
> >> http://bugzilla.open-bio.org/
> >>
> >> chris
> >>
> >> On Jan 11, 2008, at 10:22 AM, Phillip San Miguel wrote:
> >>
> >>> No problem getting sequence from genbank via a myriad of methods.
> >>> But as the volume of non-finished sequence in genbank increases, the
> >>> importance of also obtaining quality values for a given sequence
> >>> increases. Some records include quality values.
> >>>
> >>> I typically use bp_fetch.pl to grab a sequence from genbank:
> >>>
> >>> bp_fetch.pl -fmt fasta net::genbank:AC207960
> >>>
> >>> sends the fasta sequence to STDOUT. But bp_fetch.pl evidently wasn't
> >>> designed to pull down quals:
> >>>
> >>> bp_fetch.pl -fmt qual net::genbank:AC207960
> >>>
> >>> gives:
> >>>
> >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> >>> MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual
> >>> object to write_seq() as a parameter named "source"
> >>> STACK: Error::throw
> >>> STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/Root/Root.pm:359
> >>> STACK: Bio::SeqIO::qual::write_seq /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/SeqIO/qual.pm:205
> >>> STACK: /usr/local/perl/bin/bp_fetch.pl:313
> >>> -----------------------------------------------------------
> >>>
> >>> (running under bioperl 1.5.2)
> >>>
> >>> The quality values for this accession are in genbank as these URLs
> >>> demonstrate:
> >>>
> >>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=154937460
> >>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=fasta
> >>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=qual
> >>>
> >>> What is the best way to pull down these qual values?
> >>> They aren't present in "GenBank(Full)" format. They are present in
> >>> an ASN.1 format.
> >>>
> >>> Advice would be appreciated.
> >>>
> >>> --
> >>> Phillip
> >>> Purdue Genomics Core Facility
> >>>
> >> Christopher Fields
> >> Postdoctoral Researcher
> >> Lab of Dr. Robert Switzer
> >> Dept of Biochemistry
> >> University of Illinois Urbana-Champaign

From singhal at berkeley.edu Tue Jan 15 17:50:12 2008
From: singhal at berkeley.edu (Sonal Singhal)
Date: Tue, 15 Jan 2008 14:50:12 -0800
Subject: [Bioperl-l] redundant sequences
Message-ID:

Hi all,

I am mining a few genomes to find all the genes in a gene family, and of course multiple BLAST searches of different paralogs are returning a lot of redundant hits. I have searched the BioPerl documentation, and I cannot find an easy way to cluster and then purge redundant sequences. Any ideas?
Cheers,
sonal

From MEC at stowers-institute.org Tue Jan 15 18:21:00 2008
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 15 Jan 2008 17:21:00 -0600
Subject: [Bioperl-l] redundant sequences
In-Reply-To:
References:
Message-ID:

Cd-hit: http://bioinformatics.burnham.org/cd-hi/

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sonal Singhal
> Sent: Tuesday, January 15, 2008 4:50 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] redundant sequences
>
> Hi all,
>
> I am mining a few genomes to find all the genes in a gene family, and
> of course multiple BLAST searches of different paralogs are returning
> a lot of redundant hits. I have searched the BioPerl documentation,
> and I cannot find an easy way to cluster and then purge redundant
> sequences. Any ideas?
>
> Cheers,
> sonal

From cain.cshl at gmail.com Tue Jan 15 21:24:50 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Tue, 15 Jan 2008 21:24:50 -0500
Subject: [Bioperl-l] GenBank format and feature names > 15 char
In-Reply-To: <6dce9a0b0801141146gfb4ee89o1523a360a280eeb1@mail.gmail.com>
References: <1200336399.6056.12.camel@frissell> <6dce9a0b0801141053s6c1364esbc5fbf761ed3c8cd@mail.gmail.com> <6dce9a0b0801141146gfb4ee89o1523a360a280eeb1@mail.gmail.com>
Message-ID: <1200450290.7276.3.camel@frissell>

Hi Chris and Lincoln,

I've attached my suggested patch. So, can I use svn to check it in? It only adds a space after the feature type name; I suspect that will be enough to fix the file format for most uses.

Scott

On Mon, 2008-01-14 at 14:46 -0500, Lincoln Stein wrote:
> That's a new bug.
> The version I worked on inserted a space after the name.
>
> Lincoln
>
> On Jan 14, 2008 2:35 PM, Chris Fields wrote:
>
> > It looks like the keys in the feature table run into the location
> > string w/o intervening space, which would probably cause havoc with
> > roundtripping from this output. A few examples:
> >
> > BAC_cloned_genomic_insert<1..>1000
> > combined_genscanjoin(<1..347,400..498,794..>1000)
> > splign_na_dbEST_ncbi<1..>1000
> >
> > I would think at least a space in between the location and the key
> > would be required for round-tripping out of genbank format.
> >
> > chris
> >
> > On Jan 14, 2008, at 12:53 PM, Lincoln Stein wrote:
> >
> > > Hi Scott,
> > >
> > > He is correct about the limitation, but we deliberately relaxed it
> > > because we were running into situations where we lost information
> > > during roundtripping from other formats into genbank.
> > >
> > > Lincoln
> > >
> > > On Jan 14, 2008 1:46 PM, Scott Cain wrote:
> > >
> > >> Hi all,
> > >>
> > >> Last month, I got a bug report on the GBrowse bug tracker:
> > >>
> > >> http://sourceforge.net/tracker/index.php?func=detail&aid=1845217&group_id=27707&atid=391291
> > >>
> > >> about a problem with dumping invalid GenBank files. GBrowse uses
> > >> Bio::SeqIO::genbank to create these dumps.
> > >>
> > >> In his bug report, he claims that feature names over 15 characters
> > >> long are invalid, and provided an example GenBank file where a
> > >> feature is named 'BAC_cloned_genomic_insert', which is over 15
> > >> characters. What I want to know is this: is this truly a restriction
> > >> on the GenBank format, or is it a software problem with some other
> > >> package? Do we need to fix genbank.pm? I'm perfectly willing to do
> > >> it; I'm just hesitant to believe this is really a bug.
> > >>
> > >> Thanks,
> > >> Scott
> > >>
> > >> --
> > >> ------------------------------------------------------------------------
> > >> Scott Cain, Ph. D.                              cain.cshl at gmail.com
> > >> GMOD Coordinator (http://www.gmod.org/)               216-392-3087
> > >> Cold Spring Harbor Laboratory
> > >>
> > >
> > > --
> > > Lincoln D. Stein
> > > Cold Spring Harbor Laboratory
> > > 1 Bungtown Road
> > > Cold Spring Harbor, NY 11724
> > > (516) 367-8380 (voice)
> > > (516) 367-8389 (fax)
> > > FOR URGENT MESSAGES & SCHEDULING,
> > > PLEASE CONTACT MY ASSISTANT,
> > > SANDRA MICHELSEN, AT michelse at cshl.edu
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
>
--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

-------------- next part --------------
A non-text attachment was scrubbed...
Name: genbank.pm.patch
Type: text/x-patch
Size: 1110 bytes
Desc: not available
URL:

From cjfields at uiuc.edu Tue Jan 15 22:15:51 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 15 Jan 2008 21:15:51 -0600
Subject: [Bioperl-l] Subversion migration complete
Message-ID:

On behalf of the BioPerl core developers, I am proud to announce that the BioPerl SVN migration has been completed.
We would like to thank everyone who helped, in particular George Hartzell and Chris Dagdigian, both of whom played instrumental roles in the CVS->SVN conversion and anonymous SVN setup for BioPerl.

Anonymous SVN checkouts for bioperl-live are now possible using:

svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live

Developers can obtain a checkout from:

svn co svn+ssh://USER at dev.open-bio.org/home/svn-repositories/bioperl/bioperl-live/trunk bioperl-live

Browsable repository:

http://code.open-bio.org/svnweb/index.cgi/bioperl/

Basic instructions:

http://www.bioperl.org/wiki/Using_Subversion

We are still in the midst of implementing a few extra details related to SVN migration; the status on these can be viewed here:

http://www.bioperl.org/wiki/CVS_to_SVN_Migration

Enjoy!

chris

From bug-bioperl at rt.cpan.org Wed Jan 16 22:35:30 2008
From: bug-bioperl at rt.cpan.org (Chris Fields via RT)
Date: Wed, 16 Jan 2008 22:35:30 -0500
Subject: [Bioperl-l] [rt.cpan.org #29533] Bio::SeqIO::interpro depends on XML::DOM::XPath
In-Reply-To:
References:
Message-ID:

Queue: bioperl
Ticket

On Fri Sep 21 10:28:52 2007, support at helpdesk.open-bio.org wrote:
> Hi Mike,
>
> The proper place to submit this fix is the bioperl-l at lists.open-bio.org
> mailing list or the OBF Bugzilla queue at:
> http://bugzilla.open-bio.org/, this RT system is mainly for sysadmin
> activities rather than for tracking code changes. Would you be so kind
> to re-send your request to one of the places above? Thanks for the heads
> up! :)
>
> Regards,
> Mauricio.

This has been fixed. I'll get the CPAN maintainer to close this out.

From vipingjo at gmail.com Thu Jan 17 03:48:36 2008
From: vipingjo at gmail.com (viping)
Date: Thu, 17 Jan 2008 16:48:36 +0800
Subject: [Bioperl-l] Can't locate object method "is_compatible" via package "Bio::Tree::Tree"
Message-ID: <200801171648332965577@gmail.com>

Hi Everyone,

I'm using ActivePerl-5.8.8.822-MSWin32-x86-280952 + bioperl-1.5.2 + Windows XP SP2.
When running the example code (attached below as t.pl) from Bio\Tree\Compatible.pm, I got this error:

Can't locate object method "is_compatible" via package "Bio::Tree::Tree"

I replaced "$t1->is_compatible($t2)" with "is_compatible Bio::Tree::Compatible ($t1,$t2)", and the error changed:

Can't locate object method "get_nodes" via package "Bio::Tree::Compatible" at i:/Perl/site/lib/Bio/Tree/Compatible.pm line 252, line 1.

I modified Compatible.pm, changing the code for "get_nodes" to "get_nodes Bio::Tree::Tree($self);", and a new error arose:

Can't use string ("Bio::Tree::Tree") as a HASH ref while "strict refs" in use at i:/Perl/site/lib/Bio\Tree\Tree.pm line 198, line 1.

I gave up. Any help will be deeply appreciated.

# this is the example script in Bio::Tree::Compatible: t.pl
use Bio::Tree::Compatible;
use Bio::TreeIO;
my $input = new Bio::TreeIO('-format' => 'newick',
                            '-file' => 'input.tre');
my $t1 = $input->next_tree;
my $t2 = $input->next_tree;
my ($incompat, $ilabels, $inodes) = $t1->is_compatible($t2);
if ($incompat) {
    my %cluster1 = %{ $t1->cluster_representation };
    my %cluster2 = %{ $t2->cluster_representation };
    print "incompatible trees\n";
    if (scalar(@$ilabels)) {
        foreach my $label (@$ilabels) {
            my $node1 = $t1->find_node(-id => $label);
            my $node2 = $t2->find_node(-id => $label);
            my @c1 = sort @{ $cluster1{$node1} };
            my @c2 = sort @{ $cluster2{$node2} };
            print "label $label";
            print " cluster"; map { print " ",$_ } @c1;
            print " cluster"; map { print " ",$_ } @c2; print "\n";
        }
    }
    if (scalar(@$inodes)) {
        while (@$inodes) {
            my $node1 = shift @$inodes;
            my $node2 = shift @$inodes;
            my @c1 = sort @{ $cluster1{$node1} };
            my @c2 = sort @{ $cluster2{$node2} };
            print "cluster"; map { print " ",$_ } @c1;
            print " properly intersects cluster";
            map { print " ",$_ } @c2; print "\n";
        }
    }
} else {
    print "compatible trees\n";
}
__END__;

# this is the file 'input.tre':
(((A,B)C,D),(E,F,G));
((A,B)H,E,(J,(K)G)I);

# these are the full messages I got running it like this: "perl.exe -w t.pl"
Subroutine new redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 96.
Subroutine nodelete redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 145.
Subroutine get_nodes redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 162.
Subroutine get_root_node redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 196.
Subroutine set_root_node redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 211.
Subroutine total_branch_length redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 235.
Subroutine id redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 257.
Subroutine score redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 278.
Subroutine cleanup_tree redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 314.
Subroutine new redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 100.
Subroutine add_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 152.
Subroutine each_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 190.
Subroutine remove_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 252.
Subroutine remove_all_Descendents redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 300.
Subroutine ancestor redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 334.
Subroutine branch_length redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 375.
Subroutine bootstrap redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 399.
Subroutine description redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 420.
Subroutine id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 449.
Subroutine internal_id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 491.
Subroutine _creation_id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 505.
Subroutine is_Leaf redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 526.
Subroutine height redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 552.
Subroutine invalidate_height redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 577.
Subroutine add_tag_value redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 597.
Subroutine remove_tag redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 617.
Subroutine remove_all_tags redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 637.
Subroutine get_all_tags redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 653.
Subroutine get_tag_values redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 669.
Subroutine has_tag redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 685.
Subroutine node_cleanup redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 690.
Subroutine reverse_edge redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 717.
Can't locate object method "is_compatible" via package "Bio::Tree::Tree" at Z:\bp\t.pl line 8, line 2.

From bix at sendu.me.uk Thu Jan 17 06:18:56 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 17 Jan 2008 11:18:56 +0000
Subject: [Bioperl-l] Can't locate object method "is_compatible" via package "Bio::Tree::Tree"
In-Reply-To: <200801171648332965577@gmail.com>
References: <200801171648332965577@gmail.com>
Message-ID: <478F39A0.2030508@sendu.me.uk>

viping wrote:
> Hi Everyone,
>
> I'm using ActivePerl-5.8.8.822-MSWin32-x86-280952 + bioperl-1.5.2 +
> Windows XP SP2. When running the example code (attached below as t.pl)
> from Bio\Tree\Compatible.pm, I got this error:
>
> Can't locate object method "is_compatible" via package
> "Bio::Tree::Tree"
>
> I replaced "$t1->is_compatible($t2)" with "is_compatible
> Bio::Tree::Compatible ($t1,$t2)",

Yup, you had the right idea; unfortunately the synopsis code for Bio::Tree::Compatible is wrong. I've now fixed it in svn.

> the error changed: Can't locate object method "get_nodes" via package
> "Bio::Tree::Compatible" at i:/Perl/site/lib/Bio/Tree/Compatible.pm
> line 252, line 1.

I didn't get quite that error; instead I had an issue with TreeIO: for whatever reason it is only returning one tree from your input file (i.e. $t2 is undefined).

I therefore got "Can't call method "get_nodes" on an undefined value [...]"

Can someone look into/confirm that?
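The open question at this point is whether Bio::TreeIO is dropping the second tree, or whether the file really holds only one. A quick check that does not depend on BioPerl is to count the ';'-terminated Newick statements in input.tre directly.

The sketch below is core Perl only and rests on a few assumptions. `split_newick` is a hypothetical helper written for this check; it is not a BioPerl function. It assumes each tree ends with ';', that labels may be single-quoted, and that comments (including NHX annotations) sit inside [...].

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split Newick text into its
# ';'-terminated tree statements. Semicolons inside single-quoted labels
# or [...] comments (e.g. NHX annotations) are not treated as terminators.
sub split_newick {
    my ($text) = @_;
    my @trees;
    my $buf = '';
    my ($in_quote, $comment_depth) = (0, 0);

    for my $ch (split //, $text) {
        if ($in_quote) {
            # a doubled '' simply closes and immediately re-opens the quote
            $in_quote = 0 if $ch eq "'";
        }
        elsif ($comment_depth) {
            $comment_depth++ if $ch eq '[';
            $comment_depth-- if $ch eq ']';
        }
        elsif ($ch eq "'") {
            $in_quote = 1;
        }
        elsif ($ch eq '[') {
            $comment_depth = 1;
        }
        elsif ($ch eq ';') {
            # trim newlines, CRs and spaces left between trees
            $buf =~ s/^\s+|\s+$//g;
            push @trees, "$buf;" if length $buf;
            $buf = '';
            next;
        }
        $buf .= $ch;
    }
    return @trees;    # any text after the last ';' is ignored
}

# Read the file named on the command line, or fall back to the two trees
# posted in this thread.
my $text = @ARGV
    ? do { local $/; open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!"; <$fh> }
    : "(((A,B)C,D),(E,F,G));\n((A,B)H,E,(J,(K)G)I);\n";

my @trees = split_newick($text);
printf "%d tree(s) found\n", scalar @trees;
print "$_\n" for @trees;
```

Run it as `perl count_trees.pl input.tre` (the script name is arbitrary). Suppose it reports 2 tree(s) for a file from which Bio::TreeIO returns only one. That would suggest the difference lies in how the file is read, not in its content.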
From bix at sendu.me.uk Thu Jan 17 06:35:57 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 17 Jan 2008 11:35:57 +0000
Subject: [Bioperl-l] Can't locate object method "is_compatible" via package "Bio::Tree::Tree"
In-Reply-To: <478F39A0.2030508@sendu.me.uk>
References: <200801171648332965577@gmail.com> <478F39A0.2030508@sendu.me.uk>
Message-ID: <478F3D9D.6050306@sendu.me.uk>

Sendu Bala wrote:
>> the error changed: Can't locate object method "get_nodes" via
>> package "Bio::Tree::Compatible" at
>> i:/Perl/site/lib/Bio/Tree/Compatible.pm line 252, line 1.
>
> I didn't get quite that error; instead I had an issue with TreeIO:
> for whatever reason it is only returning one tree from your input
> file (i.e. $t2 is undefined).
>
> I therefore got "Can't call method "get_nodes" on an undefined value
> [...]"
>
> Can someone look into/confirm that?

...

Yeah, I think I'm losing my mind. The code below is 'ok' using the commented-out -fh input for TreeIO, but is 'not ok' using the -file input, where the specified file contains the exact same data as __DATA__. Huh?

#!/usr/bin/perl -w

use strict;
use warnings;

use Bio::Tree::Compatible;
use Bio::TreeIO;

my $input = new Bio::TreeIO('-format' => 'newick',
                            #-fh => \*DATA,
                            -file => 'input.tre'
                           );

my $t1 = $input->next_tree;
my $t2 = $input->next_tree;

if ($t2) {
    print "ok\n";
}
else {
    print "not ok\n";
}

__DATA__
(((A,B)C,D),(E,F,G));
((A,B)H,E,(J,(K)G)I);

From vipingjo at gmail.com Thu Jan 17 08:23:14 2008
From: vipingjo at gmail.com (viping)
Date: Thu, 17 Jan 2008 21:23:14 +0800
Subject: [Bioperl-l] Can't locate object method "is_compatible" via package "Bio::Tree::Tree"
References: <200801171648332965577@gmail.com>, <478F39A0.2030508@sendu.me.uk>
Message-ID: <200801172123112184046@gmail.com>

I got the latest code modified by Sendu Bala via SVN. It works well when "input.tre" and "t.pl" are in the same directory. Thank you, Sendu Bala.

This is the output:

Subroutine new redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 96.
Subroutine nodelete redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 145.
Subroutine get_nodes redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 162.
Subroutine get_root_node redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 196.
Subroutine set_root_node redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 211.
Subroutine total_branch_length redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 235.
Subroutine id redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 257.
Subroutine score redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 278.
Subroutine cleanup_tree redefined at i:/Perl/site/lib/Bio\Tree\Tree.pm line 314.
Subroutine new redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 100.
Subroutine add_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 152.
Subroutine each_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 190.
Subroutine remove_Descendent redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 252.
Subroutine remove_all_Descendents redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 300.
Subroutine ancestor redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 334.
Subroutine branch_length redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 375.
Subroutine bootstrap redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 399.
Subroutine description redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 420.
Subroutine id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 449.
Subroutine internal_id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 491.
Subroutine _creation_id redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 505.
Subroutine is_Leaf redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 526.
Subroutine height redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 552.
Subroutine invalidate_height redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 577.
Subroutine add_tag_value redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 597.
Subroutine remove_tag redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 617.
Subroutine remove_all_tags redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 637.
Subroutine get_all_tags redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 653.
Subroutine get_tag_values redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 669.
Subroutine has_tag redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 685.
Subroutine node_cleanup redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 690.
Subroutine reverse_edge redefined at i:/Perl/site/lib/Bio\Tree\Node.pm line 717.
incompatible trees
label G cluster G cluster G K
cluster A B C properly intersects cluster A B H
cluster A B C properly intersects cluster A B E G H I J K
cluster A B C D properly intersects cluster A B H
cluster A B C D properly intersects cluster A B E G H I J K
cluster E F G properly intersects cluster G K
cluster E F G properly intersects cluster G I J K
cluster E F G properly intersects cluster A B E G H I J K
cluster A B C D E F G properly intersects cluster A B H
cluster A B C D E F G properly intersects cluster G K
cluster A B C D E F G properly intersects cluster G I J K
cluster A B C D E F G properly intersects cluster A B E G H I J K

# this is the latest code:
use Bio::Tree::Compatible;
use Bio::TreeIO;
my $input = Bio::TreeIO->new('-format' => 'newick',
                             '-file' => 'input.tre');
my $t1 = $input->next_tree;
my $t2 = $input->next_tree;
my ($incompat, $ilabels, $inodes) = Bio::Tree::Compatible::is_compatible($t1,$t2);
if ($incompat) {
    my %cluster1 = %{ Bio::Tree::Compatible::cluster_representation($t1) };
    my %cluster2 = %{ Bio::Tree::Compatible::cluster_representation($t2) };
    print "incompatible trees\n";
    if (scalar(@$ilabels)) {
        foreach my $label (@$ilabels) {
            my $node1 = $t1->find_node(-id => $label);
            my $node2 = $t2->find_node(-id => $label);
            my @c1 = sort @{ $cluster1{$node1} };
            my @c2 = sort @{ $cluster2{$node2} };
            print "label $label";
            print " cluster"; map { print " ",$_ } @c1;
            print " cluster"; map { print " ",$_ } @c2; print "\n";
        }
    }
    if (scalar(@$inodes)) {
        while (@$inodes) {